bluesky / event-model

data model for event-based data collection and analysis
https://blueskyproject.io/event-model
BSD 3-Clause "New" or "Revised" License
13 stars 30 forks source link

Add typing support to event_model #235

Closed coretl closed 1 year ago

coretl commented 1 year ago

@evalott100

evalott100 commented 1 year ago

Remember to add mypy typechecking support to the repo

evalott100 commented 1 year ago

Mypy seems to succeed locally. In github CI it is failing because of errors in _version.py. I've been trying my best to exclude those errors/that file with no progress. __init__.py is passing in mypy otherwise though.

coretl commented 1 year ago

@callumforrester

callumforrester commented 1 year ago

Fix in off-branch: https://github.com/bluesky/event-model/tree/235_add_typing_support_to_event_model_fix_cf Can merge into this one to apply

callumforrester commented 1 year ago

@coretl @danielballan @evalott100 Unsure if this is the best place to raise this, but should we be considering a top level document schema? Previously documents were passed around according to an agreed Python function signature.

def handle_doc(name, doc):

But if we're passing them around to external applications now, maybe we should have a top-level schema that looks a bit like

{
    "name": "start"
    "doc": {"foo": "bar"}
}

I bring it up because we've already managed to diverge internally, and I notice we're also different to the queueserver which publishes this to kafka

["start", {"foo": "bar"}]
danielballan commented 1 year ago

I think it is important to converge on a standard.

And I agree that an object is much better than a dict. The [name, doc] pattern originated in ~2017 (so @dmgav is blameless here, haha), just a naive extension of the function signature into JSON. At the time I did not understand that this was not idiomatic in JSON. I think it's worth making a change consistently across bluesky projects to move to the object representation.

I think most consumers can be made to deal with both representations, to enable a smooth transition for users.

danielballan commented 1 year ago

In the databroker 2.0 prerelease we have a GET /documents route that returns [name, doc]. In https://github.com/bluesky/databroker/pull/752 while adding POST /documents I changed the API to {"name": "start", "doc": {"time": 0, "uid": "xxx"}} as proposed above.

Since databroker 2.0 is still prerelease, we have room to change our minds if we want to. But I thought I'd note here that I'm moving to converge on @callumforrester's proposal above.

coretl commented 1 year ago

Also, should add the following to Descriptor DataKeys:

https://github.com/bluesky/bluesky/blob/37eb3e21cf136db7ddf1f6a063f373f06c229578/bluesky/protocols.py#L48-L51

Also choices: List[str] if the native type is enum, then serialize as a string which will be one of these choices

And add to Reading:

https://github.com/bluesky/bluesky/blob/37eb3e21cf136db7ddf1f6a063f373f06c229578/bluesky/protocols.py#L15-L24