bluesky / databroker

Unified API pulling data from multiple sources
https://blueskyproject.io/databroker
BSD 3-Clause "New" or "Revised" License
35 stars 47 forks source link

Accesssing metadata in v2 #617

Open danielballan opened 4 years ago

danielballan commented 4 years ago

While we are fixing the experimental v2 API with backward-incompatible changes (https://github.com/bluesky/databroker/pull/616) I would like to open up another Pandora's jar that has been left closed for awhile. In a context of analysis, as opposed to acq, it is confusing that metadata is scoped under 'start' and 'stop', as in run.metadata['start']['sample'] as opposed to run.metadata['sample'].

Nonetheless, I find run.metadata['start']['sample'] hard to justify in contexts where streaming or acquisition are not salient and the metadata would more naturally appear as one unit. Let's brainstorm alternatives:

  1. Move the contents of 'stop' to a special accessor.
run.metadata  # 'start' doc
run.stop_doc

We might give that accessor a more self-describing name than "stop_doc" like "exit_info".

  1. Nest the 'stop' key inside the contents of the 'start' document, and ban 'stop' as a user-provided key.
run.metadata  # 'start' doc
run.metadata['stop']  # 'stop' doc

Other ideas?

dylanmcreynolds commented 4 years ago

I like this, and vote for option 1.

tacaswell commented 4 years ago

I'm pretty neutral on this (i'm not super fussed about the md['start']['blag']), but also do not think it is worth fighting over keeping it (particularly now that the descriptors have been scattered elsewhere in the data structure if #616 goes in).

If we do this 1 is much better than 2 due to "stop" being both a camera term for the aperture (well f-stop but), a reasonable shortening of "beam stop", and a common enough term in hardware I could imagine users having some top-level metadata called "stop".

danielballan commented 4 years ago

If we are calling the contents of the 'start' document metadata I feel we are under no obligation to call the contents of the 'stop' document stop_doc. Proposals for what to call it? As a reminder, its contents are:

{'exit_status': 'success',
 'num_events': {'primary': 3},
 'reason': '',
 'run_start': 'a5e1e2e8-6350-4b08-88b4-c4c1e5cb0195',
 'time': 1603409103.4623098,
 'uid': 'c0683d09-71fc-4a03-8939-2882946c8b6d'}

In an earlier comment, I mentioned exit_info as one more self-describing possibility. If one is not read into the Document Model, stop reads like a verb (as in device.stop()) rather than "Oh this is probably metadata about completion time and disposition along with a count of the Events in each stream!"

ronpandolfi commented 4 years ago

I agree that both are improvements. I'd consider if run.metadata could be made more specific; there's plenty of metadata that's won't be included there. Would run.start_doc and run.stop_doc be comfortably specific and more balanced? Or, if part of the intent here is for the API to insulate from the concepts of 'documents', how is run.start_metadata and run.stop_metadata.

danielballan commented 4 years ago

Just noting this so we don't forget: @dylanmcreynolds suggested run.summary for the stop doc, since it summarizes number of events and so on.