bluesky / event-model

data model for event-based data collection and analysis
https://blueskyproject.io/event-model
BSD 3-Clause "New" or "Revised" License

Add a new document type for "bulk" external resources with no predetermined shape. #222

Closed maffettone closed 1 year ago

maffettone commented 2 years ago

Develop a stream_resource document that manages an unknown number of contiguous stream_datum documents, with the potential for multiple streams. This is especially relevant when the data is expected to be ragged or has no predetermined shape (number of rows).

The potential for multiple streams makes this construction slightly different from resource, in that multiple counters and stream names need to be tracked within a closure.
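To illustrate the closure idea, here is a minimal sketch of a factory that keeps one block counter per stream name. It is not the event_model implementation: the function names `make_stream_datum_factory` and `compose_stream_datum`, and the `uid`, `stream_resource`, and `stream_name` keys, are assumptions made for illustration. Only `block_id` is taken from the discussion in this PR.

```python
import uuid


def make_stream_datum_factory(stream_resource_uid, stream_names):
    """Return a function that builds stream_datum-like dicts.

    Hypothetical helper: the per-stream counters live in the enclosing
    scope, so each stream gets its own contiguous block numbering.
    """
    counters = {name: 0 for name in stream_names}  # one counter per stream

    def compose_stream_datum(stream_name):
        if stream_name not in counters:
            raise KeyError(f"unknown stream name: {stream_name!r}")
        block_id = counters[stream_name]
        counters[stream_name] += 1  # advance only this stream's counter
        return {
            "uid": str(uuid.uuid4()),                # assumed key
            "stream_resource": stream_resource_uid,  # assumed key
            "stream_name": stream_name,              # assumed key
            "block_id": block_id,                    # named in this PR
        }

    return compose_stream_datum


compose = make_stream_datum_factory("resource-1", ["photons", "temperature"])
first = compose("photons")
second = compose("photons")
other = compose("temperature")
print(first["block_id"], second["block_id"], other["block_id"])
```

Each stream's numbering advances independently, which is the part a plain resource, with its single datum counter, does not need.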

Description

TODO:

Motivation and Context

Currently the implementation of external data is limited to filling in data on a per-event, per-field basis, basically the smallest quantum of data that the system can express. While this works extremely well for something like image stacks of equal length, it works very poorly in cases where the data is natively ragged (single-photon detectors) or when there are many, many relatively small data sets (e.g. scanned fluorescence data). The difficulties range from fully breaking the ability to put the data in an xarray to pathological performance problems.

Closes #219.

How Has This Been Tested?

Testing environment included all event_model and bluesky requirements in Python 3.9.

All previous tests pass.

Tests added:

maffettone commented 2 years ago

Before going much deeper and embedding these new documents into the DocumentRouter and Filler classes, I wanted to get a partial review (@tacaswell) and confirm that my approach here is sensible.

A given stream_resource can point to multiple stream_datum with unique names. For example, a single resource could point to the output stream of a single-photon detector and the output stream of some thermocouple. Each datum can contain one or many rows (frames of the detector or voltage/temperature pairs of the thermocouple) but has a single block_id, so that the full column can be constructed in retrospect. The event_offset is there for the sake of bookkeeping, in that block 0 and block 1 may actually be separated by a gap?
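As a rough sketch of how a consumer might rebuild a column from these blocks, the snippet below orders the datums of one stream by `block_id` and uses `event_offset` to notice gaps. It assumes that `event_offset` is the index of a block's first row within the full column, which is one possible reading of the question above and not a settled definition. The `stream_name` and `rows` keys and the `assemble_column` helper are hypothetical; in practice the rows would live in the external file and not in the document itself.

```python
def assemble_column(stream_datums, stream_name):
    """Concatenate the rows of one stream in block order and report gaps.

    Returns (rows, gaps), where each gap is a pair
    (expected_offset, actual_offset) found between consecutive blocks.
    """
    blocks = sorted(
        (d for d in stream_datums if d["stream_name"] == stream_name),
        key=lambda d: d["block_id"],
    )
    rows, gaps = [], []
    expected_offset = 0
    for block in blocks:
        if block["event_offset"] != expected_offset:
            gaps.append((expected_offset, block["event_offset"]))
        rows.extend(block["rows"])
        # The next block is expected to start right after this one ends.
        expected_offset = block["event_offset"] + len(block["rows"])
    return rows, gaps


# Two streams sharing one resource; the photon stream has a one-row gap.
datums = [
    {"stream_name": "photons", "block_id": 1, "event_offset": 3, "rows": [7]},
    {"stream_name": "photons", "block_id": 0, "event_offset": 0, "rows": [3, 5]},
    {"stream_name": "thermocouple", "block_id": 0, "event_offset": 0,
     "rows": [(0.1, 295.0)]},
]
photon_rows, photon_gaps = assemble_column(datums, "photons")
print(photon_rows, photon_gaps)
```

Blocks of different lengths concatenate without padding, which is the ragged case that the per-event, per-field model handles poorly.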

coretl commented 1 year ago

We're happy with this, so happy to see it merged whenever.

danielballan commented 1 year ago

@tacaswell Is this ready to go? I will run the release process when this is merged.

maffettone commented 1 year ago

Unsure why I left Documentation under the TODO list. Perhaps just because it was failing the documentation Action.

tacaswell commented 1 year ago

@coretl Let's see what breaks when you use this in anger 😈.