bluesky / databroker

Unified API pulling data from multiple sources
https://blueskyproject.io/databroker
BSD 3-Clause "New" or "Revised" License

Handle event streams with 0 events #771

Open danielballan opened 1 year ago

danielballan commented 1 year ago

We have seen this a couple of times in the wild: reading a stream with 0 events raises a ValueError deep inside databroker. The full traceback follows.

```
File /srv/conda/envs/notebook/lib/python3.9/site-packages/tiled/adapters/mapping.py:300, in <genexpr>(.0)
    294 def _items_slice(self, start, stop, direction):
    295     # A goal of this implementation is to avoid iterating over
    296     # self._mapping.values() because self._mapping may be a OneShotCachedMap which
    297     # only constructs its values at access time. With this in mind, we
    298     # identify the key(s) of interest and then only access those values.
    299     yield from (
--> 300         (key, self._mapping[key])
    301         for key in self._keys_slice(start, stop, direction)
    302     )

File /srv/conda/envs/notebook/lib/python3.9/site-packages/tiled/utils.py:126, in OneShotCachedMap.__getitem__(self, key)
    123 v = self.__mapping[key]
    124 if isinstance(v, _OneShotCachedMapWrapper):
    125     # TODO handle exceptions?
--> 126     v = self.__mapping[key] = v.func()
    127 return v

File /srv/conda/envs/notebook/lib/python3.9/site-packages/databroker/mongo_normalized.py:1482, in MongoAdapter._build_event_stream(self, run_start_uid, stream_name, is_complete)
   1466 # We need each of the sub-dicts to have a consistent length. If
   1467 # Events are still being added, we need to choose a consistent
   1468 # cutoff. If not, we need to know the length anyway. Note that this
   1469 # is not the same thing as the number of Event documents in the
   1470 # stream because seq_num may be repeated, nonunique.
   1471 cursor = self._event_collection.aggregate(
   1472     [
   1473         {"$match": {"descriptor": {"$in": event_descriptor_uids}}},
   (...)
   1480     ]
   1481 )
-> 1482 (result,) = cursor
   1483 cutoff_seq_num = (
   1484     1 + result["highest_seq_num"]
   1485 )  # `1 +` because we use a half-open interval
   1486 object_names = event_descriptors[0]["object_keys"]

ValueError: not enough values to unpack (expected 1, got 0)
```
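The failing line is the single-element unpack `(result,) = cursor`: when no Event documents match, the aggregation yields nothing and the unpack raises. A minimal sketch of one way to guard against that is shown here. It is an illustration only, not necessarily how databroker handles it: the helper name `cutoff_seq_num_from` is hypothetical, a plain iterable stands in for the MongoDB cursor, and returning `1` for an empty stream assumes `seq_num` starts at 1, so that the half-open interval `[1, 1)` is empty.

```python
def cutoff_seq_num_from(cursor):
    """Return the half-open upper bound on seq_num for a stream.

    `cursor` is any iterable that yields at most one aggregation result
    shaped like {"highest_seq_num": int}. This helper is hypothetical and
    only mirrors the logic visible in the traceback.
    """
    # next() with a default avoids the ValueError that `(result,) = cursor`
    # raises when the cursor yields nothing.
    result = next(iter(cursor), None)
    if result is None:
        # Assumption: seq_num starts at 1, so a cutoff of 1 means "0 events".
        return 1
    # `1 +` because the interval is half-open, as in the original code.
    return 1 + result["highest_seq_num"]


if __name__ == "__main__":
    print(cutoff_seq_num_from([]))
    print(cutoff_seq_num_from([{"highest_seq_num": 5}]))
```

Whatever consumes the cutoff would still need to cope with an empty range, so a guard like this only addresses the first point of failure.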

Here is a list of uids from the NSLS-II PDF databroker that have this issue. (To be precise, they all have some issue, and at least one, probably all, have this issue.)

0fd40439-5485-4079-b880-6dc55ece50d9
aa6aef73-08a8-437e-90cb-9633262f0c7d
b2595cad-27dc-4bbe-a1b5-a017837f1b4a
e3da66bb-9365-4c5a-9c59-4dc8d91416eb
818bfc31-dc86-4894-9cd0-c94a7ef345b8
31860664-de1c-4cc7-948e-e2e6a832a77c
3766f792-3ad8-4e67-856c-8bbec78ae44e
2d66c0ea-e33f-4503-8c34-0ade33271c3b
501b18df-28b9-4c88-a7dc-583e73bad8ba
33aaa88e-e4f8-427a-9e9e-3f9765b17f0f
baba4087-2e8d-499b-8be7-9124c713f259
df85aa35-78ad-4412-a258-522305de0f49
ee099c5a-7566-4a17-b21f-4941d5c4d243
b92a2d8f-d2d9-409e-9201-2ad755187192
a7684767-751c-4dfb-a548-139eefb1a08b
12ca844f-dd2c-4d1b-9746-6c8376529e7f
ad934b38-bcc2-4986-9753-edfab8b8ade6
bc500b78-48cf-4103-97a3-44ad2bd78fc5

danielballan commented 1 year ago

There is a unit test and a fix for this in https://github.com/bluesky/databroker/pull/772.

Let's wait to close this issue until we can test it "in the wild" against one of the examples above.