bluesky / bluesky

experiment orchestration and data acquisition
https://blueskyproject.io/bluesky/
BSD 3-Clause "New" or "Revised" License

packing of invalid documents #1307

Open gwbischof opened 4 years ago

gwbischof commented 4 years ago

A diff between reading with databroker v1 and v0.

2020-02-24 10:35:21,858 INFO CSX 21d19383-e171-44fd-9efa-a8a729f84226 21d19383-e171-44fd-9efa-a8a729f84226 [('change', [('datum', '50838fb5-d705-4559-8233-3166b0cddbb1'), 'datum_kwargs', 'frame'], (1, 0)), ('add', [('datum', '06287239-0d40-4c96-84ca-ed1df501b565'), 'datum_kwargs'], [('frame', 1), ('channel', 1)]), ('add', [('datum', '156188f3-fffb-48f3-8845-102b91e36a53'), 'datum_kwargs'], [('frame', 2), ('channel', 1)])]

The raw documents:

{'resource': '56370eb6-3466-47c7-9cb0-bb21ced05f81', 'datum_id': ['04037aa0-5cd7-46be-97b6-30e9832d43f6', '50838fb5-d705-4559-8233-3166b0cddbb1', 'c82ba303-fb19-4c55-9437-bc31b016f34d', '06287239-0d40-4c96-84ca-ed1df501b565', '75c201f7-e0f3-4810-bfab-5c269e8f9234'], 'datum_kwargs': {'frame': [0, 1, 2], 'channel': [1, 1, 1]}}

In [26]: list(db.datum.find({'datum_id': '04037aa0-5cd7-46be-97b6-30e9832d43f6'}))                         
Out[26]: 
[{'_id': ObjectId('5c4f6a607368e3d06a2cc6ef'),
  'resource': '56370eb6-3466-47c7-9cb0-bb21ced05f81',
  'datum_id': '04037aa0-5cd7-46be-97b6-30e9832d43f6',
  'datum_kwargs': {}}]

In [27]: list(db.datum.find({'datum_id': '50838fb5-d705-4559-8233-3166b0cddbb1'}))                         
Out[27]: 
[{'_id': ObjectId('5c4f6a617368e3d06a2cc6f0'),
  'resource': '56370eb6-3466-47c7-9cb0-bb21ced05f81',
  'datum_id': '50838fb5-d705-4559-8233-3166b0cddbb1',
  'datum_kwargs': {'frame': 0, 'channel': 1}}]

In [28]: list(db.datum.find({'datum_id': 'c82ba303-fb19-4c55-9437-bc31b016f34d'}))                         
Out[28]: 
[{'_id': ObjectId('5c4f6a627368e3d06a2cc6f3'),
  'resource': '56370eb6-3466-47c7-9cb0-bb21ced05f81',
  'datum_id': 'c82ba303-fb19-4c55-9437-bc31b016f34d',
  'datum_kwargs': {}}]

In [29]: list(db.datum.find({'datum_id': '06287239-0d40-4c96-84ca-ed1df501b565'}))                         
Out[29]: 
[{'_id': ObjectId('5c4f6a637368e3d06a2cc6f4'),
  'resource': '56370eb6-3466-47c7-9cb0-bb21ced05f81',
  'datum_id': '06287239-0d40-4c96-84ca-ed1df501b565',
  'datum_kwargs': {'frame': 1, 'channel': 1}}]

In [30]: list(db.datum.find({'datum_id': '75c201f7-e0f3-4810-bfab-5c269e8f9234'}))                         
Out[30]: 
[{'_id': ObjectId('5c4f6a637368e3d06a2cc6f6'),
  'resource': '56370eb6-3466-47c7-9cb0-bb21ced05f81',
  'datum_id': '75c201f7-e0f3-4810-bfab-5c269e8f9234',
  'datum_kwargs': {}}]

danielballan commented 4 years ago

To highlight the problem here, the datum_kwargs for Datum documents that point to the same Resource are supposed to have consistent keys, just as Event documents that point to the same Event Descriptor are supposed to have consistent data keys.
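
The consistency rule described above can be checked mechanically. The following is a minimal sketch, not part of bluesky, databroker, or event-model: the helper names `inconsistent_resources` and `page_is_rectangular` are hypothetical, and the input is the set of documents pasted earlier in this thread, reduced to the relevant fields. It assumes Datum documents are plain dicts with `resource` and `datum_kwargs` keys, as shown in the output above.

```python
from collections import defaultdict


def inconsistent_resources(datums):
    """Report Resources whose Datum documents disagree on datum_kwargs keys.

    Hypothetical helper. Returns {resource_uid: set of distinct key sets}
    for every Resource with more than one distinct set of keys.
    """
    keys_by_resource = defaultdict(set)
    for datum in datums:
        keys_by_resource[datum["resource"]].add(frozenset(datum["datum_kwargs"]))
    return {
        resource: key_sets
        for resource, key_sets in keys_by_resource.items()
        if len(key_sets) > 1
    }


def page_is_rectangular(datum_page):
    """Return True if every datum_kwargs column has one entry per datum_id.

    Hypothetical helper for a packed datum page.
    """
    n = len(datum_page["datum_id"])
    return all(len(column) == n for column in datum_page["datum_kwargs"].values())


# The five Datum documents from this thread (the Mongo '_id' field is omitted).
RESOURCE = "56370eb6-3466-47c7-9cb0-bb21ced05f81"
datums = [
    {"resource": RESOURCE, "datum_id": "04037aa0-5cd7-46be-97b6-30e9832d43f6",
     "datum_kwargs": {}},
    {"resource": RESOURCE, "datum_id": "50838fb5-d705-4559-8233-3166b0cddbb1",
     "datum_kwargs": {"frame": 0, "channel": 1}},
    {"resource": RESOURCE, "datum_id": "c82ba303-fb19-4c55-9437-bc31b016f34d",
     "datum_kwargs": {}},
    {"resource": RESOURCE, "datum_id": "06287239-0d40-4c96-84ca-ed1df501b565",
     "datum_kwargs": {"frame": 1, "channel": 1}},
    {"resource": RESOURCE, "datum_id": "75c201f7-e0f3-4810-bfab-5c269e8f9234",
     "datum_kwargs": {}},
]

# The packed page from this thread: five datum_ids, three entries per column.
datum_page = {
    "resource": RESOURCE,
    "datum_id": [d["datum_id"] for d in datums],
    "datum_kwargs": {"frame": [0, 1, 2], "channel": [1, 1, 1]},
}

bad = inconsistent_resources(datums)
print(bad)
print(page_is_rectangular(datum_page))
```

For the documents in this thread, the first check flags the Resource because some of its Datum documents have empty `datum_kwargs` while others carry `frame` and `channel`. The second check fails because the page lists five `datum_id` values but only three entries in each `datum_kwargs` column.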

tacaswell commented 4 years ago

When is this data from and are we sure it is not broken stuff from development?

danielballan commented 4 years ago

January 2019 at CSX, IIRC (@gwbischof should confirm). Not so old that it should be broken. It could be the result of hand-inserting documents after an interrupted scan, which we have done a handful of times there and may have done incorrectly.

stan-dot commented 2 months ago

>>> import databroker
>>> databroker.__version__
'1.2.5'

This is not one of the commonly reported bugs. Given the current version of the databroker dependency, can we close this issue, @danielballan?