bluesky / databroker

Unified API pulling data from multiple sources
https://blueskyproject.io/databroker
BSD 3-Clause "New" or "Revised" License

lazy search exit_status of databroker headers #418

Open rjkoch opened 5 years ago

rjkoch commented 5 years ago

It would be useful if the db lazy text search could filter for headers with 'exit_status': 'success' under the 'stop' key, or for other lower-level metadata.

Currently, db(exit_status='success') does not return any results.

The following creates a useful generator, but I imagine it would become slow to loop through for a large set of runs:

headers = (header for header in db(sample_name='T_series_2') if header['stop']['exit_status'] == 'success')
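As a minimal sketch of the brute-force filter described above, the snippet below uses plain dicts as stand-ins for databroker headers so it runs without a configured catalog. The `fake_headers` list and the `successful` helper are hypothetical names, not part of databroker, and the sketch assumes a run that never finished may have a missing or empty stop document.

```python
# Stand-in headers: plain dicts mimicking the header['start'] / header['stop'] layout.
# These are NOT real databroker Header objects; they only illustrate the filtering logic.
fake_headers = [
    {"start": {"sample_name": "T_series_2", "uid": "a"}, "stop": {"exit_status": "success"}},
    {"start": {"sample_name": "T_series_2", "uid": "b"}, "stop": {"exit_status": "abort"}},
    # Assumption: an unfinished run may have no stop document at all.
    {"start": {"sample_name": "T_series_2", "uid": "c"}, "stop": None},
]


def successful(headers):
    """Lazily yield headers whose stop document reports exit_status == 'success'."""
    for header in headers:
        stop = header["stop"] or {}  # tolerate a missing stop document
        if stop.get("exit_status") == "success":
            yield header


uids = [h["start"]["uid"] for h in successful(fake_headers)]
print(uids)
```

Every header still has to be loaded and inspected, so the cost grows linearly with the number of runs matched by the initial query; the guard on the stop document only avoids an error on unfinished runs.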

danielballan commented 5 years ago

Agreed. You are correct that the only way to do this is by a brute-force search. It's been a known issue for a while, but I don't think we have a GitHub Issue to track it yet. Thank you for opening this one.

rjkoch commented 5 years ago

Glad I didn't miss an existing issue or some functionality! It seems like the query searches through values associated with the start key of each Header, but not values associated with the stop key.

danielballan commented 5 years ago

You are correct. The problem is that the start key and the stop key can contain colliding names; for example, both always contain a time key, and other user-defined keys could also collide. If the user searches for db(time=...), which time do they mean? One option is to guess, but if we guess wrong, the user is likely to be very confused by the results. That is what led us to the current approach, searching only on the RunStart document.
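To make the collision concrete, here is a small illustration with two made-up documents. The timestamps and field values are invented for the example; only the presence of a time key in both documents comes from the discussion above.

```python
# Made-up RunStart and RunStop documents; both carry a 'time' key.
start = {"time": 1550000000.0, "sample_name": "T_series_2"}
stop = {"time": 1550000360.0, "exit_status": "success"}

# Keys present in both documents are ambiguous in a flat keyword search.
colliding = sorted(start.keys() & stop.keys())
print(colliding)

# Naively merging the two documents silently drops one of the values:
# here the stop time overwrites the start time.
flat = {**start, **stop}
print(flat["time"] == stop["time"])
```

A flat query such as time=... would have to pick one of the two values, which is the guess the current interface avoids by searching only the start document.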

Fortunately, there is a path out of this. We are in the process of refactoring databroker on top of intake, an external project with growing adoption in the SciPy/PyData community, and in the process rethinking the interface. (That work is proceeding in #392.) I want to emphasize that we will support backward-compatibility with the current databroker interface for a long time (we don't want to break users' code!), but we may gradually encourage users to switch to a new interface in the future.

In the current interface, there is nowhere to put searches on other keys:

db(sample_name='T_series_2')  # nowhere to say, search on the 'stop' key

In the new interface, there would be room for this:

db(dict(sample_name='T_series_2'), stop=dict(exit_status='success'))

See https://github.com/bluesky/intake-bluesky/issues/63
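Until such an interface exists, the two-part query proposed above can be emulated on the client side. The sketch below is hypothetical: `matches` and `search` are illustrative helpers rather than databroker or intake functions, and the headers are plain dicts standing in for real ones. It keeps the start and stop queries separate, so a key such as time is never ambiguous.

```python
def matches(document, query):
    """Return True if every key/value pair in query appears in document."""
    document = document or {}  # tolerate a missing document
    return all(key in document and document[key] == value
               for key, value in query.items())


def search(headers, start_query, stop=None):
    """Lazily yield headers matching start_query and, optionally, a stop query."""
    for header in headers:
        if matches(header["start"], start_query) and matches(header.get("stop"), stop or {}):
            yield header


# Stand-in headers for illustration only.
headers = [
    {"start": {"sample_name": "T_series_2", "uid": "a"}, "stop": {"exit_status": "success"}},
    {"start": {"sample_name": "T_series_2", "uid": "b"}, "stop": {"exit_status": "abort"}},
    {"start": {"sample_name": "other", "uid": "c"}, "stop": {"exit_status": "success"}},
]

results = list(search(headers, dict(sample_name="T_series_2"), stop=dict(exit_status="success")))
print([h["start"]["uid"] for h in results])
```

This is still a brute-force scan over whatever the first argument yields, so it does not solve the performance concern; it only shows how separate start and stop queries remove the naming ambiguity.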