Open rjkoch opened 5 years ago
Agreed. You are correct that the only way to do this is by a brute-force search. It's been a known issue for a while, but I don't think we have a GitHub Issue to track it yet. Thank you for opening this one.
Glad I didn't miss an existing issue or some functionality! It seems like the query searches through values associated with the 'start' key of each Header, but not values associated with the 'stop' key.
You are correct. The problem is that the 'start' key and the 'stop' key can contain colliding names; for example, both always contain a 'time' key, and other user-defined keys could also collide. If the user searches for db(time=...), which time do they mean? One option is to guess, but if we guess wrong, the user is likely to be very confused by the results. That is what led us to the current approach, searching only on the RunStart document.
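To make the collision concrete, here is a minimal sketch using plain dicts as simplified, hypothetical stand-ins for a RunStart and a RunStop document (the field values are made up for illustration; real documents carry more keys):

```python
# Simplified, hypothetical stand-ins for the two documents of one run.
run_start = {"uid": "start-uid", "time": 1500000000.0, "sample_name": "T_series_2"}
run_stop = {"uid": "stop-uid", "time": 1500000360.0, "run_start": "start-uid",
            "exit_status": "success"}

# Any key present in both documents is ambiguous in a flat keyword search
# such as db(time=...): it is unclear which document the user means.
colliding_keys = sorted(set(run_start) & set(run_stop))
print(colliding_keys)  # ['time', 'uid']
```

A user-defined key written into both documents would show up in this list in the same way.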
Fortunately, there is a path out of this. We are in the process of refactoring databroker on top of intake, an external project with growing adoption in the SciPy/PyData community, and in the process rethinking the interface. (That work is proceeding in #392.) I want to emphasize that we will support backward-compatibility with the current databroker interface for a long time --- we don't want to break users' code! --- but we may gradually encourage users to switch to a new interface in the future.
In the current interface, there is nowhere to put searches on other keys:
db(sample_name='T_series_2') # nowhere to say, search on the 'stop' key
In the new interface, there would be room for this:
db(dict(sample_name='T_series_2'), stop=dict(exit_status='success'))
It would be useful if the db lazy text search could sift out only headers with 'exit_status':'success' under the 'stop' key, or other lower-level metadata. Currently, db(exit_status='success') does not return any results. The below creates a useful generator, but I could imagine it would become slow to actually loop through for large runs?
headers = (header for header in db(sample_name='T_series_2') if header['stop']['exit_status'] == 'success')
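For anyone who needs this in the meantime, the generator above can be wrapped in a small reusable helper. This is only a sketch of the brute-force workaround, not part of databroker: `filter_by_stop` is a hypothetical name, and it assumes `header['stop']` is either a mapping or `None` for a run that never finished. Plain dicts stand in for Header objects so the example runs without a database.

```python
def filter_by_stop(headers, **criteria):
    """Lazily yield headers whose 'stop' document matches every criterion.

    Hypothetical helper: still a brute-force scan over `headers`.
    """
    for header in headers:
        # Assumption: 'stop' may be None if the run never completed.
        stop = header["stop"] or {}
        if all(stop.get(key) == value for key, value in criteria.items()):
            yield header


# Plain-dict stand-ins for the headers that db(sample_name='T_series_2')
# would return.
fake_headers = [
    {"start": {"sample_name": "T_series_2"}, "stop": {"exit_status": "success"}},
    {"start": {"sample_name": "T_series_2"}, "stop": {"exit_status": "abort"}},
    {"start": {"sample_name": "T_series_2"}, "stop": None},
]

successful = list(filter_by_stop(fake_headers, exit_status="success"))
print(len(successful))  # 1
```

With a real broker the call would look like `filter_by_stop(db(sample_name='T_series_2'), exit_status='success')`. Every header matched by the start query is still visited, so the cost grows with the number of matching runs; narrowing the start query first keeps the scan small.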