glotzerlab / signac

Manage large and heterogeneous data spaces on the file system.
https://signac.io/
BSD 3-Clause "New" or "Revised" License

Potential improvements for _SearchIndexer #741

Open vyasr opened 2 years ago

vyasr commented 2 years ago

#683 replaced the old Collection class with a stripped-down _SearchIndexer that does just enough for signac's internal use cases. However, that PR left a few tasks outstanding that we should consider:

bdice commented 2 years ago

These are good ideas. Since all this is internal, this work can be done post-2.0.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

vyasr commented 1 year ago

FWIW, I just tried running the test suite with the normalization removed, and everything passed. I also tried running it with an error raised whenever the normalization changed the input, and the only difference I observed was a conversion from tuples to lists. In terms of performance, the normalization costs a few microseconds for a trivial filter (e.g. {'a': [1]}), rising to tens of microseconds for filters with more inputs.
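The experiment described above (raise whenever normalization changes the input, then time it) can be sketched roughly as follows. This is a minimal illustration, assuming the normalization behaves like a JSON round-trip. normalize_filter and checked_normalize are hypothetical names, and the JSON round-trip is a stand-in chosen because JSON has no tuple type, so it reproduces the tuple-to-list conversion mentioned. It is not signac's actual implementation.

```python
import json
import timeit


def normalize_filter(filter_):
    """Hypothetical normalization (assumption): round-trip through JSON.

    JSON has no tuple type, so any tuples come back as lists.
    """
    return json.loads(json.dumps(filter_))


def checked_normalize(filter_):
    """Raise if normalization changed the input, mirroring the experiment."""
    normalized = normalize_filter(filter_)
    if normalized != filter_:
        raise ValueError(f"normalization changed {filter_!r} -> {normalized!r}")
    return normalized


if __name__ == "__main__":
    # A list-valued filter survives the round-trip unchanged.
    print(checked_normalize({"a": [1]}))

    # A tuple-valued filter is converted to a list, so the check fires.
    try:
        checked_normalize({"a": (1,)})
    except ValueError as err:
        print(err)

    # Rough per-call cost of the stand-in normalization on a trivial filter.
    n = 10_000
    seconds = timeit.timeit(lambda: normalize_filter({"a": [1]}), number=n)
    print(f"{seconds / n * 1e6:.2f} us per call")
```

Timings from this sketch depend on the machine and on the stand-in normalization, so they are not the figures reported in the comment above.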