funkelab / daisy

Block-wise task scheduling for large nD volumes.
MIT License
26 stars 17 forks source link

Make a compound index rather than multikey index #18

Closed pattonw closed 3 years ago

pattonw commented 4 years ago

It seems like multikey indicies behave like a single "axis". This makes it very easy to find all documents that have any coordinate with a range, but when querying individual axes, MongoDb reverts to a slow COLLSCAN.

Solution: This can be fixed by making a compound index, using each axis as a key for a new index. Wierd note: MongoDb seems to revert to a COLLSCAN if no bound is given for the first key in the compound index.

cmalinmayor commented 3 years ago

You should check if dims is required and throw an intelligible error if position is a single value but dims is not passed in right away, before even connecting to mongo. And by you I might mean me, since I see that I am the one that changed it from a default of 3 to None. At the moment, every test passes a single position attribute without passing dims, and then throws a NoneType isn't an int error when trying to make the compound index. So, we'll also have to update the test cases. @pattonw Update: I'm doing this right now. More to come in a minute

pattonw commented 3 years ago

to avoid breaking things we could just default to the original index scheme if position attribute is a single value and dims is not provided. Just throw a warning that it can be very inefficient in some cases and recommend naming each coordinate.

cmalinmayor commented 3 years ago

@pattonw I added that fallback that you suggested, and also switched to checking if the position_attribute is an instance of string, instead of type == list. First, isinstance() is more robust than type() ==, due to subclassing and such. Second, position_attribute could be a tuple or other iterable, so just checking for list could be brittle. Let me know if you think it's good now and I'll merge!

pattonw commented 3 years ago

Yeah I like that. Tests are passing too so all good to merge I think