lemire opened this issue 7 years ago
In particular, if you intersect one empty bitmap with lots of non-empty bitmaps, the algorithm should quickly return the empty set, without doing much merging.
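For example, here is a minimal sketch of that scenario using roaring's public API (the parallelism value and bitmap contents are just illustrative):

```go
package main

import (
	"fmt"

	"github.com/RoaringBitmap/roaring"
)

func main() {
	empty := roaring.New()

	// Two large, heavily overlapping bitmaps.
	big1 := roaring.New()
	big2 := roaring.New()
	big1.AddRange(0, 10_000_000)
	big2.AddRange(0, 10_000_000)

	// Because one of the inputs is empty, the aggregation should be able to
	// return almost immediately instead of merging container by container.
	result := roaring.ParAnd(4, empty, big1, big2)
	fmt.Println(result.GetCardinality()) // 0
}
```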
I think something like this should work (I wrote a dummy version but right now it crashes):

- Sort the input bitmaps first. The worst case stays at `O(B log(B) + N)`, but it may exit faster.
- For each key `key1` in the first bitmap `bm1`:
  - For each bitmap `bmi` other than the first:
    - If there are no keys left in `bmi`, then we exhausted the input and need to return whatever we aggregated already.
    - Otherwise, look at its current key `keyi`:
      - If `keyi < key1`, skip the key, increasing the index for `bmi`, then go again.
      - If `keyi == key1`, move on to the next `bmi`.
      - Otherwise `key1` will be empty after the and, so just skip it and try with the next key from `bm1`.
  - If we finish the `bmi` loop without breaking from it, this means we found a key that's defined in all input bitmaps. Create a slice of containers for the current indices and pass it to the input channel. Go ahead with the next `key1` from `bm1`.

A more sophisticated version would use shotgun search for the current key, as well as skipping to the maximum found from the last iteration. But I think that's a reasonable draft.
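Here is a minimal Go sketch of that loop. The types are made-up stand-ins (a `bitmap` holding parallel `keys`/`containers` slices, and a `work` channel feeding the AND workers), not roaring's real internals:

```go
package sketch

// Hypothetical, simplified stand-ins for roaring's internal layout: a sorted
// slice of 16-bit keys with one container per key.
type container interface{}

type bitmap struct {
	keys       []uint16
	containers []container
}

// intersectKeys walks the keys of the first bitmap and skips forward in the
// others, sending the matching containers to the workers whenever a key is
// present in every input. It returns as soon as any input is exhausted.
func intersectKeys(bitmaps []*bitmap, work chan<- []container) {
	idx := make([]int, len(bitmaps)) // idx[i] is the current position in bitmaps[i]
	bm1 := bitmaps[0]

outer:
	for ; idx[0] < len(bm1.keys); idx[0]++ {
		key1 := bm1.keys[idx[0]]

		for i := 1; i < len(bitmaps); i++ {
			bmi := bitmaps[i]

			// Skip keys smaller than key1.
			for idx[i] < len(bmi.keys) && bmi.keys[idx[i]] < key1 {
				idx[i]++
			}
			// bmi is exhausted: no further key can be common to all inputs.
			if idx[i] == len(bmi.keys) {
				return
			}
			// key1 is missing from bmi, so its AND would be empty; try the next key1.
			if bmi.keys[idx[i]] != key1 {
				continue outer
			}
		}

		// key1 is defined in every bitmap: hand its containers to the workers.
		cs := make([]container, len(bitmaps))
		for i, bm := range bitmaps {
			cs[i] = bm.containers[idx[i]]
		}
		work <- cs
	}
}
```

Note that an empty input makes this exit almost immediately: either `bm1` has no keys and the outer loop never runs, or the first pass over the empty `bmi` hits the exhaustion check and returns.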
It works, but doesn't change a bit 🤷 However, using a pool for the containers slice does decrease allocations and memory use by 7-8% in the benchmarks, without an execution time penalty.
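For reference, a pool for those slices could look roughly like this, continuing the `package sketch` file above (so `container` comes from there) and using the standard library's `sync.Pool`; the starting capacity is an arbitrary choice, not what the actual change uses:

```go
package sketch

import "sync"

// containerSlicePool recycles the per-key []container slices instead of
// allocating a fresh one for every matching key.
var containerSlicePool = sync.Pool{
	New: func() interface{} {
		s := make([]container, 0, 8)
		return &s
	},
}

// getContainers returns a reusable slice with length zero.
func getContainers() *[]container {
	return containerSlicePool.Get().(*[]container)
}

// putContainers resets the slice and hands it back to the pool. The caller
// must only do this once the worker has finished the AND over the containers.
func putContainers(s *[]container) {
	*s = (*s)[:0]
	containerSlicePool.Put(s)
}
```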
There should never be any need for sorting.
You should never have to visit a key more than once.
AFAIK you don't visit keys more than once with this approach. The sorting is just to exit before going through them all, but I don't think it makes much of a difference.
I have an implementation that passes tests. However, I don't think it's the best code in terms of readability, and I haven't written the benchmarks yet (there are none for `FastAnd`). I'll make a draft PR later today, but as of now it shouldn't be merged.
EDIT: I just revived my branch for doing the same for `FastAnd` and forgot the original one was for the `ParAnd` function. I can only guess that I wanted to see where the difference comes from. But I think this will matter most when bitmaps have many defined keys, where the `log` part is significant; I'm not sure our benchmarks cover that case.
I think it's related enough to mention it, though. This implementation uses the shotgun search approach to skip irrelevant containers.
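I'm reading "shotgun search" as a galloping/exponential skip over the sorted key array; that's my interpretation, not necessarily what the implementation above does. A minimal sketch of the idea, in the same `package sketch` style as earlier:

```go
package sketch

import "sort"

// advanceUntil returns the smallest index i >= pos with keys[i] >= min, or
// len(keys) if there is none. It gallops forward in doubling steps and then
// binary-searches the final window, so a far skip costs O(log distance)
// rather than O(distance).
func advanceUntil(keys []uint16, pos int, min uint16) int {
	n := len(keys)
	if pos >= n {
		return n
	}
	if keys[pos] >= min {
		return pos
	}
	// Gallop until keys[pos+step] >= min or we run past the end.
	step := 1
	for pos+step < n && keys[pos+step] < min {
		step *= 2
	}
	lo := pos + step/2 + 1 // keys[pos+step/2] < min, so the answer is beyond it
	hi := pos + step
	if hi > n-1 {
		hi = n - 1 // nothing matched while galloping; the answer may be n
	}
	// First index in [lo, hi] with keys[i] >= min (or n if none qualifies).
	return lo + sort.Search(hi-lo+1, func(i int) bool { return keys[lo+i] >= min })
}
```

In the earlier intersection sketch, the linear skip (incrementing `idx[i]` while the key is too small) would become `idx[i] = advanceUntil(bmi.keys, idx[i], key1)`.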
It seems that the current `ParAnd` implementation has `O(N log(B))` complexity, where N is the number of containers and B is the number of bitmaps. It should be possible to ensure that the complexity is `O(N)`.
cc @maciej