RoaringBitmap / roaring

Roaring bitmaps in Go (golang), used by InfluxDB, Bleve, DataDog
http://roaringbitmap.org/
Apache License 2.0
2.52k stars 230 forks source link

ParAnd should have O(N) complexity where N is the number of containers #110

Open lemire opened 7 years ago

lemire commented 7 years ago

It seems that the current ParAnd implementation has O(N log(B)) complexity where N is the number of containers and B is the number of bitmaps. It should be possible to ensure that the complexity is O(N).

cc @maciej

lemire commented 7 years ago

In particular, if you intersect one empty bitmap with lots of non-empty bitmap, the algorithm should quickly return the empty set, without doing much merging.

Oppen commented 3 years ago

I think something like this should work (I wrote a dummy version but right now it crashes):

Oppen commented 3 years ago

A more sophisticated version would use shotgun search for the current key, as well as skipping to the maximum found from the last iteration. But that's a reasonable draft I think.

Oppen commented 3 years ago

It works but doesn't change a bit 🤷 However, using a pool for the containers slice does decrease allocations and memory use by 7-8% in the benchmarks, without an execution time penalty.

lemire commented 3 years ago

There should never been any need for sorting.

You should never have to visit a key more than once.

Oppen commented 3 years ago

AFAIK you don't visit keys more than once with this approach. The sorting is just to exit before going through them all, but I don't think it makes much of a difference.

Oppen commented 3 years ago

I have an implementation that passes tests. However, I don't think it's the best code in terms of readability and I haven't written the benchmarks yet (there's none for FastAnd). I'll make a draft PR later today, but as of now it shouldn't be merged.

EDIT: I just revived my branch for doing the same for FastAnd and forgot the original one was for the ParAnd function. I can only guess that I wanted to see where the difference comes from. But I think where this will matter most will be when bitmaps have many defined keys where the log part is significant, I'm not sure if our benchmarks cover that case.

Oppen commented 3 years ago

I think it's related enough to mention it tho. This implementation uses the shotgun search approach to skip irrelevant containers.