RoaringBitmap / CRoaring

Roaring bitmaps in C (and C++), with SIMD (AVX2, AVX-512 and NEON) optimizations: used by Apache Doris, ClickHouse, and StarRocks
http://roaringbitmap.org/
Other
1.57k stars 269 forks source link

Compute cardinality of multiple unions or intersections at a time #298

Open Oppen opened 3 years ago

Oppen commented 3 years ago

I'm planning on implementing a feature to do this but I'd like early feedback on the design.

The idea is to use a heap such as in the roaring_bitmap_or_many_heap implementation to get pointers to all the containers sorted by key. This way I can compute unions of all containers of a given key in a single bitset container (since the contents will be discarded once I get its cardinality), so the memory use would be about 8k + sizeof(auxilliary_struct)*N, where N is the number of containers in the bitmaps passed to my function, and auxilliary_struct would be some structure with enough data to keep progress (most likely a pointer to a container, but I'll know better when I start coding). Another option is for auxiliary_struct to keep the lowest unprocessed key for sorting and the pointer to the bitmap, so our heap doesn't need to fit all the containers at once but keeps re-adding elements to the queue if they weren't exhausted. Time complexity should be more or less the same as with an intermediate bitmap for union, but for bigger bitmaps it should save some space.

The same can be done for intersection.

lemire commented 3 years ago

I guess that you want the cardinality of the union of B1, B2 and B3 together... You could do it without materialization: just iterate through the values in the containers. It is going to be in time loglinear with respect to (card(B1) + card (B2) + card(B3)). But an iterator approach is going to be slow (high constant) so, in practice, materializing intermediate results will be faster. However, you can always do it with, at most, one materialized container per key... especially in lazy mode where you tend to materialize bitmap containers. With enough care, you can reuse the same bitmap containers. It is entirely reasonable.

I wonder if we implemented something of the sort... I will check and report back if we did.

Intersections are another beast. You stand to gain much less by being clever... because the intersection between many bitmaps tends to be tiny compared to the original bitmaps (of course, it is data dependent...).

lemire commented 3 years ago

I can't find an implementation right (in Java or Go).

It is entirely reasonable.

Oppen commented 3 years ago

Indeed, that was my reasoning. I didn't think through the intersection part being tinier. I guess worst case is a set of dense intersections where the resulting bitmap may still be big, but deciding whether that's likely would require domain knowledge I think. In the general case, I guess the lazy intersection should save enough memory without sacrificing performance, right?

lemire commented 3 years ago

I think that the motivation for the fast intersection function should be checked with benchmarks.