RoaringBitmap / roaring-rs

A better compressed bitset in Rust
https://docs.rs/roaring/
Apache License 2.0
741 stars 82 forks source link

len should take constant-time #45

Open Nemo157 opened 6 years ago

Nemo157 commented 6 years ago

https://github.com/rust-lang-nursery/api-guidelines/issues/149

josephglanville commented 4 years ago

Contant time len() will probably require additional storage. Atleast for my usecase that would be quite unfortunate as my program has hundreds of thousands of bitmaps at any time.

Perhaps instead the name of len() could be changed to cardinality() or another name so it doesn't conflict with the API guidelines and doesn't bloat RoaringBitmap with extra bytes?

saik0 commented 2 years ago

127 removes 8 bytes from each array container, so maybe we can afford a few extra bytes at the root now ;)

saik0 commented 2 years ago

This is relevant to #167.

Cardinality of all ops can all be trivially implemented in terms of the cardinality of the intersection |A ∩ B|.

|A \ B| = |A| − |A ∩ B|
|A ∪ B| = |A| + |B| − |A ∩ B|
|A △ B| = |A| + |B| − 2|A ∩ B|

Intersection cardinality is cheaper to compute, so it's both simpler to implement and has better runtime performance if we have the len cached.

Dr-Emann commented 2 years ago

Crazy, terrible, stupid idea to avoid increasing the size of RoaringBitmap but still store the cardinality:

The max cardinality of the bitmap is 0x1_0000_0000 (33 bits). but the len and capacity of the root vec should only ever reach 0x1_0000 (17 bits). On 64 bit systems, that gives us plenty of space to store the cardinality in the top bits. On 32 bit systems, that gives us 30 bits (15 bits for capacity/len each). Additionally, because BitmapStore contains a u64, the alignment of Container is at least 8, so we could store the remaining 3 bits of cardinality in the low bits of the vec data pointer.