Open Nemo157 opened 6 years ago
Constant-time `len()` will probably require additional storage. At least for my use case that would be quite unfortunate, as my program has hundreds of thousands of bitmaps at any time.
Perhaps instead `len()` could be renamed to `cardinality()` or another name, so that it doesn't conflict with the API guidelines and doesn't bloat `RoaringBitmap` with extra bytes?
This is relevant to #167.
The cardinality of every set operation can be trivially expressed in terms of the cardinality of the intersection, |A ∩ B|:
|A \ B| = |A| − |A ∩ B|
|A ∪ B| = |A| + |B| − |A ∩ B|
|A △ B| = |A| + |B| − 2|A ∩ B|
Intersection cardinality is cheaper to compute than the others, so this approach is both simpler to implement and faster at runtime, provided the `len` of each bitmap is cached.
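As a rough illustration of the identities above, here is a minimal sketch that derives the other three cardinalities from |A|, |B| and |A ∩ B|. It uses `std::collections::BTreeSet<u32>` as a stand-in for a bitmap; `OpCardinalities`, `op_cardinalities` and `intersection_len` are hypothetical names for this example, not part of the crate's API.

```rust
use std::collections::BTreeSet;

/// Cardinalities derived purely from |A|, |B| and |A ∩ B|.
/// (Hypothetical helper type, for illustration only.)
#[derive(Debug, PartialEq, Eq)]
pub struct OpCardinalities {
    pub difference: u64,           // |A \ B|
    pub union: u64,                // |A ∪ B|
    pub symmetric_difference: u64, // |A △ B|
}

/// Applies the three identities. Assumes `len_intersection` really is
/// |A ∩ B|, so it never exceeds `len_a` or `len_b`.
pub fn op_cardinalities(len_a: u64, len_b: u64, len_intersection: u64) -> OpCardinalities {
    OpCardinalities {
        difference: len_a - len_intersection,
        union: len_a + len_b - len_intersection,
        symmetric_difference: len_a + len_b - 2 * len_intersection,
    }
}

/// Stand-in for an intersection-cardinality routine: counts the common
/// elements without building the intersection set.
pub fn intersection_len(a: &BTreeSet<u32>, b: &BTreeSet<u32>) -> u64 {
    a.intersection(b).count() as u64
}
```

With a cached `len`, only `intersection_len` has to touch the actual data; the other results are plain arithmetic.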
Crazy, terrible, stupid idea to avoid increasing the size of `RoaringBitmap` but still store the cardinality:
The max cardinality of the bitmap is `0x1_0000_0000` (33 bits), but the `len` and `capacity` of the root `vec` should only ever reach `0x1_0000` (17 bits). On 64-bit systems, that gives us plenty of space to store the cardinality in the top bits. On 32-bit systems, that gives us 30 bits (15 spare bits each in `capacity` and `len`). Additionally, because `BitmapStore` contains a `u64`, the alignment of `Container` is at least 8, so we could store the remaining 3 bits of cardinality in the low bits of the vec data pointer.
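To make the bit budget for the 32-bit case concrete, here is a sketch that models the three words of the root vec as plain `u32` values and packs a 33-bit cardinality into them (15 + 15 + 3 bits). `PackedRoot` and its methods are hypothetical; this only demonstrates the arithmetic. A real implementation could not do this with `std::vec::Vec` and would presumably need a custom raw vec with `unsafe` code.

```rust
/// Hypothetical model of the root vec's three words on a 32-bit target.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PackedRoot {
    ptr: u32, // 8-byte-aligned data pointer; low 3 bits hold cardinality bits 30..=32
    len: u32, // low 17 bits: real len; high 15 bits: cardinality bits 0..=14
    cap: u32, // low 17 bits: real capacity; high 15 bits: cardinality bits 15..=29
}

const FIELD_BITS: u32 = 17; // enough to represent 0..=0x1_0000
const FIELD_MASK: u32 = (1 << FIELD_BITS) - 1;
const SPARE_BITS: u32 = 32 - FIELD_BITS; // 15 spare bits per word
const SPARE_MASK: u64 = (1 << SPARE_BITS) - 1;
const PTR_TAG_MASK: u32 = 0b111; // free because of the assumed 8-byte alignment
pub const MAX_CARDINALITY: u64 = 0x1_0000_0000;

impl PackedRoot {
    pub fn pack(ptr: u32, len: u32, cap: u32, cardinality: u64) -> Self {
        assert_eq!(ptr & PTR_TAG_MASK, 0, "pointer must be 8-byte aligned");
        assert!(len <= 0x1_0000 && cap <= 0x1_0000);
        assert!(cardinality <= MAX_CARDINALITY);
        let lo = (cardinality & SPARE_MASK) as u32;
        let mid = ((cardinality >> SPARE_BITS) & SPARE_MASK) as u32;
        let hi = (cardinality >> (2 * SPARE_BITS)) as u32; // at most 0b100
        PackedRoot {
            ptr: ptr | hi,
            len: len | (lo << FIELD_BITS),
            cap: cap | (mid << FIELD_BITS),
        }
    }

    pub fn ptr(&self) -> u32 {
        self.ptr & !PTR_TAG_MASK
    }

    pub fn len(&self) -> u32 {
        self.len & FIELD_MASK
    }

    pub fn capacity(&self) -> u32 {
        self.cap & FIELD_MASK
    }

    /// Reassembles the 33-bit cardinality from the three spare regions.
    pub fn cardinality(&self) -> u64 {
        let lo = (self.len >> FIELD_BITS) as u64;
        let mid = (self.cap >> FIELD_BITS) as u64;
        let hi = (self.ptr & PTR_TAG_MASK) as u64;
        lo | (mid << SPARE_BITS) | (hi << (2 * SPARE_BITS))
    }
}
```

The cost of this layout is that every access to `len`, `capacity` or the pointer has to mask out the borrowed bits first.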
https://github.com/rust-lang-nursery/api-guidelines/issues/149