RoaringBitmap / CRoaring

Roaring bitmaps in C (and C++), with SIMD (AVX2, AVX-512 and NEON) optimizations: used by Apache Doris, ClickHouse, and StarRocks
http://roaringbitmap.org/
Other
1.54k stars 265 forks source link

Avoid calculating the sum of items in bitmap statistics #619

Closed Dr-Emann closed 3 months ago

Dr-Emann commented 4 months ago

Calculating the sum of values requires visiting every item in the bitmap: Other than this, all other statistics can be calculated by only visiting each container. Is it worth either:

  1. Removing the field entirely (breaking change, both API and ABI)
  2. Changing the implementation to leave sum as zero, and documenting that the sum field is not actually updated.

_Originally posted by @AviAvni in https://github.com/RoaringBitmap/CRoaring/pull/617#discussion_r1585936127_

AviAvni commented 4 months ago

we can also provide another api to get a sum just like min and max and it can be optimize for example if we can iterate on ranges for example 1..100 1000..2000 we can just simply calculate (1+100)100/2+(1000+2000)1000/2

lemire commented 4 months ago

I think it would be an acceptable breaking change.

Dr-Emann commented 3 months ago

Done by #624