RoaringBitmap / roaring-rs

A better compressed bitset in Rust
https://docs.rs/roaring/
Apache License 2.0
741 stars 82 forks source link

Hand tune array-bitset aggregates #166

Open saik0 opened 2 years ago

saik0 commented 2 years ago

arXiv:1709.07821 Roaring Bitmaps: Implementation of an Optimized Software Library (Section 4.1.1) (Section 3.2)

Cardinality tracking was added in #127, however we're leaving some perf on the floor.

We also find that if we use bit-manipulation instructions and hand-tuned assembly, we can roughly double the speed at which we can change bits in a bitset, compared to compiler-generated machine code with all optimization flags activated. It is true not only when setting bits, but also when clearing or flipping them—in which cases we need to use the btr and btc instructions instead. Informally, we tested various C compilers and could not find any that could perform nearly as well as hand-tuned assembly code. This may, of course, change in the future.

CRoaring source

Planning

  1. Do we want to implement this optimization? 2x is a huge speed up, but only on x86
  2. The compiler is sufficiently smart to use bts but not sbb, perhaps we can write rust that compiles to the asm we want
  3. If not there's always the asm! macro (nightly only), or if want it in stable then we can write a small C lib for the asm and link it
saik0 commented 2 years ago

inline asm stabilized 🎉