Capital-Asterisk / longeronpp

"Longeron++" C++17 library for simple memory-efficient or 'data-oriented' structures
MIT License
32 stars 4 forks source link

Optimize HierarchicalBitset #3

Open Capital-Asterisk opened 2 years ago

Capital-Asterisk commented 2 years ago

HierarchicalBitset is intended for fast iteration and performance critical sections. It can be iterated with the take function, or with iterators. Right now, it works, but it's rather branchy and recursive. It hadn't been benchmarked, but it's likely slower than a linear search in some cases.

The take function clears bits during iteration and is also recursive. The iterators use the next function (also recursive) every single iteration, which is a very slow way to iterate the bits of a single block (usually 64 bit ints).

The "count trailing zeros" intrinsic is used to speed up finding the first set bit in a block. This is practical and fast, and the code can be revised to use it more efficiently.

However, there is also SIMD: https://stackoverflow.com/questions/49213611/count-leading-zeros-in-m256i-word https://lemire.me/blog/2018/03/08/iterating-over-set-bits-quickly-simd-edition/

Turning these into complaint iterators does not seem to be a trivial task though.

Capital-Asterisk commented 2 years ago

HierarchicalBitset is slow enough to show up in the profiler. It's probably faster than iterating an std::set, but the goal is to make it faster than iterating a sorted array of integers.