The current in-memory representation of the run iterator is really inefficient. On a 64 bit system, it'll take 16 bytes per run due to alignment. This is an issue because we cache this iterator whenever we first try to read a bitfield.
We can cut this in half by:
First combining (or failing) on adjacent runs with the same value.
Encoding the runs as a slice of 64 bit integers where the value if 0 elements are even, and the value of odd elements are 1.
We can make this even smaller if we're willing to use a variable-length integer encoding (or simply avoid caching and get better at decoding in-place).
The current in-memory representation of the run iterator is really inefficient. On a 64 bit system, it'll take 16 bytes per run due to alignment. This is an issue because we cache this iterator whenever we first try to read a bitfield.
We can cut this in half by:
We can make this even smaller if we're willing to use a variable-length integer encoding (or simply avoid caching and get better at decoding in-place).