apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

filter_bits under-allocates resulting boolean buffer #6750

Closed gatesn closed 16 hours ago

gatesn commented 3 days ago

Describe the bug

In filter_bits, the buffer builder is sized as BooleanBufferBuilder::new(bit_util::ceil(predicate.count, 8)). However, BooleanBufferBuilder::new already takes its length in bits, so the bit_util::ceil call is unnecessary and shrinks the requested capacity: https://github.com/apache/arrow-rs/blob/b1f5c250ebb6c1252b4e7c51d15b8e77f4c361fa/arrow-buffer/src/builder/boolean.rs#L32

https://github.com/apache/arrow-rs/blob/b1f5c250ebb6c1252b4e7c51d15b8e77f4c361fa/arrow-select/src/filter.rs#L520
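
The arithmetic behind the under-allocation can be sketched with the standard library alone. This is an illustration, not the arrow-rs code: ceil_div and bytes_reserved_for_bits are hypothetical stand-ins, and the sketch assumes the builder converts its bit count to bytes by rounding up to a multiple of 8.

```rust
/// Ceiling division. Hypothetical stand-in for what `bit_util::ceil` computes.
fn ceil_div(value: usize, divisor: usize) -> usize {
    value / divisor + usize::from(value % divisor != 0)
}

/// Hypothetical model of a constructor that takes a capacity in *bits*
/// and reserves enough whole bytes to hold them (assumption, see lead-in).
fn bytes_reserved_for_bits(bit_capacity: usize) -> usize {
    ceil_div(bit_capacity, 8)
}

/// Bytes reserved when the caller divides by 8 before passing the count,
/// so the count ends up divided by 8 twice.
fn bytes_reserved_buggy(selected: usize) -> usize {
    bytes_reserved_for_bits(ceil_div(selected, 8))
}

/// Bytes reserved when the caller passes the bit count directly.
fn bytes_reserved_fixed(selected: usize) -> usize {
    bytes_reserved_for_bits(selected)
}
```

For 1024 selected values, bytes_reserved_buggy returns 16 (room for 128 bits), while bytes_reserved_fixed returns 128 (room for all 1024 bits), which is the 8x gap described in this issue.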

Expected behavior

Remove bit_util::ceil to avoid under-allocating capacity by 8x.