Chriscbr / thinset

A data structure for sparse sets of unsigned integers.
Apache License 2.0
2 stars 1 forks source link

Fix safety hole, use zeroed memory for set allocations #25

Closed Chriscbr closed 6 months ago

Chriscbr commented 6 months ago

As noted in a code comment, the main "trick" to the algorithm in this crate (also described in this blog) isn't safe in Rust because it requires reading uninitialized memory from one vector and interpreting the bits as indices into the other vector. Most documentation and discussion posts I was able to find online seem to point to this being something that is undefined behavior, meaning the Rust compiler (or LLVM bytecode compiler) might perform invalid optimizations like removing statements etc. because of it.

Since safety is a priority for this library, out of caution, I've updated the code to use zeroed-out vectors instead. This mostly hurts initialization performance, but I think it's still possible to use the data structure in some scenarios. I've updated and reran benchmarks (which previously weren't ran on release mode).

Closes #3

thass0 commented 6 months ago

Did you use a different machine to run the benchmarks that are reported in the README than before? All numbers are so much smaller. If so, the CPU named in the README should also be updated.

Chriscbr commented 6 months ago

Did you use a different machine to run the benchmarks that are reported in the README than before? All numbers are so much smaller

Yep, I ran the tests on --release mode this time (produces more optimized assembly code I believe). I think the relative perf improvement was similar without it though