Cydhra / vers

Succinct data structures using very efficient rank and select
Apache License 2.0
62 stars 4 forks source link

bug in iter1() with length >8192 #8

Closed pmarks closed 4 months ago

pmarks commented 4 months ago

Incorrect results from iter1() when length > 8192 and certain bits are present. This is a "local minimum" - if you make the length shorter or make the bit list shorter it will start passing again.

#[test]
fn iter1_test() {
    let input_on_bits = vec![1, 14, 21, 24, 36, 48, 57, 59, 65, 69, 81, 87, 97, 100, 101, 104, 111, 117];

    let mut bv = BitVec::from_zeros(8193);

    for idx in &input_on_bits {
        bv.set(*idx as usize, 1).unwrap();
    }

    let bv = RsVec::from_bit_vec(bv);
    let output_on_bits: Vec<_> = bv.iter1().collect();
    assert_eq!(input_on_bits, output_on_bits);
}
Cydhra commented 4 months ago

God damnit.

Mind sharing your fuzzing code so I can make sure it works before I push the next release? :P

pmarks commented 4 months ago

Sure no problem, here's a cleaned up version:

#[test]
fn iter1_fuzz_test() {

    let mut rng = StdRng::seed_from_u64(42);

    for _ in (0..500).step_by(8) {
        // make random vector length & set of ON bits
        let vec_len: usize = rng.gen_range(0..100000);
        let nnz: usize = if vec_len == 0 { 0 } else { rng.gen_range(0..vec_len) };
        let mut indexes = gen_vec_bounded(&mut rng, nnz, vec_len);
        indexes.sort_unstable();
        indexes.dedup();

        // make bit vector with corresponding on bits
        let mut bv = BitVec::from_zeros(vec_len);

        for idx in &indexes {
            bv.set(*idx, 1).unwrap();
        }
        let bv = RsVec::from_bit_vec(bv);

        // Run iter1() and make sure it matches input ON bits
        let output_on_bits: Vec<_> = bv.iter1().collect();
        assert_eq!(indexes, output_on_bits);
    }
}
pmarks commented 4 months ago

sorry forgot the helper function:

/// Generate a vector of random numbers of length `size`, with values in the range `[0, bound)`
pub fn gen_vec_bounded(
    rng: &mut impl Rng,
    size: usize,
    bound: usize,
) -> Vec<usize> {
    std::iter::repeat(())
        .map(|_| rng.gen_range(0..bound))
        .take(size)
        .collect()
}
Cydhra commented 4 months ago

Fixed in c8d6f79 Preliminary fuzzing didn't find more issues in iter1 and iter0, but I'll wait a bit before I release the patch.

pmarks commented 4 months ago

Thanks! The tests in my larger application are now also working with this change.

Cydhra commented 4 months ago

Yeah. The bugs were actually quite severe, especially the first one. I'm a bit pissed that I apparently did a very bad job at unit testing the new iterator at first. I fuzzed it a bit with your test suite and wrote my own randomized test case now, so I hope it all works as intended now. Actually released the new fix a few seconds ago.