ferrilab / bitvec

A crate for managing memory bit by bit
https://myrrlyn.net/crates/bitvec
MIT License
1.19k stars 112 forks source link

`to_bitvec` and related functions returning invalid values #228

Open XAMPPRocky opened 1 year ago

XAMPPRocky commented 1 year ago

Hello, if you run the below code you can see that while these should return the same value, the second instance returns [0x7, 0x80].

use bitvec::prelude::*;
assert_eq!(bitvec::bits![u8, Msb0;       0, 0, 0, 1, 1, 1, 0, 1].to_bitvec().into_vec(), vec![29]);
assert_eq!(bitvec::bits![u8, Msb0; 1, 1, 0, 0, 0, 1, 1, 1, 0, 1][2..].to_bitvec().into_vec(), vec![29]);
Trolldemorted commented 6 months ago

Is this an endianess problem?

let data: Vec<u8> = vec![0b0000_0000, 0b1111_1111];
let slice = data.view_bits::<Msb0>();
info!("{:04b}", slice[6..10].load::<u64>());

is printing 1100 instead of 0011.

I work around this with a loop (read until word boundary, shift subslice to the left, continue), but that means I have to do sign extension manually.

Bug or a feature?

Trolldemorted commented 6 months ago

Ran your numbers, in your case it does not look like an endianess problem, and rather like a generic "reads across word boundaries" problem. I guess what happens is

@myrrlyn could you clarify what the expected output is? I guess there must be tests covering reads across word boundaries, therefore this behaviour might be intended (?)

dns2utf8-novaziun commented 3 months ago

I have a similar problem, this code returns 2 instead of 4

use bitvec::prelude::*;

let bytes = [2, 0, 0, 0, 0, 0, 0, 0];
let (pos, length) = (6, 3);

let v = bytes.view_bits::<Msb0>();
let s = &v[pos..(pos + length)];
let i: u64 = s.load();

assert_eq!(4, i);

Any pointers on how to fix it? because in the Debug print the slice is correct with 3 elements

Nicceboy commented 3 weeks ago

I think that the main issue here is that we expect the [..index] slicing works differently than it does internally and how it combines with vector conversion.

All cases mentioned here are related to slicing when the index range overlaps with two different bytes in memory.

In the original issue there is

let data = bitvec::bits![u8, Msb0; 1, 1, 0, 0, 0, 1, 1, 1, 0, 1];

If we look how this is presented in memory, it takes two bytes because of the defined u8 type and size of 10. We can see the memory layout with data.domain() call, and it looks like

data.domain() = Domain::<*const u8, bitvec::order::Msb0>::Region {
    head: None,
    body: [
        199,
    ],
    tail: Some(
        PartialElement<*const u8, bitvec::order::Msb0> {
            elem: 64,
            mask: 11000000,
            head: 000,
            tail: 0010,
        },
    ),
}

The first byte (as in terms of body) is used completely (0b_1100_0111) which is the decimal number 199. There is no need for head.

Second byte (tail part) uses two bits and results to remaining (b0100_0000) in memory which is decimal number 64.

If we take a index range with [2..], it results into different memory layout as:

&data[2..].domain() = Domain::<*const u8, bitvec::order::Msb0>::Region {
    head: Some(
        PartialElement<*const u8, bitvec::order::Msb0> {
            elem: 7,
            mask: 00111111,
            head: 010,
            tail: 1000,
        },
    ),
    body: [],
    tail: Some(
        PartialElement<*const u8, bitvec::order::Msb0> {
            elem: 64,
            mask: 11000000,
            head: 000,
            tail: 0010,
        },
    ),
}
// or use bit_domain()

&data[2..].bit_domain() = BitDomain::<*const u8, bitvec::order::Msb0>::Region {
    head: BitSlice<u8, bitvec::order::Msb0> {
        addr: 0x0000600001e48010,
        head: 010,
        bits: 6,
    } [
        0,
        0,
        0,
        1,
        1,
        1,
    ],
    body: BitSlice<u8, bitvec::order::Msb0> {
        addr: 0x0000600001e48011,
        head: 000,
        bits: 0,
    } [],
    tail: BitSlice<u8, bitvec::order::Msb0> {
        addr: 0x0000600001e48011,
        head: 000,
        bits: 2,
    } [
        0,
        1,
    ],

New bitslice is constructed from two different parts - there isn't completely used memory byte anymore so body is empty. [2..] results on removing data from the body and there is a head now which represents the 0b_0000_0111 as two leading 1s are removed, and then replaced with zero padding to align byte width, this results to number 7.

Tail is still the same and remains as the same value. As a result, into_vec call results into [7, 64] array, because it creates u8 aligned vector based on the memory layout.

It also states in the docs:

Converts a bit-vector into a Vec of its underlying storage.

So this is intended behavior I guess.

We can get the expected vector from the above by calling .force_align() after to_bitvec() call which will make the contents of the bit-vector match its in-memory storage.

Now, if we would check the .bit_domain(), it would look like:

bitvec.bit_domain() = BitDomain::<*const u8, bitvec::order::Msb0>::Region {
    head: BitSlice<u8, bitvec::order::Msb0> {
        addr: 0x0000000000000001,
        head: 000,
        bits: 0,
    } [],
    body: BitSlice<u8, bitvec::order::Msb0> {
        addr: 0x000060000330c020,
        head: 000,
        bits: 8,
    } [
        0,
        0,
        1,
        0,
        1,
        1,
        0,
        1,
    ],
    tail: BitSlice<u8, bitvec::order::Msb0> {
        addr: 0x0000000000000001,
        head: 000,
        bits: 0,
    } [],
}

I am not sure if this is also related to load() usage, but at least in the case of @dns2utf8-novaziun using load_be() results to correct value. load() uses native endianness which is little for most computers these days.