Devlyn-Nelson / Bondrewd

Rust proc_macro library for bit level field packing.

7 stars 2 forks source link

Wrong (?) representation of little endian fields with bit_length not multiple of 8 #12

Open oddlama opened 7 months ago

oddlama commented 7 months ago

Hi, thanks for the awesome crate! Unfortunately I ran into a problem when trying to parse a little endian number. Imagine I want to parse the following data packet using bondrewd:

Furthermore, assume that reserved is always 0, then this looks just like a normal little-endian u16 field. So this can be correctly parsed by using

#[derive(bondrewd::Bitfields, Clone, Default, PartialEq, Eq)]
#[bondrewd(read_from = "msb0", default_endianness = "le", enforce_bytes = 2)]
pub struct Packet {
    pub number: u16,
}

Now in reality reserved is not guaranteed to be read as 0 and could be anything. So I have to reduce the number of bits parsed to exactly 9. But when I set bit_length = 9 attribute, bondrewd will obviously use consecutive bits, which is wrong for what I want to do.

#[derive(bondrewd::Bitfields, Clone, Default, PartialEq, Eq)]
#[bondrewd(read_from = "msb0", default_endianness = "le", enforce_bytes = 2)]
pub struct Packet {
    #[bondrewd(bit_length = 7, reserve)]
    #[allow(dead_code)]
    pub reserved: u8,
    #[bondrewd(bit_length = 9)]
    pub number: u16,
}

So in the graphic, bit 7 of register address 0x16 would be used as the most significant bit of the number, instead of bit 0. But this is not what I want. Is there a way to get the aligned little-endian parsing behavior for numbers that have a bit_length that is not a multiple of 8?

Tupelov commented 5 months ago

Would using 'lsb0' resolve this issue?

oddlama commented 5 months ago

I've empirically tried all combinations of msb0/lsb0 and endianess but wasn't able to find a configuration that exhibits the desired behavior. I think the problem is mainly that if you were to receive the data into an array, the desired field is not contiguous in memory.

Devlyn-Nelson commented 5 months ago

sorry for not seeing this until now. i have been work on some other stuff, but this does sound like a problem so i will be adressing it soon.

oddlama commented 5 months ago

No worries, I appreciate it

Devlyn-Nelson commented 5 months ago

I'm not sure i completely understand what is wrong yet. But I do want to make sure me and you are on the same page because i could have misinterpreted something with developing this.

Little Endian means the least-significant byte (small number) is first and the most-significant number (big number) is last. Big Endian means the least-significant byte (small number) is last and the most-significant number (big number) is first.

read_from only effects where the bits are in the byte stream, not how they are interpreted.

`read_from = "msb0"` means that a value of 1 in number would look like: `[0b0000_0000, 0b0000_0001]`	reserve	number
0000000	000000001

`read_from = "lsb0"` means that a value of 1 in number would look like: `[0b0000_0000, 0b1000_0000]`	number	reserve
000000001	0000000

Bondrewd does not have any feature that allows users to re-order the bits of a field, only the direction bytes are pulled from first for an entire structure.

I have verified the structure you provided works as I would expect.

// The 1s here mark the `reserve` field.
let test_bytes = [0b1111_1110, 0b0000_0000];
let packet = Packet::from_bytes(test_bytes);

assert_eq!(packet.reserved, 0);
assert_eq!(packet.number, 0);

// The 1s here mark the `number` field and we should get the maximum value possible.
let test_bytes = [0b0000_0001, 0b1111_1111];
let packet = Packet::from_bytes(test_bytes);

assert_eq!(packet.reserved, 0);
assert_eq!(packet.number, 511);

// `number` most-significant bit (first bit in the last byte, last byte only has 1 bit so...). Little
// endian stores small end first, So putting a single 1 at this bit position should give `number` a value of 256.
let test_bytes = [0b0000_0000, 0b0000_0001];
let packet = Packet::from_bytes(test_bytes);

assert_eq!(packet.reserved, 0);
assert_eq!(packet.number, 256);

// `number` most-significant bit of first byte (first bit in the first byte). filling only this bit
// should give `number` a value of 128.
let test_bytes = [0b0000_0001, 0b0000_0000];
let packet = Packet::from_bytes(test_bytes);

assert_eq!(packet.reserved, 0);
assert_eq!(packet.number, 128);

// `number` least-significant bit. filling least-significant bit means value of 1.
let test_bytes = [0b0000_0000, 0b0000_0010];
let packet = Packet::from_bytes(test_bytes);

assert_eq!(packet.reserved, 0);
assert_eq!(packet.number, 1);

All of this being said, I am not great at terminology and my docs are far from perfect. I would be more than happy to continue on and see if Bondrewd is using a term incorrectly, has a missing a feature, or needs more clear documentation.

oddlama commented 5 months ago

I agree, the example should produce exactly what you tested for. I also agree with most of your terminology. The bad example was my fault, I provided it as a sort of "closest thing" I would be able to represent, but it just wasn't a good example. I made a drawing so it is easier to understand what I mean:

The top case shows the two ways how a u16 can be interpreted when using "le" and "be". No surprises there I hope. msb0 or lsb0 should not make a difference here if I understand them correctly.
In the bottom left you can see the current interpretation of my bad example, and I agree it is a valid interpretation.
The bottom u13 example is more interesting. There are 4 possible ways to interpret this, and only two are representable.

The issue is that endianness is (as far as I know) only defined for k-words, i.e. anything that has a length that is an integer multiple of one byte. The specified ordering of bytes only really makes sense for 8,16,24,32 bit and so on. It now ultimately comes down to how you interpret endianness in the context of off-length words like 9 or 13 bits.

I would argue that case A makes a lot of sense, because it is just the logical extension of big endian to n bits. It's also often needed for parsing embedded stuff, and it is already representable.
Case D is what currently happens when you specify little endian on an off-length field. It is weird, because little endian refers to 8-bit bytes and the bit numbers are no longer aligned with the general case. But it might have a purpose for some applications that I currently don't see.
Case C is nonsensical IMO, it's neither aligned nor contiguous. Not sure why anyone would want that.
Case B is what I want. It is what I would expect to get when I say "the next 16 bits store a 13-bit little endian unsigned integer".

I think B is the logical generalization for off-length little-endian fields, and not case D. I think so because the word "little-endian" implies the use of 8-bit bytes (to me), and more importantly I would expect a hypothetical u13 and u12 to share a common memory representation for the first 12 bits, just like u8 and u16 share the first 8. The high level concept of integers is usually agnostic of the underlying memory order. The current little endian behavior in D breaks this assumption.

Nonetheless, I think bondrewd isn't wrong in what it is doing. It's just unexpected for the little endian case because I cannot think of any protocol that would make use of this. So I would have expected this to either assume 8-bit offsets as in B or to raise a compile time error instead of accepting the weird bit packing.

Supporting all 4 cases would theoretically also be possible, by adding a new parameter to specify a kind of "used bit" length together to the total bit length. If the intermediate reserved stuff is used for other fields, one can still overlay a second struct on top of the integer.

#[derive(bondrewd::Bitfields, Clone, Default, PartialEq, Eq)]
#[bondrewd(read_from = "msb0", default_endianness = "le", enforce_bytes = 2)]
pub struct Packet {
    #[bondrewd(integer_length = 9)] // bit_length is still 16 but only 9 would be used in correct alignment and order.
    pub number: u16,
}

Unrelated: I may also be a little confused about msb0 and lsb0 for the same reasons. Does it just reverse the bits before reading? Or does it only reverse read order of struct members? Or does it also assume there is an underlying 8-bit structure to reverse only within one byte? The documentation here unfortunately only uses a single byte and the number 5 which is 0b101 which is the same when read backwards. The number 6 would illustrate the real behavior better.

let test_msb = ExampleMSB {
    one: 0,
    two: 6,
    three: 0,
};
let test_lsb = ExampleLSB {
    one: 0,
    two: 6,
    three: 0,
};
// in msb0 field one is the first 2 bits followed by field two
// then field three is the last 3 bits.
assert_eq!(test_msb.into_bytes(), [0b00_110_000]);
// in msb0 field three is the first 3 bits followed by field
// 2 then field one being the last 2 bits
assert_eq!(test_lsb.into_bytes(), [0b000_110_00]);

So indeed there seems to be no reversing going on, but is this reversed indexing within their respective 8-bits or within the whole structure?

Devlyn-Nelson commented 5 months ago

I trying to leave work right now sorry if this is unintelligible.

Btw Nice pictures I'm too lazy to learn to anything other than markdown.

As for the read_from = "lsb0" it reverses the read order of struct members. I believe i was trying to keep things similar to the packed_struct crate because that is what i used before.

As for the other thing. Typically with bit level packets in communications, to my knowledge working with other companies that we interface with, we do not have "holes" in our bitfields. Typically bitfields are used to make the packet smaller by determining the largest necessary values per field and limiting the amount of bits for those fields, having holes in this data would take away from this.

I would also like to briefly mention how I use Bondrewd Bitfields with Little Endian at my work for space stuff, The lack of holes in our data is likely more apparent because in aerospace we have to account for a lot of interference in wireless communications. The Wireless Interference is why we need to reduce overall packet sizes in the first place, smaller packets typically means less chances for a bits to become wrong due to interference. I am not sure if there are any standards for what you are trying to do, but i am curious.

Would you mind explaining what the reason for wanting this feature is, currently im assuming you are doing cool register stuff which bondrewd has not been designed for, but could be.

I think the bottom line is Bondrewd doesn't care about byte-alignment when in bitfield form. I have been doing a lot of work to make bondrewd easier to work on and add features to. I can think of a couple ways to accomplish what you are looking for, but the biggest issue is none of this wouldn't benefit me at all, so i can't promise expedience.

Currently i could look into making:

an alignment system which allows for automatic alinement. This would be super cool, but i think it might be a lot of work.
a system that allows for a field that overflows into another byte without filling it, to alliance to one side and assume anything between is useless. i think this could be easy.
allowing you to specify sections of bits to use for a number. this sounds crazy, i don't know how hard this would be.

oddlama commented 5 months ago

As for the read_from = "lsb0" it reverses the read order of struct members. I believe i was trying to keep things similar to the packed_struct crate because that is what i used before.

Ah that makes sense. I guess personally I would have liked a more descriptive param like read_order = "reversed", but that's not really too important.

As for the other thing. Typically with bit level packets in communications, to my knowledge working with other companies that we interface with, we do not have "holes" in our bitfields. Typically bitfields are used to make the packet smaller by determining the largest necessary values per field and limiting the amount of bits for those fields, having holes in this data would take away from this.

Definitely. I'm coming at this more from an embedded environment where you usually have to transfer full bytes via I2C or SPI anyway, so the hole doesn't add or remove any overhead.

I would also like to briefly mention how I use Bondrewd Bitfields with Little Endian at my work for space stuff, The lack of holes in our data is likely more apparent because in aerospace we have to account for a lot of interference in wireless communications. The Wireless Interference is why we need to reduce overall packet sizes in the first place, smaller packets typically means less chances for a bits to become wrong due to interference. I am not sure if there are any standards for what you are trying to do, but i am curious.

Makes total sense, I didn't mean to imply that doing it that way is "worse" than doing it any other way. I just wasn't aware of any protocol doing little endian "split and pack" this way. Interesting stuff definitely.

Would you mind explaining what the reason for wanting this feature is, currently im assuming you are doing cool register stuff which bondrewd has not been designed for, but could be.

I've been trying to make it easier to write embedded device drivers. Defining the I2C/SPI protocol is 90% of the work, and Bondrewd lends itself really nicely to that task. I haven't found anything remotely comparable for defining bitfields (especially with enum support), i think you've done really great work! There are some things that would make it even easier for embedded stuff specifically, especially regarding memory ownership. I tried to bolt it on top with medium success, but having two structs for each register (and the implicit renaming) is a bit awkward.

I was working on embedded-devices when I first came across the problem with the Bosch BMP390, some temperature sensor. The first image in this issue came from their datasheet. In Section 5.2.2 you can see that the bit transfer order is 7 6 5 4 3 2 1 0, so the memory for the fifo_watermark field (4.2, page 30) definitely contains a hole:

(Mind the memory addresses are descending in this representation, I swapped it for readability in the first post. Bit numbers here are 7 to 0 from left to right.)

Most other devices I came across did use big endian, but some apparently don't. In their sample C code they just cast the 2 bytes to an integer and mask out the reserved bits.

I can think of a couple ways to accomplish what you are looking for, but the biggest issue is none of this wouldn't benefit me at all, so i can't promise expedience.

Don't worry about it too much. I haven't thought about the possible solutions enough at this point to make a real recommendation, but I think these are all valid propositions. There will always be some kind of tradeoff between usability and what crazy stuff you can do with a feature.

I'd definitely prefer something that allows this more common case to be easily represented. Using an overflow is probably easy but will also introduce some weirdness, because you will have to make sure to put a reserved field at the correct position later to skip the bits that are actually used. So you'll have two or three fields for each such integer which might be awkward. I do think an integer-specific parameter that doesn't allow the reserved fields to be used in any other way would be a simple addition, but I could be mistaken. I don't really know how your implementation works :P

I am not sure what Bondrewd's policy is on breaking changes, I definitely wouldn't mind testing something that is simple to implement now but could later be removed for a more powerful syntax. And of course I understand that this isn't a priority for you. This really isn't a big issue either, just a minor inconvenience that I can live with.

Devlyn-Nelson commented 5 months ago

Thanks for all of the time you put into this recommendation. I'm am going to try something here in the next couple weeks because I am finishing a huge overhaul anyway.

As for breaking changes, I have not released a 1.0.0 version yet so I believe I am allowed a little more flexibility. I try to follow this when it comes to versioning. As an example, if i wanted to change read_from in a breaking way, I could either:

Throw an error when using one of the deprecated values, which informs you of new value you should use from now on.
Deprecate read_from and, again, throw compiler error describing what you should be using in the future.

anyway I'm not to worried about that, yet.

The infallible-enums combined with little-endian-support are huge factors in this crate being made in the first place, so I'm overjoyed to hear you like it. As further example of how I might handle large scale changes in Bondrewd, I hope to be pushing a new overhaul of Bondrewd Enums that I have been using for... Oh my... a year. The Goal of this overhaul was to merge Bitfields and BitfieldEnums into one and allow users to define differing data sets in each variant. The Changes add functionally but renames a few things. You should be able to easily migrate, but just in case I will not be removing the current enum support, only deprecating it. I will be writing a migration guide, which for the most part just removes enum_primitive = "u8" from the field atrributes and adds id_bit_length = 8 to the Enum attributes . If your interested I would love to hear opinions on bondrewd = "0.2" and bondrewd-derive = "0.5" in the next couple weeks as i get closer to finishing the refactor I am currently working on.

oddlama commented 5 months ago

Throw an error when using one of the deprecated values, which informs you of new value you should use from now on.

Deprecate read_from and, again, throw compiler error describing what you should be using in the future.

anyway I'm not to worried about that, yet.

Very nice, both sound like good options.

I will be writing a migration guide, which for the most part just removes enum_primitive = "u8" from the field atrributes and adds id_bit_length = 8 to the Enum attributes . If your interested I would love to hear opinions on bondrewd = "0.2" and bondrewd-derive = "0.5" in the next couple weeks as i get closer to finishing the refactor I am currently working on.

I am definitely interested and will gladly try it out once it's available! Refactoring won't be a problem at all for me, the stuff is experimental anyways. Very much looking forward to it!

Devlyn-Nelson commented 4 months ago

Hi, been gone for a while doing FRC mentoring, but I'm back to work. So i am close to a new release, but before i make it public i would like to get some information about how fields interact. Currently we have talked very little about how multiple fields reserve bits or bytes in different situations.

Please help me understand what the proper output of this should be.

#[derive(Bitfields, Clone, Debug, PartialEq)]
#[bondrewd(default_endianness = "le")]
struct SimpleExample {
    #[bondrewd(bit_length = 4)]
    a: u8,
    #[bondrewd(bit_length = 10)]
    b: u16,
    #[bondrewd(bit_length = 2)]
    c: u8,
}

currently output would be bytes = [0bAAAABBBB, 0bBBBBBBCC]. but i believe you would want bytes = [0bAAAABBBB, 0bCCBBBBBB]. or possibly bytes = [0bAAAABBBB, 0b00BBBBBB, 0bCC000000] (this one contains 8 unused bits).

oddlama commented 4 months ago

Hi, been gone for a while doing FRC mentoring, but I'm back to work. So i am close to a new release, but before i make it public i would like to get some information about how fields interact. Currently we have talked very little about how multiple fields reserve bits or bytes in different situations.

Glad to hear you're back, hope you had fun!

Please help me understand what the proper output of this should be. [...] currently output would be bytes = [0bAAAABBBB, 0bBBBBBBCC]. but i believe you would want bytes = [0bAAAABBBB, 0bCCBBBBBB]. or possibly bytes = [0bAAAABBBB, 0b00BBBBBB, 0bCC000000] (this one contains 8 unused bits).

It's so hard to describe what I am thinking 😅 Do you have a matrix/discord channel or something like that where we can discuss this better? I'll try anyway:

I would actually say this is either invalid or none of the above. I'm leaning towards saying this should not compile. My thoughts are the following:

Specifying LE on a field should impose slicing every 8 bits, either for the whole structure or within the member. Probably within the member field only, since the a field should not impose limits on the whole structure.
As established previously there kind of exists a canonical little endian format, which says: If you are looking at any bit with value 2^i in a little endian field and then look 8 bits to the right, it must either be outside of the structure OR represent the bit with value 2^{i+8}.
Specifying a bit_length should IMO result in that field requiring exactly that many bits, not more not less. Otherwise it's hard to miss situations where the field suddenly coalesces with other nearby fields or needs more bytes due to padding. A little endian field with bit_length = 12 would have 4 padding bits. So a following field with bit_length 4 would take up the space, a field with bit_length 5 would not. That would be weird.
I personally still might want to fill the padded bytes, but explicity. So I think this should require manually overlaying a structure for now, so that the intent stays clear.

So with that in mind let's have a look at the examples:

Example 1

currently output would be bytes = [0bAAAABBBB, 0bBBBBBBCC]

Yep, should look like this and violates the little endian constraint. Since you seem to be needing this, I would refer to it as condensed little endian, where all the theoretical required padding is dropped.

Example 3

or possibly bytes = [0bAAAABBBB, 0b00BBBBBB, 0bCC000000] (this one contains 8 unused bits).

Let's move to example 3 because handling this first makes more sense. This looks like it adds padding in the parent structure at the 8 bit boundary, but doesn't produce valid little endian. I think the relative position of the member in parent structure should not influence padding decisions. The user has more freedom to align this manually.

Example 2

but i believe you would want bytes = [0bAAAABBBB, 0bCCBBBBBB].

This is Example 3 but with C overlayed into the padding. Same conceptual problems as above.

Example 4 (true little endian)

Assuming that all fields are true little endian and not condensed little endian, I would expect this:

Which would be written as

#[derive(Bitfields, Clone, Debug, PartialEq)]
#[bondrewd(default_endianness = "le")]
struct SimpleExample {
    #[bondrewd(bit_length =  8, integer_length =  4)] a: u8,
    #[bondrewd(bit_length = 16, integer_length = 10)] b: u16,
    #[bondrewd(bit_length =  8, integer_length =  2)] c: u8,
}

// Or even in the simpler form:

#[derive(Bitfields, Clone, Debug, PartialEq)]
#[bondrewd(default_endianness = "le")]
struct SimpleExample {
    #[bondrewd(integer_length =  4)] a: u8,
    #[bondrewd(integer_length = 10)] b: u16,
    #[bondrewd(integer_length =  2)] c: u8,
}

Example 5 (only b true little endian)

Let's take a step back and only consider that the 10-bit field b is true little endian and the others are either big endian or condensed little endian (which makes no difference because both are <= 8 bits). We can redraw this as a 4 bit field which is just plain concatenated to the true 16bit width little ending field plus the 2 bit field.

This violates bit_length = 10 and should thus error at compile time. To make it work like shown here, it should require bit_length = 16, integer_length = 10, or simply integer_length = 10 since it is a u16 already.

Example 6 (only b true little endian, overlapping)

Same as Example 5 but with overlapping. Should thus also require bit_length = 16, integer_length = 10, and it should require putting c: u8 into a struct like below and putting this struct into the above one instead of c, with the overlapping_bits attribute:

#[derive(Bitfields, Clone, Debug, PartialEq)]
#[bondrewd(default_endianness = "be")]
struct COverlay {
    _padding: u8,
    #[bondrewd(bit_length = 2)]
    c: u8,
}

#[derive(Bitfields, Clone, Debug, PartialEq)]
#[bondrewd(default_endianness = "be")]
struct SimpleExample {
    #[bondrewd(integer_length =  4)] a: u8,
    #[bondrewd(integer_length = 10, endianness = "le")] b: u16,
    #[bondrewd(overlapping_bits =  16)] c: COverlay,
}

Devlyn-Nelson commented 4 months ago

I think i have discord, but i don't talk to anybody online outside of programming sites. But if i understand correctly, for your use case fields in little endian should never share space in a byte. If that is true i was considering a set of isolation attributes. Just to confirm "Example 4 (true little endian)" is correct and the desired result?

oddlama commented 4 months ago

I think i have discord, but i don't talk to anybody online outside of programming sites.

No worries, we can continue here

But if i understand correctly, for your use case fields in little endian should never share space in a byte. If that is true i was considering a set of isolation attributes. Just to confirm "Example 4 (true little endian)" is correct and the desired result?

Yes, in my opinion at least :D I'm open to consistent alternative interpretations!

Also i would require the bit_length is increased to 8/16/8 here and have another attribute for the integer length. If the bit length would stay 4/10/2 then I would expect an error. Although I guess this could be solved in multiple ways.

Devlyn-Nelson commented 4 months ago

So, i personally don't like to require too many attributes. I am also not happy with how i managed the read_from attribute, so i am already considering redefining attributes pertaining to bit/byte direction. For me it make sense to change read_from to no longer take an endianness but rather first-byte and last-byte options (i don't think this effects you at all). Then i could add a 3rd endianness option for field-endianness and default-endianness called "aligned little endian". The aligned little endian would simply require bondrewd to enforce "correct" little endian formatting. I would little to explain that i put correct in quotes because the current way we handle things is "correct" for communications and packets BUT NOT for actual use in mathematical operations or human output and must be transformed into the standard usable format which is what you have been referring to as correct.

by using ale (aligned little endian) to get our desired output the example would become

#[derive(Bitfields)]
#[bondrewd(default_endianness = "ale")]
struct SimpleExample {
    #[bondrewd(bit_length = 4)]
    a: u8,
    #[bondrewd(bit_length = 10)]
    b: u16,
    #[bondrewd(bit_length = 2)]
    c: u8,
}

This would output bytes = [0b0000AAAA, 0bBBBBBBBB, 0b000000BB, 0b000000CC].

If this seems wrong please say so, i sometimes oversimplify things to the point where they can't be understood. I am looking at this in terms of flexibility and versatility. I would rather have more options than close off entire methodologies. It is also important to consider the current way that little endian is handled MUST remain and be considered valid because it is a standard in aerospace hardware, which we use in our satellites.

oddlama commented 4 months ago

So, i personally don't like to require too many attributes. I am also not happy with how i managed the read_from attribute, so i am already considering redefining attributes pertaining to bit/byte direction. For me it make sense to change read_from to no longer take an endianness but rather first-byte and last-byte options (i don't think this effects you at all).

I would actually love that, it would express the actual functionality much more clearly!

This would output bytes = [0b0000AAAA, 0bBBBBBBBB, 0b000000BB, 0b000000CC]. If this seems wrong please say so, i sometimes oversimplify things to the point where they can't be understood. I am looking at this in terms of flexibility and versatility.

Not at all, the output is exactly what I'd expect. I shouldn't have referred to it as "correct" previously, since little endian isn't defined for sub-bytes anyway. To me it is just the mathematical extension to little endian.

The only "disadvantage" I can see with not introducing a new attribute for the integer length is that you disallow configurations that would want to use aligned little endian with a field size that isn't a multiple of 8 (e.g. 12 bit aligned little endian integer in a 14 bit field, so 2 bytes padding). But I would also say that if someone uses aligned little endian then it is exactly because it should be a multiple of 8 so thinking about it I would also leave this feature out.

I would rather have more options than close off entire methodologies. It is also important to consider the current way that little endian is handled MUST remain and be considered valid because it is a standard in aerospace hardware, which we use in our satellites.

Naturally! At first I just didn't know there existed applications that use that. It's a clearly defined rule and evidently it makes sense for some applications, so it should get a mode.

I just have two philosophical question remaining:

Do you want to introduce an aligned big endian that does the same automatic padding to the right? It's not necessary at all for big endian though, since a user could just trivially add padding bytes themselves.
I'm not sure whether any of the two options should be named le, as it implies there is a canonical default behavior. I personally would like them to be explicitly named, so maybe ple (packed little endian) and ale (aligned little endian), then the users would have to think about which one they want. But that's totally up to you.

Thank you for all the thoughts you are putting into this!

Devlyn-Nelson commented 4 months ago

Your 2 questions have defiantly been rattling around in my head.

Currently i am moving forward with the intention that i will NOT implement aligned-big-endian unless someone quickly makes it known to me that it is desired, or the implementation of aligned-little-endian takes less than 4 hours for me to finish once i actually start writing code. I am currently under the assumption this will only take me 1 day, tomorrow if nothing comes up.

I defiantly agree about removing le entirely; currently planning on a compiler error that tells users to choose between ple or ale.

oddlama commented 4 months ago

Sounds awesome, very much looking forward to it!

Devlyn-Nelson commented 4 months ago

So sometimes I realize I don't even know how to use my own libraries. Can you review this code and perhaps try it, because i think it may already be possible to accomplish, and i could just make it easier to understand and use.

I was setting up a test to try and get your stuff working, but I wanted to play around and refresh my own knowledge about the systems currently in place (trying to look for side effects i could exploit) and I ended up with this code which is currently in the aligned example in my branch complex-enums.

use bondrewd::Bitfields;

#[derive(Bitfields, Clone)]
#[bondrewd(default_endianness = "le")]
struct Packed {
    #[bondrewd(bit_length = 9)]
    number: u16,
    #[bondrewd(bit_length = 7, reserve)]
    reserve: u16,
}

#[derive(Bitfields)]
#[bondrewd(default_endianness = "be", read_from = "lsb", reverse)]
struct Aligned {
    #[bondrewd(bit_length = 9)]
    number: u16,
    #[bondrewd(bit_length = 7, reserve)]
    reserve: u16,
}

impl From<Packed> for Aligned {
    fn from(value: Packed) -> Self {
        Self{
            // one: value.one,
            number: value.number,
            reserve: value.reserve
        }
    }
}

fn main() {
    // Packed
    assert_eq!(Packed::BIT_SIZE, 16);
    assert_eq!(Packed::BYTE_SIZE, 2);
    let ex = Packed {
        // one: 0,
        number: u16::MAX,
        reserve: 0,
    };

    let bytes = ex.clone().into_bytes();
    print_bytes(&bytes);

    // Aligned
    assert_eq!(Aligned::BIT_SIZE, 16);
    assert_eq!(Aligned::BYTE_SIZE, 2);
    let ex: Aligned = ex.into();

    let bytes = ex.into_bytes();
    print_bytes(&bytes);
}

fn print_bytes(bytes: &[u8]) {
    for b in bytes {
        print!("{b:08b}, ");
    }
    print!("\n");
}

the output of this is

11111111, 10000000,
11111111, 00000001,

I believe the second line, which is the print from the Aligned structure, does what you want? I could be wrong, but im going to wrap up my branch if this does what you need.

oddlama commented 4 months ago

the output of this is
11111111, 10000000,
11111111, 00000001,
I believe the second line, which is the print from the Aligned structure, does what you want? I could be wrong, but im going to wrap up my branch if this does what you need.

Thanks! I think his does mostly what I want, it is just a bit awkward to use because of the reverse. In embedded stuff there's usually a lot of flags and small enums to deal with, so this would cause them to be reversed too. Consider this example:

#[derive(Bitfields, Clone)]
#[bondrewd(default_endianness = "be", bit_traversal = "back", reverse)]
pub struct Fifo {
    #[bondrewd(bit_length = 12)]
    /// The fifo length
    pub length: u16,
    #[bondrewd(bit_length = 4, reserve)]
    #[allow(dead_code)]
    pub reserved: u8,

    #[bondrewd(bit_length = 3, reserve)]
    #[allow(dead_code)]
    pub reserved2: u8,
    /// Store temperature data in FIFO. Defaults to false.
    pub temperature_enable: bool,
    /// Store pressure data in FIFO. Defaults to false.
    pub pressure_enable: bool,
    /// Store sensortime frame after the last valid data frame. Defaults to false.
    pub time_enable: bool,
    /// Stop writing samples into FIFO when FIFO is full. Defaults to true after power-on-reset.
    pub stop_on_full: bool,
    /// Enables or disables the fifo. Defaults to false.
    pub enable: bool,
}

fn main() {
    let x = FifoLength {
        length: 0b1011_00111000,
        reserved: 0b0,
        reserved2: 0b0,
        temperature_enable: true,
        pressure_enable: false,
        time_enable: true,
        stop_on_full: true,
        enable: true,
    };
    let bytes = x.into_bytes();
    print_bytes(&bytes);
}

I would usually define my struct members in the order they appear in the bitstream, but due to back&reverse I will now have to swap the order of every byte, (so reverse order of reserved2 and all flags). I'm also not sure how you could mix a u16 little endian with a u16 big endian field (except for creating a second struct that also reverses maybe?).

The code above outputs:

^^^^^^^^------^^^^---------- length
          ^^^^-------------- reserved
                    ^------- enable
                     ^------ stop_on_full
                      ^----- time_enable
                       ^---- pressure_enable
                        ^--- temperature_enable
                         ^^^ reserved2

So the length is little endian with aligned padding, exactly as needed. But the configuration flags are now reversed in their order. I would love a way to get this instead, without having to define more structs:

00111000, 00001011, 00010111,
^^^^^^^^------^^^^---------- length
          ^^^^-------------- reserved
                    ^^^----- reserved2
                       ^---- temperature_enable
                        ^--- pressure_enable
                         ^-- time_enable
                          ^- stop_on_full
                           ^ enable

For flags this is of course easy to fix by just reordering, but I am not sure how I would deal with e.g. a 3-bit enum, where the bit order is usually big endian and would thus be reversed here.

Devlyn-Nelson commented 4 months ago

So I have been doing some open-heart surgery on the logic that determines endianness and want to say I believe I am on a good track for making this easier for every one. I am currently fixing up some bugs for aligned-little-endian.

I'm hoping to remove the need for anyone to use bit_traversal or reverse by:

Presenting 2 little-endians and the standard big-endian.
Documenting the heck out of my recent learnings for endianess in the internals of bondrewd.
Providing a hopefully easy to understand section in the README and docs.rs page about picking an endianness.

Fun Story

After looking around my code at work that uses bondrewd, I found and remembered a HAL for a radio that said it was little-endian, but at the time bondrewd didn't exist so i was using other bit-field crates and I couldn't get it too work. So I had to create Bondrewd and designed it to:

Replace the other bit field libraries in our existing code that required the current big and little-endian styles presented.
Make this new little-endian format work by exploiting big-endian with reversed bit and byte orders using math at compile time.

After Bondrewd was working enough for me to use bit and byte reversed big-endian for that project, I forgot about it... Until a couple days ago.

tip: if you use nested structures: DO NOT use reverse on the inner nested structure while using bit and byte reversed big-endian. I'm working on a fix for this issue.

Devlyn-Nelson commented 3 months ago

Thought i would put out an update.

Things were just about ready for release, and i hit a major snag because of how my enum logic worked when moving between parsing and solved states. But luckly my boss asked me about the possibility of a runtime version of bondrewd. i thought about it. realized i could fix the issue and do my job at the same time by creating bondrewd builder which would support both the derive and runtime versions. Then i injured my arm... So anyways i think i have a great plan moving forward, but until my arm recovers it will be slow.

oddlama commented 3 months ago

Thanks for the update! Take your time to recover, and I hope your plan works out smoothly