Robbepop / modular-bitfield

Macro to generate bitfields for structs that allow for modular use of enums.
Apache License 2.0
159 stars 41 forks source link

Allow non-power-of-two enums #25

Closed lkolbly closed 3 years ago

lkolbly commented 3 years ago

Fixes #3.

This supports non power of two enums more or less how they're specified in 06-enums. The return value is an Result<T, EnumError> where EnumError has a single variant UnrecognizedValue. Enums are limited to 32 bits just to make the error type easier.

This PR also splits the Face type on the Specifier trait into a GetterReturn type which specifies the type returned by a getter. For most types it's equivalent to the Face type.

I screwed up some of the commits in a merge, but squashed they're good - I can squash them or tease out the Face/GetterReturn split changes into a separate commit. (hence why I haven't autosquashed yet)

There is no way to set a raw value. I was imagining an API like:

my_struct.set_small_prime_raw(9); // Maybe even unsafe?
let my_struct = my_struct.with_small_prime_raw(9);

but I think that would require _raw methods for every field, not just enum fields. Which is fine it's just more methods/compile time.

Robbepop commented 3 years ago

Wow, this is really cool! Thanks a lot for this. :rocket:

I quickly glimpsed over the diff and all in all it looks good, however, I am going to need to review it properly later and have to wrap my head around the changes. It would be especially nice, if you could add some more tests to cover for the new functionality in particular.

Robbepop commented 3 years ago

Note that with this PR the current codegen for our new constructor for the generated bitfield structs might invoke undefined behaviour since it is no longer guaranteed that the zero bit pattern is actually supported by all bitfields as described here: https://github.com/Robbepop/modular-bitfield/issues/27#issue-724066122

Before merging this PR we have to find a solution to this issue.

lkolbly commented 3 years ago

might invoke undefined behaviour

Is it undefined? It's certainly not good behaviour, but as far as I can tell it's fairly well defined (in particular, trying to get an enum field from a ::new()'d struct will return an error, which is weird.)

Robbepop commented 3 years ago

might invoke undefined behaviour

Is it undefined? It's certainly not good behaviour, but as far as I can tell it's fairly well defined (in particular, trying to get an enum field from a ::new()'d struct will return an error, which is weird.)

Yes you are actually right about this since we handle the erroneous case in our getters. I wonder though if we actually do need this error handling at all if we on the other handside could make sure that those states are never representable in the first place (without serious usage of unsafe).

This requires change to constructors of bitfields but our setters already only take valid values, e.g. you input an enum that is converted to a valid bit pattern and then you retrieve exactly this valid bit pattern back. The only thing that has to be correctly implemented is this conversion. So if we can eliminate all safe APIs that might introduce bad bit patterns we won't even have a need to check for this in our getters.

lkolbly commented 3 years ago

I wonder though if we actually do need this error handling at all if we on the other handside could make sure that those states are never representable in the first place

In my mind, there's two things a user might want to do: Converting from convenient Rust data structures into packed bitfields, and reading from packed bitfields into convenient Rust data structures. For anyone interacting with hardware (e.g. reading registers), they necessarily are receiving data from "untrusted" sources, or at least sources that we can't control. Something like:

loop {
    let reg = read_reg(0x1234);
    let reg = unsafe { MyReg::from_bytes_unchecked(reg as bytes) };
    match reg.state()? {
        MyRegState::Running => { break; },
        MyRegState::Error => { return Err("State machine encountered error starting!"); },
        MyRegState::Starting => { /* noop */ },
    }
}

from_bytes_unchecked is already unsafe, but if we make it so that get()ing an unrecognized enum variant panics, then I need to separately check whether the incoming value is valid:

     let reg = read_reg(0x1234);
+    if (reg >> 5) & 0x3 == 0x3 {
+        return Err("Received invalid field!");
+    }
     let reg = unsafe { MyReg::from_bytes_unchecked(reg as bytes) };

An alternative might be that the constructor does the check and returns an error, but that introduces two problems - One is that, in the above code, constructing reg might fail for an enum that I don't even care about (here I only read the state() field, and there could be many fields). The other is that I may still care about the raw value, probably so I can report it to the developer ("got unrecognized register status 0x3!").

So I think we definitely want to be able to get into a situation where the bitfield object contains an "invalid" state.

One downside of returning errors from getters is that sloppy users will do things like this:

     match reg.state() {
         Ok(MyRegState::Running) => { break; },
         Ok(MyRegState::Error) => { return Err("State machine encountered error starting!"); },
         Ok(MyRegState::Starting) => { /* noop */ },
+        Err(modular_bitfield::EnumError(UnrecognizedValue(3))) => { /* I'm too lazy to update the struct itself but I need to handle this case */ }
         _ => { /* Now is never hit... until someone updates the struct! */ }
     }

which I think is an argument for getters not returning errors even if #[returns_result] is specified with 2**bits number of variants. My argument against it was just it's inconvenient if they have to refactor lots of code, but maybe it's worth it to force the user to audit their matches. I almost wish we had compile_warning!, then we could do the "this will compile fine so you can run your code, but if you want to submit it to your CI you should really fix this" thing.

Robbepop commented 3 years ago

I wonder though if we actually do need this error handling at all if we on the other handside could make sure that those states are never representable in the first place

In my mind, there's two things a user might want to do: Converting from convenient Rust data structures into packed bitfields, and reading from packed bitfields into convenient Rust data structures. For anyone interacting with hardware (e.g. reading registers), they necessarily are receiving data from "untrusted" sources, or at least sources that we can't control. Something like:

loop {
    let reg = read_reg(0x1234);
    let reg = unsafe { MyReg::from_bytes_unchecked(reg as bytes) };
    match reg.state()? {
        MyRegState::Running => { break; },
        MyRegState::Error => { return Err("State machine encountered error starting!"); },
        MyRegState::Starting => { /* noop */ },
    }
}

from_bytes_unchecked is already unsafe, but if we make it so that get()ing an unrecognized enum variant panics, then I need to separately check whether the incoming value is valid:

     let reg = read_reg(0x1234);
+    if (reg >> 5) & 0x3 == 0x3 {
+        return Err("Received invalid field!");
+    }
     let reg = unsafe { MyReg::from_bytes_unchecked(reg as bytes) };

An alternative might be that the constructor does the check and returns an error, but that introduces two problems - One is that, in the above code, constructing reg might fail for an enum that I don't even care about (here I only read the state() field, and there could be many fields). The other is that I may still care about the raw value, probably so I can report it to the developer ("got unrecognized register status 0x3!").

So I think we definitely want to be able to get into a situation where the bitfield object contains an "invalid" state.

One downside of returning errors from getters is that sloppy users will do things like this:

     match reg.state() {
         Ok(MyRegState::Running) => { break; },
         Ok(MyRegState::Error) => { return Err("State machine encountered error starting!"); },
         Ok(MyRegState::Starting) => { /* noop */ },
+        Err(modular_bitfield::EnumError(UnrecognizedValue(3))) => { /* I'm too lazy to update the struct itself but I need to handle this case */ }
         _ => { /* Now is never hit... until someone updates the struct! */ }
     }

which I think is an argument for getters not returning errors even if #[returns_result] is specified with 2**bits number of variants. My argument against it was just it's inconvenient if they have to refactor lots of code, but maybe it's worth it to force the user to audit their matches. I almost wish we had compile_warning!, then we could do the "this will compile fine so you can run your code, but if you want to submit it to your CI you should really fix this" thing.

Thanks a lot for your thourough write-up! That really helped me to understand users of this library a lot better. :)

I am normally coming from a point of view where I think it is good practice to make it impossible to represent invalid state for an object. Therefore I am naturally leaning towards the checking constructor solution you described above since for me types are the purpose of encapsulating valid state.

However, I think you got a decent point there and also generating code for such complicated constructors might actually be pretty involved. Instead we should give up this thought of purity and accept that reality is impure and so should be the state of out bitfields which then implies getters returning Result in those cases. However, I am still thinking that we should avoid returning Result when we can even though this means that we sometimes might enforce some refactorings to the user, however, those are probably mostly wished for; I see it similarly as with the type system. Changing the signature of a public function also often incurs lots of refactoring steps but we are still not willing to give up our type systems and everything they bring to the table.

lkolbly commented 3 years ago

Well, I can only help you understand one user of this library :P

But sounds good then, thanks! To concretize the behaviour, I'm thinking #[bits = N] and #[can_error] are independent:

[1] Thinking about it more, I think the gotcha above with the sloppy users adding fields can happen even if they go from e.g. 6 variants to 7, so I'm not sure that throwing an error in the second case above will actually catch as much as I would initially hope. But it will catch things sometimes, so I could be convinced either way (to your point about the safety of the type system being desirable if sometimes inconvenient).

Robbepop commented 3 years ago

[1] Thinking about it more, I think the gotcha above with the sloppy users adding fields can happen even if they go from e.g. 6 variants to 7, so I'm not sure that throwing an error in the second case above will actually catch as much as I would initially hope. But it will catch things sometimes, so I could be convinced either way (to your point about the safety of the type system being desirable if sometimes inconvenient).

Design decisions are always the hardest part about programming in my opinion. ^^

  • If neither is present, it behaves as it did before.

Can we concretize what this means? What happened before is that we errored if the enum did not have a number of variants that is a power of two. Right? But that's what you have written in the last point. However, I think your last point might be a simple mistake. Because if #[bits = N] is present we very well accept non-power-of-two number of variants but will ensure that they won't require more than N bits in order to represent all states: log(#variants) <= N.

* If both are present, getters return `Result`, and if the number of variants = `2**bits` it throws an error (I'll experiment with warnings) [1]

Sounds good but please keep the experiment with warnings out of this PR. We don't want to blow up this big boy. Instead we can do a follow-up experimenting with this crate: https://crates.io/crates/proc-macro-error

* If `bits` is present but `can_error` is not, we simply assert that the number of variants is `2**bits`

We should actually assert what I have written above: That the enum does not require more than N bits to represent all of its information. So using this new #[bits = N] allows us to use non-power-of-two variants.

* `can_error` being present without `bits` is an error.

:+1:

Also I would not call it can_error because that is very technical of what happens under the hood with getters but instead something like #[incomplete] since that is what happens on the user facing side. The enum state is not completely covered.


To the points above we have to make a final decision about what does #[bits = N] on an enum. Does it allow for non-power-of-two number of variants telling which bounds N we'd like instead or does it the opposite and only restrict the user from using exactly as many variants as 2**N? As can be seen above I am leaning towards the first design.

This is also easy to teach users: If we discover a non-power-of-two enum missing the #[bits = N] attribute we can tell users to add it in case they need a non-power-of-two number of variants and we can even infer some valid minimal N for them.

lkolbly commented 3 years ago

To the points above we have to make a final decision about what does #[bits = N] on an enum. Does it allow for non-power-of-two number of variants telling which bounds N we'd like instead or does it the opposite and only restrict the user from using exactly as many variants as 2**N?

I think a #[bits = N] without a corresponding #[can_error] tag on a non-power-of-two enum has to be an error. (a #[bits = N] on a power-of-two enum with 2**bits variants is perfectly fine)

Alternatively, it could panic if you get() an invalid variant. In this case, can_error is more descriptive than incomplete, since the user is now explicitly "giving permission" for the enum's getters to return errors.

Robbepop commented 3 years ago

Okay let's simply go through some examples using your proposed design then:

#[derive(BitfieldSpecifier)]
#[bits = 1]
pub enum Place {
    First,
    Second,
}

This is fine since the enum requires 1 bit as specified in the attribute.

#[derive(BitfieldSpecifier)]
#[bits = 1]
pub enum Place {
    First,
    Second,
    Third,
}

This is an error since we require more bits (2) than specified (1).

#[derive(BitfieldSpecifier)]
#[bits = 2]
pub enum Place {
    First,
    Second,
    Third,
}

This still is an error because we did not specify #[may_error].

#[derive(BitfieldSpecifier)]
#[bits = 2]
#[may_error]
pub enum Place {
    First,
    Second,
    Third,
}

This is fine again since we now specified #[may_error].

#[derive(BitfieldSpecifier)]
#[bits = 2]
#[may_error]
pub enum Place {
    First,
    Second,
    Third,
    Fourth,
}

This is an error since we specified bits and may_error but we actually have a power-of-two size bitfield.

lkolbly commented 3 years ago

That all looks reasonable to me, the one I'm not sure about is the last one (if the user explicitly said may_error, maybe it's fine that it may never error? Not erroring I think would be the least surprising to the user, since bits and may_error are essentially orthogonal concerns).

I think in the second-to-last one, you have a typo, and it should be #[bits = 2]

Robbepop commented 3 years ago

I think in the second-to-last one, you have a typo, and it should be #[bits = 2]

Ah yes, i fixed it. Thanks!

That all looks reasonable to me, the one I'm not sure about is the last one (if the user explicitly said may_error, maybe it's fine that it may never error? Not erroring I think would be the least surprising to the user, since bits and may_error are essentially orthogonal concerns).

Hmm, ... I don't know what is the best approach myself. When I design things I generally try to be extremely strict initially because it is always possible to remove restrictions but it is super hard to impose new ones.

I actually wonder how @dtolnay would have designed this feature since the original design of this crate was made by him and I think he is a great designer.

lkolbly commented 3 years ago

Yeah. So, if we have a may_error attribute, it might surprise the user that we won't let them keep that attribute (and the API) when number of variants == 2**bits

It's probably easier to throw an error now, if people complain later it's easier to open it up than close it down.

On Mon, Oct 19, 2020 at 4:03 PM Hero Bird notifications@github.com wrote:

That all looks reasonable to me, the one I'm not sure about is the last one (if the user explicitly said may_error, maybe it's fine that it may never error? Not erroring I think would be the least surprising to the user, since bits and may_error are essentially orthogonal concerns).

I cannot follow. Can you elaborate please?

I think in the second-to-last one, you have a typo, and it should be #[bits = 2]

Ah yes, i fixed it. Thanks!

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Robbepop/modular-bitfield/pull/25#issuecomment-712440455, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGWX4FT6O7H4KPVTU73FQTSLSSRPANCNFSM4SVHP5WQ .

Robbepop commented 3 years ago

Yeah. So, if we have a may_error attribute, it might surprise the user that we won't let them keep that attribute (and the API) when number of variants == 2bits It's probably easier to throw an error now, if people complain later it's easier to open it up than close it down. On Mon, Oct 19, 2020 at 4:03 PM Hero Bird **@.***> wrote: That all looks reasonable to me, the one I'm not sure about is the last one (if the user explicitly said may_error, maybe it's fine that it may never error? Not erroring I think would be the least surprising to the user, since bits and may_error are essentially orthogonal concerns). I cannot follow. Can you elaborate please? I think in the second-to-last one, you have a typo, and it should be #[bits = 2] Ah yes, i fixed it. Thanks! — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#25 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGWX4FT6O7H4KPVTU73FQTSLSSRPANCNFSM4SVHP5WQ .

Well okay let's do it like that for now. If the design turns out to be suboptimal we can change it later easily since we are not yet version ^1.0.

Robbepop commented 3 years ago

In https://github.com/Robbepop/modular-bitfield/pull/28 I went with a simpler design for our previously discussed issue with erroneous getters or fail-free getters. In this new design we simply shift the control over to the user of the bitfield and have mirroring getters for expected failure and unexpected failure paths just as we already have for setters.

If you rebase your PR on top of this new branch the whole implementation should be a lot more straight forward for you.

lkolbly commented 3 years ago

Alright, I think that makes sense. So basically we wouldn't need the may_error attribute, and we can leave it up to the user whether they call get_checked or get (for panicking behaviour)?

I like that.

Robbepop commented 3 years ago

Alright, I think that makes sense. So basically we wouldn't need the may_error attribute, and we can leave it up to the user whether they call get_checked or get (for panicking behaviour)?

I like that.

Yep it now works exactly as you have described. :+1:

lkolbly commented 3 years ago

Alright I rebased (the change is substantially shorter now :P), the one clippy error seems to be pre-existing.

Robbepop commented 3 years ago

Alright I rebased (the change is substantially shorter now :P), the one clippy error seems to be pre-existing.

Yep you probably have to rebase again because I just fixed the clippy issue. :sweat_smile: Sorry!

lkolbly commented 3 years ago

Fortunately, no merge conflicts, so it wasn't too hard.

lkolbly commented 3 years ago

Anytime, thanks for maintaining the crate!