Cryptjar / avr-progmem-rs

Progmem utility for the AVR architecture
Apache License 2.0
28 stars 8 forks source link

Incomplete safety invariants for `read_byte_loop_raw` #12

Closed workingjubilee closed 1 year ago

workingjubilee commented 1 year ago

Here, the code loops over the bytes that a type may occupy and reads each: https://github.com/Cryptjar/avr-progmem-rs/blob/8d00b3507e623b6ff1a3853cf57b24aa9852be10/src/raw.rs#L170-L177

As you note, this is morally equivalent to ptr::copy, but implemented via bytewise reading. Unfortunately, the results are not well-defined for all T. A type may have uninitialized bytes or initialization invariants, such as padding or niches.

#[derive(Clone, Copy, Default)]
struct HasPadding {
    a: u16,
    b: u8,
}
assert_eq!(4, mem::size_of::<HasPadding>()); // A padding byte exists

If it is valid to read a padding byte, this will incite no complaints, but we are not so lucky:

for i in 0..mem::size_of::<HasPadding>() {
    unsafe {
        let padded = padded.cast::<u8>().add(i);
        let blender = blender.cast::<u8>().add(i);
        // error: Undefined Behavior: using uninitialized data, but this operation requires initialized memory
        blender.write(padded.read())
    };
}

I put together a few examples of ways to provoke Miri's anger on the Playground while studying this. In particular, you can observe how making sure there is no padding byte by so much as adding a field in the right place quiets Miri's complaints. Unfortunately, Miri is not fully aware of AVR's memory semantics, so has no useful commentary there.

Obviously, as this function is unsafe, no change in the code is required, merely documenting this additional invariant. However, this has implications on the function's usage and how it is used in this crate and others, so you may want a solution. It may be correct to simply copy all the bytes in a type before any individual byte enters Rust (i.e. to copy each T fully while in a single asm! block). This is in the domain of "underdefined semantics that may be subject to change", but is unlikely to, as it is a consequence of Rust only understanding assembly in a "black box" manner, seeing only its defined interface and having no vocabulary for what goes on inside asm!: "Whereof one cannot speak, thereof one must be silent".

Cryptjar commented 1 year ago

Hmm, it seems you raise a valid point, unfortunately. Nonetheless, thanks for spotting this! If I understand that correctly, there are two ways to fix this:

While I increasingly think the bytemuck constrain would be quite reasonable in the long term (e.g. for v0.4.x), just switching to read_asm_loop_raw should be backwards compatible and thus seems preferable for a quick fix in the short term.

What you think?

workingjubilee commented 1 year ago

Yeah, it sounds like you have a good path forward there: I do recommend the read_asm_loop_raw switch and then considering the NoUninit constraint at your leisure.

workingjubilee commented 1 year ago

Nice. The fix appears to resolve a real miscompilation in this repro crate: https://github.com/gergoerdi/llvm-issue-107293

Cryptjar commented 1 year ago

Released as v0.3.3, I also back ported the fix as v0.2.2 and v0.1.6, respectively.

Thanks again for pointing this out.