gauteh / defmt-serial

Log defmt messages over the serial port.
24 stars 7 forks source link

`defmt_serial` function appears to be unsound #9

Closed jamesmunns closed 7 months ago

jamesmunns commented 8 months ago

In this code: https://github.com/gauteh/defmt-serial/blob/main/src/lib.rs#L88-L107

You appear to:

The destructor is not called, but the reference is pointing to the temporary stack location.

It's possible I'm misunderstanding this, but it appears at first to be unsound.

The closure or serial port instance will need to actually stored at static duration. The + 'static bound just means that it is OWNED, not that it actually lives for the 'static lifetime.

gauteh commented 8 months ago

Hi James,

thanks for looking into this. I think you are right, it probably still works since defmt_serial is typically called at the beginning of main and thus still lives on the stack. But the closure is not really stored anywhere where it should live.. maybe it lives in the F of get_trampoline, anyway a bit fragile. Not sure how ManuallyDroping anything on the stack actually works... the memory is probably still cleaned up, it's just that drop is not called.

The + 'static was (probably) added so that serial would have to be passed by value and could be stored in the closure.

In #10 I experiment with passing wfn by ownership to trampoline, but I don't think it makes any difference.

jamesmunns commented 8 months ago

@gauteh I don't think the trampoline stuff works the way you think it might.

if let Some(wfn) = WRITEFN {
    let wfn = &mut *(wfn as *mut F);
    wfn(buf);
}

At this point, WRITEFN/wfn contains a pointer to a monomorphized function, trampoline(SFn)

You are essentially transmuting that function from an fn(SFn) into the the type of the closure (not nameable, but here: F), and then calling the closure as if it was a function, but without storing or passing in the captured environment (which stores the serial port!).

I'm actually not able to fully explain how this sometimes works, a closure typically implicitly takes its captured environment, but you never provide it.

My only thought is that if the serial port type that is captured is actually a ZST, the captured environment becomes "nothing", which means the closure acts like the function pointer you transmute it to? But that's just a guess.

I don't think passing by ownership will help - there's no way to store the generic as a static because you don't know the type of serial port that is being stored.

I don't think this is sound at all. I think if you were to peel away the unsafe functions and blocks, and work through it step by step, you'd find a place where you are performing an invalid transmute.

If you would like to implement this soundly, I think you would need to have the caller do the leaking and erasing of the type, asking them to pass in something like an &'static mut dyn embedded_hal::blocking::serial::Write<u8>.

jamesmunns commented 8 months ago

it probably still works since defmt_serial is typically called at the beginning of main and thus still lives on the stack.

Also, wrt this: as soon as the defmt_serial function returns, the stack frame is invalid, which means if there is ever any use of stack after the call to that function (e.g. calling any other function in main after defmt_serial), the data can be corrupted.

gauteh commented 8 months ago

My only thought is that if the serial port type that is captured is actually a ZST, the captured environment becomes "nothing", which means the closure acts like the function pointer you transmute it to? But that's just a guess.

I don't think that is the case, it has worked on multiple serial-implementations. But I also don't understand why it works. Would be interesting to look at the generated code and try to untangle that at some point... it has worked so consistently that it seems unlikely to be just luck (although there's a large element of it).

If you would like to implement this soundly, I think you would need to have the caller do the leaking and erasing of the type, asking them to pass in something like an &'static mut dyn embedded_hal::blocking::serial::Write<u8>.

Yes. I tried to do this at some point by rather providing a defmt_serial!-macro. That's probably the most ergonomic way of doing it, perhaps with a function scoped static variable (it should still live forever..).

gauteh commented 8 months ago

I am trying another approach in #10 now, wrote a trait that erases the error type and puts the responsibility of keeping the serial alive on the user. It may be too restrictive now, requiring the serial to be &'static, but that may be necessary.

Now needs to be done like this: https://github.com/gauteh/defmt-serial/pull/10/files#diff-8d87d18b2263b4e9a02952577edb558f200563e289acf0b6d4c68ab336d3553f.

Can consider doing this in a macro to help the user put it in a global static.

jamesmunns commented 8 months ago

help the user put it in a global static.

I'd like to suggest you point people towards using a sound abstraction like StaticCell rather than asking them to use static mut, which is very easy to introduce undefined behavior with.

Additionally, I believe you could use critical_section crate to avoid static mut usage of your own.

It may be too restrictive now, requiring the serial to be &'static, but that may be necessary.

I don't think this is too restrictive - IMO it is exactly the right amount of restrictive. Anything you place in a static should have 'static storage duration to be sound.

gauteh commented 8 months ago

help the user put it in a global static.

I'd like to suggest you point people towards using a sound abstraction like StaticCell rather than asking them to use static mut, which is very easy to introduce undefined behavior with.

Yes, I was looking at UnsafeCell and the likes. But StaticCell is probably good. I would ideally like to prevent the user from using the serial port afterwards (so that the defmt-stream won't be messed up).

Additionally, I believe you could use critical_section crate to avoid static mut usage of your own.

use core::cell::Cell;
use critical_section::Mutex;

static MY_VALUE: Mutex<Cell<u32>> = Mutex::new(Cell::new(0));

critical_section::with(|cs| {
    // This code runs within a critical section.

    // `cs` is a token that you can use to "prove" that to some API,
    // for example to a `Mutex`:
    MY_VALUE.borrow(cs).set(42);
});

It may be too restrictive now, requiring the serial to be &'static, but that may be necessary.

I don't think this is too restrictive - IMO it is exactly the right amount of restrictive. Anything you place in a static should have 'static storage duration to be sound.

Ok, yeah. Sounds right.

gauteh commented 8 months ago

Unsure if I can replace with Mutex because defmt::acquire also locks, in practice this is probably re-entrant (disabling interrupts), but not necessarily.

gauteh commented 8 months ago

Now doing something different:

this prevents the user from using the serial port when it has been assigned to defmt_serial. It also means that the serial doesn't need to live forever.