embassy-rs / embassy

Modern embedded framework, using Rust and async.
https://embassy.dev
Apache License 2.0
5.15k stars 713 forks source link

Async read methods on nrf twim lead to unsoundness when canceled (?) #2924

Open ftilde opened 3 months ago

ftilde commented 3 months ago

I've been debugging lockup issues with twim and async cancelation in my project (investigation still ongoing), started reading the embassy_nrf::twim code to understand my problem and came across the following potential issue.

Consider the following snippet:

let mut twim: embassy_nrf::twim::Twim<'_,_> = build_twim(...);
let mut buf = [0u8; 4];
let _ = embassy_futures::poll_once(async {
     twim.read(27, &mut buf).await .unwrap();
});
do_something_with(buf);

As far as I understood, twim.read sets up the read operation by writing the slice pointer and length into the respective registers and the twim hardware then uses these to write the obtained bytes into the buffer. Assuming the transfer is not yet done when the async_wait future is called (once), the future in then dropped and (crucially) the read operation is not aborted, continuing to write to buf in the above example. This, however, violates the exclusive access assumption of the mutable reference to buf in the following code and may lead to data races, for example.

Do you agree?

If so, what do you think would be the best way to fix this? I've seen that in the qspi-code an OnDrop-guard is used to wait for the operation to finish in case of cancelation. Should we implement this here as well? If you agree I could prepare a PR.

PS: From a cursory glance I suspect the same issue might be present for spim as well.

Dirbaio commented 3 months ago

True, this is a problem.

The fix is to stop DMA when the future is canceled early (ie dropped). The uart driver does do it: https://github.com/embassy-rs/embassy/blob/ab85eb4b60cd49ebcd43d2305f42327685f5e5a6/embassy-nrf/src/uarte.rs#L642

ftilde commented 3 months ago

Alright, I'll have a look some time in the next days. I think a similar problem exists with write methods. Although I don't see a problem with mutable aliasing, the input slice might be deallocated in the meantime and bogus data would be sent by the twim device. So I think writes should be cancelled accordingly.