Fix data race caught by miri

jamwaffles commented 3 months ago

Well that's scary... I reproduced locally twice, but now can't, and CI can't either. Bizarre.

jamwaffles commented 3 months ago

The one failing output I captured posted here for reference

test pdu_loop::tests::multiple_threads ... error: Undefined Behavior: Data race detected between (1) non-atomic write on thread `unnamed-54` and (2) non-atomic read on thread `unnamed-53` at alloc1352+0x30. (2) just happened here
   --> /home/james/.rustup/toolchains/nightly-2024-07-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/option.rs:711:15
    |
711 |         match *self {
    |               ^^^^^ Data race detected between (1) non-atomic write on thread `unnamed-54` and (2) non-atomic read on thread `unnamed-53` at alloc1352+0x30. (2) just happened here
    |
help: and (1) occurred earlier here
   --> src/pdu_loop/frame_element/mod.rs:208:13
    |
208 |             *first_pdu = Some(AtomicU8::new(value));
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
    = note: BACKTRACE (of the first span) on thread `unnamed-53`:
    = note: inside `std::option::Option::<std::sync::atomic::AtomicU8>::as_ref` at /home/james/.rustup/toolchains/nightly-2024-07-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/option.rs:711:15: 711:20
note: inside `pdu_loop::frame_element::FrameElement::<0>::first_pdu`
   --> src/pdu_loop/frame_element/mod.rs:200:9
    |
200 | /         (*addr_of!((*this.as_ptr()).first_pdu))
201 | |             .as_ref()
    | |_____________________^
note: inside `pdu_loop::storage::PduStorageRef::<'_>::frame_index_by_first_pdu_index`
   --> src/pdu_loop/storage.rs:249:41
    |
249 |             if let Some(pdu) = unsafe { FrameElement::<0>::first_pdu(frame) } {
    |                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `pdu_loop::pdu_rx::PduRx::<'_>::receive_frame`
   --> src/pdu_loop/pdu_rx.rs:82:27
    |
82  |           let frame_index = self
    |  ___________________________^
83  | |             .storage
84  | |             .frame_index_by_first_pdu_index(pdu_idx)
    | |____________________________________________________^
note: inside closure
   --> src/pdu_loop/mod.rs:547:27
    |
547 |                     while rx.receive_frame(&ethernet_frame).is_err() {}
    |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to 1 previous error; 3 warnings emitted

error: test failed, to rerun pass `--lib`

Caused by:
  process didn't exit successfully: `/home/james/.rustup/toolchains/nightly-2024-07-13-x86_64-unknown-linux-gnu/bin/cargo-miri runner /home/james/Repositories/ethercrab/target/miri/x86_64-unknown-linux-gnu/debug/deps/ethercrab-2da4755d62c2e920` (exit status: 1)
note: test exited abnormally; to see the full output pass --nocapture to the harness.

jamwaffles commented 3 months ago

Failure is reproducible with:

MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly-2024-07-13 miri test --features '__internals' --target x86_64-unknown-linux-gnu

Interestingly, filtering with pdu_loop or multiple_threads causes the previously failing multiple_threads test to pass.

jamwaffles commented 3 months ago

Error has been fixed locally (i.e. the failing miri test now passes with the specific command above), however miri nextest run couldn't reproduce it locally, and it wouldn't reproduce in CI with OR without nextest, so it's difficult to confirm. Not great :(

ethercrab-rs / ethercrab

Fix data race caught by miri #223