fereidani / kanal

The fast sync and async channel that Rust deserves
MIT License
1.34k stars 33 forks source link

Oneshot: Data race detected #35

Open QuarticCat opened 1 year ago

QuarticCat commented 1 year ago

Problem

Continuing from #34, I audited oneshot and found a data racing problem. The quickest way to reproduce it is using MIRI.

$ cargo +nightly miri --version                                 
miri 0.1.0 (a6f8aa5 2023-08-11)

$ cargo +nightly miri test --test=oneshot_test -- send_win_panic
Preparing a sysroot for Miri (target: x86_64-unknown-linux-gnu)... done
    Finished test [optimized + debuginfo] target(s) in 0.02s
     Running tests/oneshot_test.rs (/home/qc/.cache/cargo-build/miri/x86_64-unknown-linux-gnu/debug/deps/oneshot_test-c2884fce5be3f64a)
warning: Miri does not support optimizations. If you have enabled optimizations by selecting a Cargo profile (such as --release) which changes other profile settings such as whether debug assertions and overflow checks are enabled, those settings are still applied.

running 1 test
test asyncs::send_win_panic ... error: Undefined Behavior: Data race detected between (1) Read on thread `asyncs::send_wi` and (2) Write on thread `<unnamed>` at alloc33459. (2) just happened here
   --> /home/qc/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:497:1
    |
497 | pub unsafe fn drop_in_place<T: ?Sized>(to_drop: *mut T) {
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Data race detected between (1) Read on thread `asyncs::send_wi` and (2) Write on thread `<unnamed>` at alloc33459. (2) just happened here
    |
help: and (1) occurred earlier here
   --> /home/qc/Workspace/NotMe/kanal/src/signal.rs:230:15
    |
230 |         match &(*this).waker {
    |               ^^^^^^^^^^^^^^
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
    = note: BACKTRACE (of the first span):
    = note: inside `std::ptr::drop_in_place::<kanal::OneshotSendFuture<std::boxed::Box<usize>>> - shim(Some(kanal::OneshotSendFuture<std::boxed::Box<usize>>))` at /home/qc/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:497:1: 497:56
    = note: inside `std::ptr::drop_in_place::<std::boxed::Box<kanal::OneshotSendFuture<std::boxed::Box<usize>>>> - shim(Some(std::boxed::Box<kanal::OneshotSendFuture<std::boxed::Box<usize>>>))` at /home/qc/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:497:1: 497:56
    = note: inside `std::ptr::drop_in_place::<std::pin::Pin<std::boxed::Box<kanal::OneshotSendFuture<std::boxed::Box<usize>>>>> - shim(Some(std::pin::Pin<std::boxed::Box<kanal::OneshotSendFuture<std::boxed::Box<usize>>>>))` at /home/qc/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:497:1: 497:56
note: inside closure
   --> tests/oneshot_test.rs:298:13
    |
298 |             });
    |             ^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to previous error; 1 warning emitted

error: test failed, to rerun pass `--test oneshot_test`

Caused by:
  process didn't exit successfully: `/home/qc/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-miri runner /home/qc/.cache/cargo-build/miri/x86_64-unknown-linux-gnu/debug/deps/oneshot_test-c2884fce5be3f64a send_win_panic` (exit status: 1)

How to trigger it

  1. The sender wins the race and goes waiting.
  2. The receiver loses the race and tries to finish it. Now it has already set the ptr to FINISHED but it hasn't actually finished receiving yet.
  3. The sender drops. It finds that the ptr has already been set to FINISHED, so it drops the OneshotInternal, which contains the waker.
  4. The receiver then finishes receiving and invokes wake. BANG!

Possible solution

You can add a state before FINISHED indicating that the data is transferring. An existing example is AtomicWaker, which has a REGISTERING state indicating that the waker is being replaced.

QuarticCat commented 1 year ago

This might also relate to #29.

fereidani commented 1 year ago

Tnx @QuarticCat I saw it a few months ago, I didn't get the time to fix it yet, it's the only bug that is stopping Kanal from the 0.1 release. I will remove one-shot channel if I don't find a sound solution for it in an acceptable time.

Qwuke commented 4 months ago

@fereidani Just wanted to bump this issue and also perhaps see if you had a concrete deadline in mind for the solution here. I'm using kanal in an upcoming project and it looks awesome, and the 0.1 release would be incredible.

fereidani commented 2 months ago

@Qwuke I'm going to remove oneshot from kanal and make it separate crate, I'm a bit busy with work now, oneshot needs excessive refactor and testing to be stabilize. I'll revisit as soon as I got free time.