eclipse-iceoryx / iceoryx2

Eclipse iceoryx2™ - true zero-copy inter-process-communication in pure Rust
https://iceoryx.io
Apache License 2.0
449 stars 22 forks source link

Explore Miri and Loom for hardening of lock-free code #121

Open elBoberido opened 4 months ago

elBoberido commented 4 months ago

Brief feature description

It's hard to get lock-free code right and one can easily introduce races. Miri and Loom can help to find those races

Detailed information

Miri and Loom can help to harden to lock-free structures. We need to check the limits on these tools and how to integrate them into the CI.

Here is a small example of a FIFO with a wrong memory ordering on line 98: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=7e42e80e68c48de09c7615422eea3c00

hippiehunter commented 4 months ago

I tried building a proof of concept for using loom in the lock-free area. I ran into some issues and figured i might share what I learned here.

elBoberido commented 4 months ago

@hippiehunter thanks for looking into this. Do you have experience with loom and know it's limitations?

I've only some experience with Miri and it seems to work quite good in general but there are also cases where it either generates false positives or I do not understand the underlying problem.

hippiehunter commented 4 months ago

I don't have any practical experience with loom, it's been on my todo list for a while and this looked like a good place to start. My understanding of its limitations only comes from what I've read in their documentation/issues and random blogs.

I tried running miri on the same test and it seems to be failing for FixedSizeContainer<TestType, CAPACITY> in the same place that loom fails. You might have a better understanding of this error message.

error: Undefined Behavior: trying to retag from <wildcard> for SharedReadWrite permission at alloc35928[0x50], but no exposed tags have suitable permission in the borrow stack for this location
   --> /mnt/d/repos/iceoryx2/iceoryx2-bb/lock-free/src/mpmc/unique_index_set.rs:429:19
    |
429 |             &mut *(*self.data_ptr.as_ptr().offset(index as isize)).get()
    |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |                   |
    |                   trying to retag from <wildcard> for SharedReadWrite permission at alloc35928[0x50], but no exposed tags have suitable permission in the borrow stack for this location
    |                   this error occurs as part of retag at alloc35928[0x50..0x54]
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
    = note: BACKTRACE:
    = note: inside `iceoryx2_bb_lock_free::mpmc::unique_index_set::UniqueIndexSet::get_next_free_index` at /mnt/d/repos/iceoryx2/iceoryx2-bb/lock-free/src/mpmc/unique_index_set.rs:429:19: 429:67
    = note: inside `iceoryx2_bb_lock_free::mpmc::unique_index_set::UniqueIndexSet::acquire_raw_index` at /mnt/d/repos/iceoryx2/iceoryx2-bb/lock-free/src/mpmc/unique_index_set.rs:358:29: 358:63
    = note: inside `iceoryx2_bb_lock_free::mpmc::unique_index_set::UniqueIndexSet::acquire_with_additional_cleanup::<'_, {closure@iceoryx2_bb_lock_free::mpmc::container::Container<TestType>::add::{closure#0}}>` at /mnt/d/repos/iceoryx2/iceoryx2-bb/lock-free/src/mpmc/unique_index_set.rs:301:18: 301:42
    = note: inside `iceoryx2_bb_lock_free::mpmc::container::Container::<TestType>::add` at /mnt/d/repos/iceoryx2/iceoryx2-bb/lock-free/src/mpmc/container.rs:281:15: 281:75
    = note: inside `iceoryx2_bb_lock_free::mpmc::container::FixedSizeContainer::<TestType, 129>::add` at /mnt/d/repos/iceoryx2/iceoryx2-bb/lock-free/src/mpmc/container.rs:510:18: 510:43
note: inside `mpmc_container::mpmc_container_add_and_remove_elements_works::<TestType>`
   --> iceoryx2-bb/lock-free/tests/mpmc_container_tests.rs:91:25
    |
91  |             let index = sut.add((i * 3 + 1).into());

interestingly while trying to feel this error out, I tried switching to RelocatablePointer<UnsafeCell<AtomicU32>>instead of RelocatablePointer<UnsafeCell<u32>> for UniqueIndexSet.data_ptr. Then I changed the helpers in UniqueIndexSet like this

fn get_next_free_index(&self, index: u32) -> u32 {
        unsafe {
            (*(*self.data_ptr.as_ptr().offset(index as isize)).get()).load(Ordering::Relaxed)
        }
    }
    #[allow(clippy::mut_from_ref)]
    fn set_next_free_index(&self, index: u32, value: u32) {
        #[deny(clippy::mut_from_ref)]
        unsafe {
            (*(*self.data_ptr.as_ptr().offset(index as isize)).get()).store(value, Ordering::Relaxed);
        }
    }

As well as replacing the helper usages where appropriate. This made the tests pass under miri but I don't have any confidence in my ability to reason about Ordering::Relaxed so I fear I might be masking something. Anyway, sorry if I'm just generating noise.

elfenpiff commented 4 months ago

@hippiehunter Wow, that's awesome. I already have my full focus exactly on this lock-free thing. I have a fixed some parts of your bug-report on the branch: iox2-129-fix-missing-connections and wrote some tests that at least very reliable crash for now.

I will try to incorporate your suggestions and let's see if we can get this thing running.

elBoberido commented 4 months ago

Right, UniqueIndexSet has a memory order issue which I need to fix (#120)