Open elBoberido opened 4 months ago
I tried building a proof of concept for using loom in the lock-free area. I ran into some issues and figured i might share what I learned here.
thread::scope
doesn't exist in loom directly. There is a PR to the loom repo that adds it.loom::sync::barrier
exists only to allow code to compile it throws at runtime and using std::sync::barrier results in deadlocks for the current tests. These are more or less expected because loom is an intentionally unfair scheduler. I haven't spent enough time to figure out how to make the mpmc tests run properly without the barriers. FixedSizeContainer
for non primitive types segfaults during construction and I don't know enough to determine if its a loom problem, a FixedSizeContainer problem, or a problem with how i adapted the code to use loom::cell::UnsafeCell. UnsafeCell
The key change as I understand it revolves around not using * or .get() on the UnsafeCell but instead, refactoring to use with
and with_mut
depending on if you need read or write access to the contents. The required changes weren't bad in the lock-free area but I didn't do any analysis on what it would mean to the rest of the codebase.@hippiehunter thanks for looking into this. Do you have experience with loom and know it's limitations?
I've only some experience with Miri and it seems to work quite good in general but there are also cases where it either generates false positives or I do not understand the underlying problem.
I don't have any practical experience with loom, it's been on my todo list for a while and this looked like a good place to start. My understanding of its limitations only comes from what I've read in their documentation/issues and random blogs.
I tried running miri on the same test and it seems to be failing for FixedSizeContainer<TestType, CAPACITY> in the same place that loom fails. You might have a better understanding of this error message.
error: Undefined Behavior: trying to retag from <wildcard> for SharedReadWrite permission at alloc35928[0x50], but no exposed tags have suitable permission in the borrow stack for this location
--> /mnt/d/repos/iceoryx2/iceoryx2-bb/lock-free/src/mpmc/unique_index_set.rs:429:19
|
429 | &mut *(*self.data_ptr.as_ptr().offset(index as isize)).get()
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| |
| trying to retag from <wildcard> for SharedReadWrite permission at alloc35928[0x50], but no exposed tags have suitable permission in the borrow stack for this location
| this error occurs as part of retag at alloc35928[0x50..0x54]
|
= help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
= help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
= note: BACKTRACE:
= note: inside `iceoryx2_bb_lock_free::mpmc::unique_index_set::UniqueIndexSet::get_next_free_index` at /mnt/d/repos/iceoryx2/iceoryx2-bb/lock-free/src/mpmc/unique_index_set.rs:429:19: 429:67
= note: inside `iceoryx2_bb_lock_free::mpmc::unique_index_set::UniqueIndexSet::acquire_raw_index` at /mnt/d/repos/iceoryx2/iceoryx2-bb/lock-free/src/mpmc/unique_index_set.rs:358:29: 358:63
= note: inside `iceoryx2_bb_lock_free::mpmc::unique_index_set::UniqueIndexSet::acquire_with_additional_cleanup::<'_, {closure@iceoryx2_bb_lock_free::mpmc::container::Container<TestType>::add::{closure#0}}>` at /mnt/d/repos/iceoryx2/iceoryx2-bb/lock-free/src/mpmc/unique_index_set.rs:301:18: 301:42
= note: inside `iceoryx2_bb_lock_free::mpmc::container::Container::<TestType>::add` at /mnt/d/repos/iceoryx2/iceoryx2-bb/lock-free/src/mpmc/container.rs:281:15: 281:75
= note: inside `iceoryx2_bb_lock_free::mpmc::container::FixedSizeContainer::<TestType, 129>::add` at /mnt/d/repos/iceoryx2/iceoryx2-bb/lock-free/src/mpmc/container.rs:510:18: 510:43
note: inside `mpmc_container::mpmc_container_add_and_remove_elements_works::<TestType>`
--> iceoryx2-bb/lock-free/tests/mpmc_container_tests.rs:91:25
|
91 | let index = sut.add((i * 3 + 1).into());
interestingly while trying to feel this error out, I tried switching to RelocatablePointer<UnsafeCell<AtomicU32>>
instead of RelocatablePointer<UnsafeCell<u32>>
for UniqueIndexSet.data_ptr. Then I changed the helpers in UniqueIndexSet like this
fn get_next_free_index(&self, index: u32) -> u32 {
unsafe {
(*(*self.data_ptr.as_ptr().offset(index as isize)).get()).load(Ordering::Relaxed)
}
}
#[allow(clippy::mut_from_ref)]
fn set_next_free_index(&self, index: u32, value: u32) {
#[deny(clippy::mut_from_ref)]
unsafe {
(*(*self.data_ptr.as_ptr().offset(index as isize)).get()).store(value, Ordering::Relaxed);
}
}
As well as replacing the helper usages where appropriate. This made the tests pass under miri but I don't have any confidence in my ability to reason about Ordering::Relaxed so I fear I might be masking something. Anyway, sorry if I'm just generating noise.
@hippiehunter Wow, that's awesome. I already have my full focus exactly on this lock-free thing. I have a fixed some parts of your bug-report on the branch: iox2-129-fix-missing-connections
and wrote some tests that at least very reliable crash for now.
I will try to incorporate your suggestions and let's see if we can get this thing running.
Right, UniqueIndexSet
has a memory order issue which I need to fix (#120)
Brief feature description
It's hard to get lock-free code right and one can easily introduce races. Miri and Loom can help to find those races
Detailed information
Miri and Loom can help to harden to lock-free structures. We need to check the limits on these tools and how to integrate them into the CI.
Here is a small example of a FIFO with a wrong memory ordering on line 98: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=7e42e80e68c48de09c7615422eea3c00