crossbeam-rs / rfcs

RFCs for changes to Crossbeam

Global and local pinning #31

Open ghost opened 6 years ago

ghost commented 6 years ago

Introduce the concept of global pinning, which allows one to pin the current thread without going through a thread-local handle.

In addition to that, we'll slightly tweak the general interface of crossbeam-epoch in order to fix several reported problems with it.
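To illustrate the difference, here's a rough sketch of the two styles (names follow crossbeam-epoch and this RFC; the exact signatures are what's up for discussion, not a settled API):

use crossbeam_epoch::{self as epoch, Collector};

fn demo() {
    // Local pinning: register a per-thread handle with a collector and
    // pin through it.
    let collector = Collector::new();
    let handle = collector.register();
    let guard = handle.pin();
    drop(guard);

    // Global pinning: pin the current thread directly, via the hidden
    // thread-local handle of the global collector.
    let guard = epoch::pin();
    drop(guard);
}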

Rendered

I'm feeling torn between all the possible alternatives and don't expect the RFC to land in its original form. It's hard to choose the right solution here, so help me out. :)

Related issues:

@Amanieu This RFC is the answer (a belated one, sorry!) to your questions about anonymous pinning.

Amanieu commented 6 years ago

I'm pretty happy with this RFC. My use case is a bit complicated because it involves threads that can potentially be killed at any point. I've included a rough code example which shows how my code works:

// There is one of these for each thread. They are tracked in a global list of
// all threads. ThreadData must be Send since it may be dropped in a different
// thread when a thread is killed.
struct ThreadData {
    handle: Handle,
    guard: Option<Guard>,
}

// This is what the main loop of a thread looks like. The key fact to remember here
// is that threads can be asynchronously killed at any point, except when the
// ThreadData is locked.
fn run_thread(thread_data: &Mutex<ThreadData>) {
    let guard;

    {
        // Lock the thread. This prevents our thread from getting killed.
        let mut lock = thread_data.lock();
        // Split the borrow so the closure can read the handle while the
        // guard slot is mutated.
        let ThreadData { handle, guard: slot } = &mut *lock;

        // Pin the thread by setting thread_data.guard to Some(Guard).
        let pinned = slot.get_or_insert_with(|| handle.pin());

        // We cheat a bit here by re-binding the lifetime of the guard so that
        // it outlives the lock. This is safe because we know that one of the
        // following occurs:
        // - Our thread is not killed, and we don't touch thread_data.guard
        //   anywhere except at the end of this function.
        // - If our thread is killed then our killer will drop our ThreadData
        //   (which it must lock before killing us). This will unpin the thread
        //   and avoid any memory leaks.
        guard = unsafe { &*(pinned as *mut Guard) };

        // The thread stays pinned even after we release the lock.
        drop(lock);
    }

    // The main body of the thread goes here. We can safely access lock-free
    // data structures using our Guard, while still allowing our thread to be
    // killed at any point.

    // Unpin the guard after we are done using it.
    thread_data.lock().guard = None;
}
Amanieu commented 6 years ago

I don't care much for global pinning for my use case. There is one piece of code where I do need it but I've just used a Mutex<Handle> as a workaround (the mutex is needed for other reasons anyways, so performance is not an issue).
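For concreteness, the workaround looks roughly like this (a sketch assuming the Handle::pin(&self) -> Guard signature from this discussion; actually sharing the mutex between threads would still need an unsafe Send assertion, like the SendHandle shown later in this thread):

use std::sync::Mutex;

fn pin_via_mutex(shared: &Mutex<Handle>) -> Guard {
    // Lock the handle only long enough to create a guard; the guard is
    // owned, so the lock doesn't need to be held while it's used.
    shared.lock().unwrap().pin()
}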

Amanieu commented 6 years ago

I would add another alternative: only allow a single Guard to exist per Handle at any point. This allows Handle to be safely sent to another thread since you can't get another Guard out of it while the previous Guard is still alive.

This would be ideal for my use case since I only have a single Guard in use at any one point, however I understand that it may be too limiting as a general-purpose solution.

ghost commented 6 years ago

I would add another alternative: only allow a single Guard to exist per Handle at any point. This allows Handle to be safely sent to another thread since you can't get another Guard out of it while the previous Guard is still alive.

If we add a lifetime to Guard, then we could do something like this:

struct Handle { ... }

struct Guard<'a> { ... }

impl Handle {
    fn pin(&self) -> Guard<'_> { ... }

    fn pin_mut(&mut self) -> Guard<'_> { ... }
}

This way you can obtain a unique guard if you want (pin_mut), but don't have to (pin). Is this what you had in mind?
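To illustrate, usage would look like this (hypothetical, matching the signatures above):

fn example(handle: &mut Handle) {
    // pin takes &self, so overlapping guards are allowed:
    let g1 = handle.pin();
    let g2 = handle.pin();
    drop((g1, g2));

    // pin_mut takes &mut self, so the borrow checker guarantees this is
    // the only guard derived from the handle while it's alive:
    let unique = handle.pin_mut();
    // handle.pin(); // error: handle is already mutably borrowed
    drop(unique);
}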

Amanieu commented 6 years ago

@stjepang That doesn't work for me, since I need to keep both the Guard and Handle in thread-local storage, which means that both must be 'static.

ghost commented 6 years ago

@Amanieu I see.

So in that case, would you prefer something similar to raw_lock/raw_unlock from parking_lot? In our case the methods would be named Handle::raw_pin and Handle::raw_unpin (the latter being unsafe).

What kind of interface would suit you best?

Amanieu commented 6 years ago

Here is the interface that I have in mind:

impl Handle {
    // This fails if a Guard already exists for this Handle
    fn pin(&self) -> Option<Guard>;
}

Guard will no longer be Clone, which, together with the above change, ensures that there is at most one Guard per Handle at any time. This will allow us to make both Handle: Send and Guard: Send.
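Hypothetical usage, assuming a collector and the API above (and that the handle becomes pinnable again once its guard is dropped):

let handle = collector.register();

let g1 = handle.pin().unwrap();
assert!(handle.pin().is_none()); // at most one Guard per Handle at a time

drop(g1);
let g2 = handle.pin().unwrap(); // pinning succeeds again after the drop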

Amanieu commented 6 years ago

Only allowing one guard at a time might be too restrictive for the global collector since it doesn't allow nesting, so we can also provide this API only for the global collector (using the hidden per-thread handle):

thread_local! {
    static HANDLE: (Handle, RefCell<Option<Guard>>) =
        (COLLECTOR.register(), RefCell::new(None));
}

// In the epoch module:
pub fn pin_with<F: FnOnce(&Guard)>(f: F);

This will invoke the closure with a reference to the currently active thread-local Guard if there is one, or pin the epoch for the duration of the call if there is no currently active guard.
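Usage would look something like this (hypothetical):

epoch::pin_with(|guard| {
    // ... access lock-free data structures through guard ...

    // A nested call reuses the already-active thread-local guard
    // instead of pinning a second time.
    epoch::pin_with(|nested| {
        // nested and guard refer to the same pinning.
    });
});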

ghost commented 6 years ago

@Amanieu That seems a bit too restrictive - in particular, it would make implementing iterators that own guards impossible, wouldn't it?

However, going the other way around and implementing the one-guard-per-handle interface by wrapping the current one would be straightforward (except for implementing MyGuard: Send):

struct MyHandle {
    handle: Handle,
    pinned: Cell<bool>,
}

impl MyHandle {
    fn pin(&self) -> Option<MyGuard> {
        if self.pinned.get() {
            None
        } else {
            self.pinned.set(true);
            Some(MyGuard { guard: self.handle.pin() })
        }
    }
}

// Note: resetting the flag would require MyGuard to reach its MyHandle
// when dropped, which is omitted here.
struct MyGuard {
    guard: Guard,
}

unsafe impl Send for MyHandle {}

Getting MyGuard: Send right is a bit tricky, though...

Maybe we should expose a few low-level methods for pinning without creating guards (like raw_lock and raw_unlock):

impl Handle {
    unsafe fn pin_unchecked(&self);
    unsafe fn unpin_unchecked(&self);
    unsafe fn defer_unchecked<F: FnOnce()>(&self, f: F);
}

Then we can implement your interface like this:

struct MyHandle {
    handle: Handle,
    pinned: AtomicBool,
}

impl MyHandle {
    fn pin(&self) -> Option<MyGuard> {
        if self.pinned.swap(true, Relaxed) {
            None
        } else {
            unsafe { self.handle.pin_unchecked(); }
            // Cloning the Handle atomically increments its refcount.
            Some(MyGuard { handle: self.handle.clone() })
        }
    }
}

struct MyGuard {
    handle: Handle,
}

impl Drop for MyGuard {
    fn drop(&mut self) {
        unsafe { self.handle.unpin_unchecked(); }
    }
}

unsafe impl Send for MyHandle {}
unsafe impl Send for MyGuard {}
Amanieu commented 6 years ago

The problem with MyHandle is that it needs to give out a &Guard, because that is what SkipList expects. However, this is unsound because the Guard can be cloned. That breaks the invariant of having only one guard per handle, which MyHandle relies on in order to be safely sent to another thread (the Guard must always be sent along with the Handle).

After consideration, I am happy with your proposed API except that I would like Guard::clone to be removed.

Amanieu commented 6 years ago

For reference, here's my implementation of SendHandle:

use core::mem;

struct SendHandle {
    handle: Handle,
    guard: Option<Guard>,
}
unsafe impl Send for SendHandle {}
impl SendHandle {
    pub fn new(collector: &Collector) -> SendHandle {
        SendHandle {
            handle: collector.register(),
            guard: None,
        }
    }
    pub fn pin(&mut self) -> GuardRef {
        let handle = &mut self.handle;
        self.guard.get_or_insert_with(|| handle.pin());
        GuardRef(&mut self.guard)
    }
    pub fn get(&mut self) -> Option<&Guard> {
        self.guard.as_ref()
    }
    pub fn unpin(&mut self) {
        self.guard = None;
    }
}
struct GuardRef<'a>(&'a mut Option<Guard>);
impl<'a> GuardRef<'a> {
    pub fn forget(self) {
        mem::forget(self);
    }
}
impl<'a> core::ops::Deref for GuardRef<'a> {
    type Target = Guard;
    fn deref(&self) -> &Guard {
        self.0.as_ref().unwrap()
    }
}
impl<'a> Drop for GuardRef<'a> {
    fn drop(&mut self) {
        *self.0 = None;
    }
}

Two points of note:

ghost commented 6 years ago

@Amanieu

After consideration, I am happy with your proposed API except that I would like Guard::clone to be removed.

That seems reasonable. One thing to keep in mind is that this precludes us from ever adding a method Guard::handle() returning an Option<&Handle>, because one could just do guard.handle().unwrap().pin() to effectively clone the guard. But I suppose that tradeoff is worth making.

@Vtec234

Please forgive my ignorance, but as I understand it, the benefit of using parity over a single counter is that with parity the epoch can still be advanced by 1 when there is a global pin present, right?

Yes, exactly.

If a larger counter is used, say mod 4, and then the epoch age required for destruction extended to 5, can the epoch advance by up to 3 with a global pin present?

Actually, it doesn't matter whether we have 2, 4, or even 100 counters. Here's a lengthy explanation... sorry in advance - it'll require some mental gymnastics. :)

Consider what happens in our current epoch tracking mechanism. In order to pin the current thread, we (1) load the global epoch, (2) store it into our thread-local storage, and (3) execute a fence. Since 1 << 63 is so large, we can pretty safely assume that the global epoch won't overflow between steps (2) and (3). That is just extremely unlikely to happen. :) One more thing: recall that our basic invariant is that the global epoch can advance at most once while the thread is pinned (i.e. between step (3) and the final call to unpin).
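Schematically, the pinning steps above look like this (not the actual crossbeam-epoch source; names are illustrative):

use std::sync::atomic::{fence, AtomicUsize, Ordering};

static GLOBAL_EPOCH: AtomicUsize = AtomicUsize::new(0);

fn pin(local_epoch: &AtomicUsize) {
    // Step (1): load the global epoch.
    let epoch = GLOBAL_EPOCH.load(Ordering::Relaxed);
    // Step (2): publish it in our thread-local slot.
    local_epoch.store(epoch, Ordering::Relaxed);
    // Step (3): fence; from here on, try_advance is guaranteed to
    // observe that we are pinned at `epoch`.
    fence(Ordering::SeqCst);
}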

But what if it were likely for the global epoch to advance by 1 << 63 (on 64-bit platforms) between steps (2) and (3)? Let's say that in step (1) we load global_epoch=7. Now assume that after we store it into thread-local storage in step (2) and execute the fence in step (3), the global epoch almost wraps all the way around and becomes global_epoch=6. Concurrently, another thread is executing the function try_advance - it managed to iterate the whole list between our steps (2) and (3), and is now almost finished - the only thing left to do is to increment the global epoch. After our step (3) has finished, that thread sets global_epoch=8. This means that the global epoch can in fact advance twice while we're pinned (first from 6 to 7, and then from 7 to 8).

If we assume it's possible for the global epoch to advance by 1 << 63 between steps (2) and (3), then we have to raise the expiration threshold from 2 to 3. The reason we need to do that when introducing global pinning with two counters is simply that it's entirely possible for the global epoch to wrap around modulo 2 (in contrast to wrapping around modulo 1 << 63, which practically never happens).