crossbeam-rs / rfcs

RFCs for changes to Crossbeam

Atomic API #2

Closed ghost closed 7 years ago

ghost commented 7 years ago

This is a proposal for a better Atomic API.

Rendered

For reference, here are other similar APIs:

  1. Current API in Crossbeam
  2. Current API in coco
arthurprs commented 7 years ago

I'm still thinking about all this, but:

2.1 - Provide an unsafe variant that takes memory ordering (cas_weak, ...) and a safe one that's always seqcst (cas_weak_seq_cst, ...).

arthurprs commented 7 years ago

I'm a bit hesitant on this but I thought I'd share anyway. We could use the type system to provide type-safe nullability and taggability for zero-cost safety, something along the lines of:

trait Nullability {} trait Taggability {}

These traits will provide some methods that will be implemented by:

struct Nullable; struct NonNullable; struct Taggable; struct NonTaggable;

And ultimately be used by the structs Ref, Atomic and their methods:

struct Ref<'a, T, X: Nullability, Z: Taggability = NonTaggable> { .. }
struct Atomic<T, Z: Taggability = NonTaggable> { .. }

This is not a full description but I'm hoping to start a conversation at least.

I've seen the type system used for similar things with good success, for example:

jeehoonkang commented 7 years ago

As discussed in https://github.com/stjepang/coco/pull/2, I think:

nikomatsakis commented 7 years ago

I've not had time to digest these APIs in detail, however I did want to get some clarification about one assertion made in the RFC (I'm not that familiar with crossbeam's API, so I'm looking into it as I go):

By using relaxed orderings we can create a data race within safe code. This is unsound, as Rust must not allow data races in safe code.

The data race here is specifically because the load API offers an Option<Shared<T>>, right? That is, because Shared<T> implements Deref, we are basically asserting that if you get your hands on one, it is safe to deref -- but with the Relaxed store, there is no reason for that to be true. (In contrast, the AtomicPtr API in the standard library gives you back a *mut T, meaning you have to decide for yourself if it is safe to deref.)

ghost commented 7 years ago

@nikomatsakis Exactly. Dereferencing is the crucial operation here.

We can say that dereferencing is simply unsafe and call it a day. Or, as @jeehoonkang suggests, we can simply mark all load/store/swap/cas/cas_weak methods as unsafe. But that sounds like capitulation to me. :)

I believe we can design an API that is both safe and performant for 95% of use cases, and would like to push for that. This is Rust's overall philosophy anyway: let's make most of the code easy and safe, and apply unsafe blocks in the few special cases.

ghost commented 7 years ago

2.1 - Provide an unsafe variant that takes memory ordering (cas_weak, ...) and a safe one that's always seqcst (cas_weak_seq_cst, ...).

But Acquire/Release/AcqRel are safe as well. Generally speaking, I think memory orderings for all atomic operations fall into three categories:

  1. Relaxed ordering. This is unsafe.
  2. Acquire (for load), Release (for store), AcqRel (for swap, cas, and cas_weak). This is safe.
  3. SeqCst ordering (just like 2nd category, but with guaranteed global total order). This is also safe.

When writing concurrent data structures, I believe that the 2nd and 3rd are equally commonly needed, while the 1st is pretty rare.

For example, to allow full flexibility for the store operation, we could have the following methods:
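
A sketch of what such a family of store methods could look like (the exact names and signatures here are illustrative only, reconstructed from the safe/unsafe split above and the store_relaxed, store_box_relaxed, and store_raw names mentioned below):

fn store<'a>(&self, new: Ptr<'a, T>);                                   // safe: Release (or SeqCst)
fn store_box(&self, new: Box<T>, tag: usize);                           // safe: Release (or SeqCst)
unsafe fn store_relaxed<'a>(&self, new: Ptr<'a, T>);                    // unsafe: Relaxed
unsafe fn store_box_relaxed(&self, new: Box<T>, tag: usize);            // unsafe: Relaxed
unsafe fn store_raw(&self, new: *mut T, tag: usize, order: Ordering);   // unsafe: any ordering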

Unsafe methods store_relaxed and store_box_relaxed are not very compelling. Since they're already unsafe (and I believe rare), you can just as well use store_raw.

We could use the type system to provide type-safe nullability and taggability for zero-cost safety, something along the lines of:

This is certainly an interesting approach, although I haven't come up with a satisfying solution in this direction yet.

arthurprs commented 7 years ago

But Acquire/Release/AcqRel are safe as well.

Sure, but the idea is to limit the number of methods.

I'm not confident it's possible to provide an actual safe interface. It's gonna be really tricky to write a concurrent data structure without unsafe somewhere anyway.

ghost commented 7 years ago

A few more ideas:

Another possibility is to let methods accept an argument that tells whether they are sequentially consistent or not. For example:

// AcqRel when false, SeqCst when true
fn swap<'a>(&self, new: Ptr<'a, T>, seq_cst: bool) -> Ptr<'a, T>;

// Just to avoid confusion: `Ptr` holds a (possibly null) pointer and a tag encoded within its unused bits.

Then, with swap_raw you have full control over precise ordering when you need it:

fn swap_raw<'a>(&self, new: *mut T, tag: usize, order: Ordering) -> Ptr<'a, T>;

Perhaps even macros could help to allow syntax like:

cas!(head, old, new); // SeqCst by default
cas!(head, old, new, AcqRel);
cas!(head, old, Box::new(Node::new()), SeqCst);
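
A minimal sketch of how such a cas! macro could expand (illustrative only; it assumes a cas method that takes an explicit ordering, defaulting to SeqCst):

macro_rules! cas {
    ($atomic:expr, $old:expr, $new:expr) => {
        cas!($atomic, $old, $new, SeqCst)
    };
    ($atomic:expr, $old:expr, $new:expr, $ord:expr) => {
        $atomic.cas($old, $new, $ord)
    };
}
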
jeehoonkang commented 7 years ago

@stjepang oops! I made a mistake in writing the previous comment. I meant to mark dereferences as unsafe. Sorry for confusion.

jeehoonkang commented 7 years ago

I believe we can design an API that is both safe and performant for 95% of use cases, and would like to push for that. This is Rust's overall philosophy anyway: let's make most of the code easy and safe, and apply unsafe blocks in the few special cases.

I don't agree with this characterization. In Crossbeam master, the Chase-Lev deque (https://github.com/crossbeam-rs/crossbeam/blob/master/src/sync/chase_lev.rs#L195), MS queue (https://github.com/crossbeam-rs/crossbeam/blob/master/src/sync/ms_queue.rs#L89), segmented queue (https://github.com/crossbeam-rs/crossbeam/blob/master/src/sync/seg_queue.rs#L63), and Treiber stack (https://github.com/crossbeam-rs/crossbeam/blob/master/src/sync/treiber_stack.rs#L35) all have at least one relaxed access. In general, almost every data structure implemented under the C/C++11 memory model that I know of uses relaxed accesses.

Also, I think crossbeam-epoch is essentially unsafe: it is well known that non-blocking data structures are really hard to implement, and the difficulty is beyond what Rust's type system can give us. Yes, I am proposing to "give up", but I think the burden of ensuring race-freedom (except for the races with deallocation) in the data structures is in the user's hands, not ours.

arthurprs commented 7 years ago

I spent some time with the type system to check whether (https://github.com/crossbeam-rs/rfcs/pull/2#issuecomment-298902593) could work, and it appears so: https://gist.github.com/arthurprs/3da8f45f99c6acb663cc09c49888147d

I implemented only a bit of the coco interface and it needs some typedefs to make it ergonomic.

nikomatsakis commented 7 years ago

@stjepang what about a builder-like pattern. Give acquire-release by default, but allow opt-in to other orderings:

ptr.seqcst().swap(...);
unsafe { ptr.relaxed().swap(...); }
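
For illustration, a minimal sketch of how such a builder could be structured, here using std's AtomicPtr as a stand-in (the types and method names are hypothetical, not part of the proposal):

use std::sync::atomic::{AtomicPtr, Ordering};

pub struct Atomic<T> {
    ptr: AtomicPtr<T>,
}

pub struct WithOrdering<'a, T: 'a> {
    atomic: &'a Atomic<T>,
    ord: Ordering,
}

impl<T> Atomic<T> {
    /// Default ordering: acquire-release.
    pub fn swap(&self, new: *mut T) -> *mut T {
        self.ptr.swap(new, Ordering::AcqRel)
    }

    /// Opt in to sequential consistency.
    pub fn seqcst<'a>(&'a self) -> WithOrdering<'a, T> {
        WithOrdering { atomic: self, ord: Ordering::SeqCst }
    }

    /// Opt in to relaxed ordering; unsafe per the discussion above.
    pub unsafe fn relaxed<'a>(&'a self) -> WithOrdering<'a, T> {
        WithOrdering { atomic: self, ord: Ordering::Relaxed }
    }
}

impl<'a, T> WithOrdering<'a, T> {
    pub fn swap(&self, new: *mut T) -> *mut T {
        self.atomic.ptr.swap(new, self.ord)
    }
}
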
jeehoonkang commented 7 years ago

I have a comment on the tagged pointers. This may only become relevant in the far future, but...

Suppose the consume ordering is introduced in Rust in the future (possibly very far from now). Then tagged pointers cannot be used with the consume ordering, since they are implemented with AtomicUsize. Note that the dependency for the consume ordering is carried only by pointer types, not by integer types such as AtomicUsize.

This may not be an issue for Rust or Crossbeam, since consume is very premature and the conflict between consume and pointer tagging is recognized by the C/C++ standards committee, which is discussing how to solve it. I am just writing this comment to note that consume may be relevant to the design choices we are making here.

schets commented 7 years ago

@jeehoonkang Are you sure you aren't mixed up on the problems with consume ordering? Consume ordering can depend on non-pointer loads, take this code:

extern int mt_values[];
std::atomic<int> mt_index;
{
      int which_index = mt_index.load(std::memory_order_consume);
      int value = mt_values[which_index]; // address dependency on which_index
}

There are issues like this where the compiler optimizes the dependency away, but I can't find any issues about tagged pointers, working only with integers, or anything in the standard which indicates consume only applies to pointer types. The standard is worded in such a way to imply that any value type loaded with consume can order loads in a dependency chain:

12: An evaluation A is dependency-ordered before an evaluation B if (12.1) — A performs a release operation on an atomic object M, and, in another thread, B performs a consume operation on M and reads a value written by any side effect in the release sequence headed by A, or (12.2) — for some evaluation X, A is dependency-ordered before X and X carries a dependency to B.

jeehoonkang commented 7 years ago

@schets Sorry, I should have cited the documents I was alluding to: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0190r3.pdf and http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0462r1.pdf. The standards committee agrees that the current definition of consume is unusable, and is preparing a fix to that. The document says:

"Another conspicuous change is that this proposal does not require dependencies to be carried by integers, but instead only by pointers. Note that versions v4.2 and later of the Linux kernel avoid carrying dependencies through integer variables [15]. At first glance, it might seem that restricted carrying of dependencies through intptr_t and uintptr_t is needed to tag pointers, but such tagging is not supported by the current C++ standard. This document therefore completely prohibits carrying dependencies through integers."

schets commented 7 years ago

That isn't part of the standard, nor is it going to be in C++17, so I don't think we should worry about it. I find it absurd that they would prevent bit manipulation of pointers from carrying a dependency, and I expect the proposal will either be amended to allow such behavior or a follow-up standardization proposal will be submitted later.

It might be worth thinking about whether the pointer-tagging scheme would cause the compiler to do anything funky, although LLVM and any other respectable C++ compiler could never do that without breaking every single program out there. If this is ever expected to support architectures that don't use standard pointers (near/far schemes, 32-bit chars, something exotic?), we could think about that as well.

jeehoonkang commented 7 years ago

@schets yeah, the standard committee recognizes pointer tagging is essential, and surely they will submit another memorandum to deal with it. I am leaving a comment just to note that there is a discussion on it. I agree with you that the standard thing is not an important issue for now, and it is more urgent to see if the existing compilers will work fine.

arthurprs commented 7 years ago

Getting rid of the tag argument in functions is a great change, I can get behind this interface.

It seems to me that optional pointer tagging is a good use case for a type parameter like I showed up-thread, although in that example I included nullability as well, which I think may not carry its weight.

ghost commented 7 years ago

I've updated the RFC. This is now a real proposal and not a draft anymore. :)

@arthurprs Your idea of providing optional nullability and taggability is interesting. However, I think the right choice here is: Always support nullability and taggability. Having other combinations available is just not worth the hassle, IMO.

@nikomatsakis Builder-like pattern at first looks like a potential solution, but feels a tad too odd, since it's inconsistent with existing conventions (passing ordering as an argument). But more importantly...

@jeehoonkang I think it's best to do what you proposed a long time ago: just mark dereferencing as unsafe and call it a day. An API that allows safe dereferencing is too difficult to design and probably still too impractical in the end. So I gave up trying to pursue complete safety. :)

arthurprs commented 7 years ago

Having other combinations available is just not worth the hassle, IMO.

It doesn't need to be a hassle. From what I learned in that example, adding tagging as a type parameter to the API you just proposed is really simple and makes tags zero-cost.

I'll write a sample of how it could look.

Vtec234 commented 7 years ago

Hey, thanks for the RFC! It's great to see crossbeam moving forward to better foundations!

Consider a node in a skiplist. It consists of: key, value, tower. The tower is an array of atomic pointers. To save a level of indirection, it is wise to lay out the entire tower inside the node. This means that nodes are dynamically sized.

A somewhat space-wasting solution is to have a tower array of constant size MAX_LEVELS with some levels empty, or None. It's not too bad, since afaik you can prove that at most (lg n) levels are necessary with high probability.
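
For illustration, a rough sketch of the node layout being discussed (the types are stand-ins, not a proposed API); the fixed-size tower is the space-wasting alternative:

use std::sync::atomic::AtomicUsize;

const MAX_LEVELS: usize = 32;

struct Node<K, V> {
    key: K,
    value: V,
    // Stand-in for an array of atomic pointers to the next node at each level.
    // Ideally the tower would be laid out inline with exactly as many entries as
    // the node's height (making the node dynamically sized); a fixed-size array
    // is the simple, space-wasting workaround.
    tower: [AtomicUsize; MAX_LEVELS],
}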

Yes, I am proposing to "give up", but I think the burden of ensuring race-freedom (except for the races with deallocation) in the data structures is at the user's hand, not ours.

Why this defeatism? :D It generally happens in Rust that the internals of high-performance libraries are unsafe, and so is the case for crossbeam-epoch, which isn't really meant to be widely used anyway. But this does not prevent us from providing safe and easy-to-use structures built on top of epoch, which is really the end goal.

There's a bit of an issue with tagged pointers. To be taggable, a pointer has to have an alignment of at least 2. How would the API behave when a pointer only has alignment 1? Specifically, what does tag(&self) return in such a case? I see two options. One is to make it return Option<usize>, which would be slightly burdensome. The other is to just agree that an untaggable pointer always has tag 0 and mention that in the documentation. ~The same applies to with_tag(self, tag: usize). It could simply ignore the tag for untaggable pointers, but again, we have to document that, or return Option<Self>.~ Coco just panics when the tag is too large and that's fine, but it does accept tag 0. Might be worth documenting.

Vtec234 commented 7 years ago

Additionally, I believe it would be worth it to provide both compare-and-set and compare-and-swap, changing the naming of the appropriate methods in accordance with crossbeam-rs/crossbeam#116 to eliminate confusion about what cas stands for.

Another related issue is crossbeam-rs/crossbeam#113 and the proposal seems to solve it :)

I noticed something more worrying. Since there is no store_owned, storing a new value that we just constructed from e.g. a Box requires using Owned::into_shared, and that requires a pin, even though it's unnecessary, since we're not getting a Shared back! This incurs non-negligible overhead, so I propose adding store_owned back.

jeehoonkang commented 7 years ago

Thank you for finishing up the RFC. IMO this is a huge achievement! I just have two comments.

arthurprs commented 7 years ago

@stjepang This is close to what I had in mind, https://gist.github.com/arthurprs/3da8f45f99c6acb663cc09c49888147d

The important bit is

pub struct Atomic<T, Z: Taggability = NonTaggable> {
}

pub struct Shared<'p, T: 'p, Z: Taggability = NonTaggable> {
}

pub mod tagged {
    pub type Atomic<T> = super::Atomic<T, super::Taggable>;
    pub type Shared<'a, T> = super::Shared<'a, T, super::Taggable>;
}

This makes tagging free for modules that don't need it (use crossbeam::atomic::{Atomic, ..}), and users opt in by importing from the tagged submodule (use crossbeam::atomic::tagged::{Atomic, ..}).

So: no hassle, no duplicated code, and you don't pay for what you don't use.
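
To make the zero-cost claim concrete, here is a minimal sketch (assumed, not taken from the gist) of how the type parameter could turn untagging into a compile-time no-op for the non-taggable case:

pub trait Taggability {
    fn untag(raw: usize) -> usize;
}

pub struct Taggable;
pub struct NonTaggable;

impl Taggability for Taggable {
    #[inline]
    fn untag(raw: usize) -> usize {
        // Strip the low bits reserved for the tag (the mask depends on alignment).
        raw & !0b11
    }
}

impl Taggability for NonTaggable {
    #[inline]
    fn untag(raw: usize) -> usize {
        // Nothing to strip: the masking compiles away entirely.
        raw
    }
}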

ghost commented 7 years ago

I'll write a sample how it could look like.

That'd be helpful! So, the question is "is it worth splitting taggable and non-taggable atomics at all?" I think not, since the cost of ptr & MASK is really negligible in both code size and runtime cost. :) The other issue is that tagged atomics force proper alignment, but is that a real concern?

By the way, after moving MsQueue and TreiberStack to taggable atomics I didn't notice any difference in performance.

A somewhat space-wasting solution is to have a tower array of constant size MAX_LEVELS with some levels empty, or None. It's not too bad, since afaik you can prove that at most (lg n) levels are necessary with high probability.

The average tower height is 1 + 1/2 + 1/4 + 1/8 + ... = 2. So, if all nodes have a tower of height 32, while the average is 2, that's a lot of wasted space. :) That also contributes to worse cache locality, therefore making the performance substantially worse (I did the benchmarking).

There's a bit of an issue with tagged pointers. To be taggable, a pointer has to have alignment of at least 2. How would the API behave when a pointer only has alignment 1? Specifically, what does tag(&self) return in such a case?

Well, if a pointer has alignment of 1, the only value a tag can have is zero. The maximum value a tag can have is alignment - 1. Method with_tag should panic if the tag is too large. We could return an optional value to indicate whether tagging succeeded, but would anyone ever use that? Everyone would just call .unwrap() right after tagging. :)
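
A rough sketch of the tag encoding being described (the helpers are hypothetical, not the proposed API); the number of usable tag bits follows directly from the alignment of T:

use std::mem;

fn low_bits<T>() -> usize {
    mem::align_of::<T>() - 1
}

/// Encodes `tag` into the unused low bits of `raw`; panics if the tag doesn't fit.
fn with_tag<T>(raw: *mut T, tag: usize) -> *mut T {
    let mask = low_bits::<T>();
    assert!(tag <= mask, "tag too large to fit into unused bits");
    ((raw as usize & !mask) | tag) as *mut T
}

/// Extracts the tag from the unused low bits of `raw`.
fn tag<T>(raw: *mut T) -> usize {
    raw as usize & low_bits::<T>()
}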

Additionally, I believe it would be worth it to provide both compare-and-set and compare-and-swap, changing the naming of the appropriate methods in accordance with crossbeam-rs/crossbeam#116 to eliminate confusion about what cas stands for.

The difference between those two is that compare-and-swap does more (it performs a load on failure) and requires a pin/guard. Is there a real scenario where compare-and-set is more desirable? Yes, compare-and-set is more desirable when the thread is not pinned, but why would someone actually want to do CAS on an Atomic when it isn't pinned? A concrete example here would be illuminating.

Compare-and-swap is a very common operation, so I'd strongly prefer having the name short. Especially because it takes four arguments. Arguably, the weak variant is more commonly used than the strong variant, so if we go with longer names, this might be the most common use of CAS in practice:

match self.head.compare_and_swap_weak_owned(head, node, AcqRel, pin) {
    Ok(_) => break,
    Err((h, n)) => {
        head = h;
        node = n;
    }
}

Alternatives to compare_and_set and compare_and_swap might be cmp_set and cmp_swap, or maybe cas and cmpxchg. The proposed cas is more similar to compare_exchange rather than compare_and_swap in libstd anyway. BTW, I find it silly that compare-and-set is not named compare-and-store instead. :)

Also, the current proposed cas is not really compare-and-swap nor compare-and-set. It swaps on failure, but doesn't swap on success. It's also not compare-exchange because it returns the new value on success. Naming is hard.

To summarize, I personally don't think we need a split between compare-and-set and compare-and-swap. But if you do, I'd like to ask for the following:

  1. A not too verbose naming scheme.
  2. A convincing case demonstrating the need for compare-and-set.

I noticed something more worrying. Since there is no store_owned, storing a new value that we just constructed from e.g. a Box requires using Owned::into_shared, and that requires a pin, even though it's unnecessary, since we're not getting a Shared back! This incurs non-negligible overhead, so I propose adding store_owned back.

This is another method for which I haven't yet found a compelling use case. If you do have any, let me know. :) But I'm not as opposed to this idea - adding store_owned back sounds reasonable to me.

In the world of C/C++, data race on relaxed accesses is safe. Data race on nonatomic accesses is unsafe.

Oh, I guess my statement on data races was worded poorly. I should've been clearer that accessing unsynchronized non-atomic data is the real problem. I'll try to fix it.

jeehoonkang commented 7 years ago

I personally prefer the longer names and compare-and-set, solely because it follows the naming scheme of libstd. What do you think of this aspect?

ghost commented 7 years ago

@arthurprs Thanks for writing this!

It's a really neat trick. I just tried running cargo doc --open on that piece of code, but the added complexity stands out a bit. We can fix some rough edges with type aliases, but some will remain, e.g. signatures like:

fn load<'p>(&self, ord: Ordering, _: &'p Pin) -> Shared<'p, T, Z>;
fn store<'p>(&self, ord: Ordering, new: Shared<'p, T, Z>);

This is not too complex nor too much of a hassle, but still something that adds complexity. Let me try to summarize my stance. It's all a matter of tradeoffs.

Advantages:

  1. Users don't need to pay the price (one bitwise AND with a constant) of masking the pointer.
  2. Users can have pointers to misaligned data.

Disadvantages:

  1. Added layer of complexity. The docs don't look as clean.

Perhaps you'll disagree, but for me the disadvantage is greater than the sum of advantages. :) It'd be good to hear what others think as well. @jeehoonkang, @Vtec234 ?

@jeehoonkang

I personally prefer the longer names and compare-and-set, solely because it follows the naming scheme of libstd. What do you think of this aspect?

Well, cas and cas_owned do resemble compare_and_set and compare_exchange, but the differences are subtle:

// Returns current on success, or loads on failure.
fn compare_and_swap(
    &self, 
    current: *mut T, 
    new: *mut T, 
    order: Ordering
) -> *mut T;

// Returns current on success, or loads on failure.
fn compare_exchange(
    &self, 
    current: *mut T, 
    new: *mut T, 
    success: Ordering, 
    failure: Ordering
) -> Result<*mut T, *mut T>;

// Returns () on success, or loads on failure.
fn cas<'p>(
    &self,
    current: Shared<T>,
    new: Shared<T>,
    ord: Ordering,
    _: &'p Pin,
) -> Result<(), Shared<'p, T>>;

// Returns new on success, or loads on failure.
fn cas_owned<'p>(
    &self,
    current: Shared<T>,
    new: Owned<T>,
    ord: Ordering,
    _: &'p Pin,
) -> Result<Shared<'p, T>, (Shared<'p, T>, Owned<T>)>;

I'd generally lean towards established conventions, but in this case we have small differences so conflating methods in Crossbeam with those in libstd under the same names is not as appealing. Also, data structures in Crossbeam will heavily use CAS operations, and things like self.head.compare_and_swap_weak_owned(head, node, AcqRel, pin) will be common and look terrible. :)

As a side note, the silent voice in my head doesn't say "compare-and-swap this pointer with that one" but rather "install this pointer if the old is that one". Or just "cas this pointer". It's a common method so I like it terse.

Another issue is that I believe we don't need both compare-and-swap and compare-and-set (explanation at the end of this comment), so being explicit about what particular flavor of CAS is in question is not as important.

@Vtec234 Just to elaborate a bit on your comment regarding store_owned...

I feel okay about the current API, but not great. Some odd aspects stand out, e.g. you can store a value without a pin, but you can't CAS without a pin. Also, load_raw feels out of place.

On a more higher level, I think Atomic should:

  1. Support all commonly used atomic operations.
  2. Aim to not go overboard with too many methods.
  3. Aid in correct use of epochs by leveraging lifetimes and pins/guards.
  4. Allow operations on raw pointers and/or operations that don't require a pin/guard.

The current API succeeds on all points except the last one. Regarding that one, I'd like to avoid having methods like load_raw, store_raw, swap_raw, cas_raw, cas_weak_raw. That adds a lot of new methods. And what about cases where you want to do swap with a raw pointer, but get back a Shared? Or CAS where the previous value is a raw pointer, but the new is Shared?

Instead of creating special methods aimed at raw pointers, I think converting raw pointers to Shared is the way to go, especially in presence of pointer tagging. For example:

let a: *mut T = _;
let b: Shared<T> = _;

// Bad:
let c: *mut T = atomic.cas_raw((a, 1), (b.as_raw(), b.tag()), SeqCst).0;

// Better:
let a = Shared::from_raw(a);
let c: *mut T = atomic.cas(a.with_tag(1), b, SeqCst, pin).as_raw();

The only *_raw method in the current proposal is load_raw. And it is odd. So why does it exist? It exists because loading atomics without pinning in destructors (Drop impls) is very useful. But this begs the question: do destructors need raw pointers or do they need to avoid pinning? Of course, the latter is really true.

All concurrent data structures go through three phases:

  1. Construction (e.g. new and from_iter).
  2. Sharing among multiple threads.
  3. Destruction (drop).

During construction and destruction we have a &mut to the whole data structure and would like to avoid pinning the current thread. At the moment, Crossbeam pins the current thread during construction of MsQueue and during destruction of Deque.

Perhaps it would make sense to create a "fake" pin/guard during construction and destruction in order to pass it to methods on Atomic that require it. Perhaps we shouldn't think of pin/guard as "proof that the current thread is pinned", but rather "proof that nothing will be destroyed in this scope".

When constructing/destructing we know that nothing will be concurrently destroyed under our feet. Perhaps it'd make sense to instead provide an unsafe function that gives a pin/guard but, rather than pinning the current thread, just asserts that we have ownership of the whole data structure. Then the destructor for Stack might look like this:

impl<T> Drop for Stack<T> {
    fn drop(&mut self) {
        let pin = unsafe { &epoch::assert_ownership() };

        let mut curr = self.head.load(Relaxed, pin);
        while !curr.is_null() {
            unsafe {
                let next = curr.deref().next.load(Relaxed, pin);
                drop(Box::from_raw(curr.as_raw()));
                curr = next;
            }
        }
    }
}

If we introduce that function, there'll be no need for store_owned, load_raw, and compare-and-set. It'll also allow us to freely use Shared<T> instead of (*mut T, usize). Thoughts?

Vtec234 commented 7 years ago

The average tower height is 1 + 1/2 + 1/4 + 1/8 + ... = 2. So, if all nodes have a tower of height 32, while the average is 2, that's a lot of wasted space.

~Not that it matters at all for this discussion, but just for factual correctness, I'm pretty sure that this is incorrect. That infinite sum is the probability of the height falling in the range [0, +infty]. If you ignore height 0 (impossible), it's correctly probability 1. The actual average height is (lg n), so for 65536 elements it would be height 16, which is sometimes used as the constant limit. With that limit the wasted space would be n(16-lg(n)), which.. I agree is still pretty bad.~ This is wrong, please ignore.

fn load<'p>(&self, ord: Ordering, _: &'p Pin) -> Shared<'p, T, Z>;
fn store<'p>(&self, ord: Ordering, new: Shared<'p, T, Z>);

It's an interesting idea. On the other hand, the cost of untagging is negligible compared to all the sync operations that structures do. I was originally opposed to merging Atomic and TaggedAtomic since the resulting API would be really clunky with tags everywhere, but the new design separates them out neatly. Adding trait hackery to separate types would only make it more confusing.

Users can have pointers to misaligned data.

This is one concern whose importance I'm not sure of. I've never seen data being purposefully misaligned for any reason, but if there are exotic cases where it matters, that might be worth considering. If there aren't, I'd agree with @stjepang that the current form, i.e. having just one form of pointer, is preferable.

impl<'p, T> Shared<'p, T> {
    pub unsafe fn deref(&self) -> &'p T;
    pub unsafe fn as_ref(&self) -> Option<&'p T>;
}

What does deref do? If it's just as_ref().unwrap(), then it's really unnecessary to have it.

I split the more important point into a post below. ~EDIT: Needs more thought, will post later.~

ghost commented 7 years ago

Not that it matters at all for this discussion, but just for factual correctness, I'm pretty sure that this is incorrect.

Oh, I think we're just talking about different things. I'm saying: sum(tower_heights) / len(tower_heights) = 2. You're saying: max(tower_heights) = log(n). Anyways, total wasted (unused) memory is (MAX_LEVELS - 2) * len(tower_heights) * size_of::<usize>().
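
For instance, assuming MAX_LEVELS = 32 and 64-bit pointers, a list with one million nodes would waste roughly (32 - 2) * 1_000_000 * 8 bytes ≈ 240 MB on unused tower slots.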

This is one concern I'm not sure the importance of. I've never seen data being purposefully misaligned for any reason, but if there are exotic cases where it matters, that might be worth considering.

I'm not aware of any either. If there are legitimate situations for misaligned data, it sounds like one is doing something so low-level that they'd probably want to avoid Atomics entirely anyway.

What does deref do? If it's just as_ref().unwrap(), then it's really unnecessary to have it.

It is similar to as_ref().unwrap(), but it is also zero-cost: it skips the null check entirely instead of panicking when the pointer is null.
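
A rough sketch of that difference (the internals are assumed, not the actual implementation):

use std::marker::PhantomData;

pub struct Shared<'p, T: 'p> {
    raw: *mut T,
    _marker: PhantomData<&'p T>,
}

impl<'p, T> Shared<'p, T> {
    /// Zero-cost: no null check, the caller must guarantee the pointer is non-null.
    pub unsafe fn deref(&self) -> &'p T {
        &*self.raw
    }

    /// Checks for null and returns None instead of a dangling reference.
    pub unsafe fn as_ref(&self) -> Option<&'p T> {
        self.raw.as_ref()
    }
}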

Vtec234 commented 7 years ago

After digesting this a little, I've come to the conclusion that it would be best to make Owned nullable and remove Shared::null, reinstating the status of Shared as proving the existence of a pinned epoch.

Here is more or less why:

Since Shared is now nullable, it can be created without a pin by calling Shared::null(). In other words, the existence of a Shared used to be proof that the current thread is pinned. But now it isn't anymore, so methods like cas_and_ref must take a pin/guard.

Perhaps it would make sense to create a "fake" pin/guard during construction and destruction in order to pass it to methods on Atomic that require it. Perhaps we shouldn't think of pin/guard as "proof that the current thread is pinned", but rather "proof that nothing will be destroyed in this scope".

This is an interesting idea, but I find it really confusing. I believe that we should stick to the concepts behind epoch-based reclamation and let our API reflect them as best as possible. A pin asserts a pinned epoch, and a shared pointer is contingent on that pin. The usage of lifetimes in this case really nicely enforces this idea and we should not diverge from it. Allowing a non-pinned Shared essentially invalidates the whole purpose of the type. The "fake" pin would also require an internal flag in the Pin to notify it not to unpin on destruction, but that's probably simple enough to do, so it's not really an argument against it.

  1. Support all commonly used atomic operations.
  2. Aim to not go overboard with too many methods.
  3. Aid in correct use of epochs by leveraging lifetimes and pins/guards.
  4. Allow operations on raw pointers and/or operations that don't require a pin/guard.

I agree wholeheartedly with these goals and it seems to me that so does the above reasoning.

The changes to the API would be:

impl<T> Atomic<T> {
+  pub fn store_owned(&self, new: Owned<T>, ord: Ordering);

-   pub fn swap<'p>(&self, new: Shared<T>, ord: Ordering, _: &'p Pin) -> Shared<'p, T>;
+   pub fn swap<'p>(&self, new: Shared<'p, T>, ord: Ordering) -> Shared<'p, T>;

-  pub fn cas<'p>(
-        &self,
-        current: Shared<T>,
-        new: Shared<T>,
-        ord: Ordering,
-        _: &'p Pin,
-    ) -> Result<(), Shared<'p, T>>;

+  pub fn cas<'p>(
+        &self,
+        current: Shared<T>,
+        new: Shared<'p, T>,
+        ord: Ordering,
+    ) -> Result<(), Shared<'p, T>>;

... same for other CASs
}

// having to get rid of these is slightly annoying
// an alternative would be to have Option<Owned> arguments, but as I understand
// we do not want Options everywhere
- impl<T> Deref for Owned<T> { ... }
- impl<T> DerefMut for Owned<T> { ... }

impl<T> Owned<T> {
+  pub fn null() -> Self;
+  pub fn is_null(&self) -> bool;
+  pub fn as_ref(&self) -> Option<&T>;
}

What do you think?

jeehoonkang commented 7 years ago
ghost commented 7 years ago

Okay, this time I'd like to take a step back and perhaps try with new terminology. Perhaps a fresh story might sound more coherent.

Forget everything you know about Crossbeam...

So, epoch-based GC has this Atomic<T> type. It is an atomic pointer to a heap-allocated T. Loading an Atomic<T> returns a pointer Ptr<'scope, T> that can be dereferenced (if it isn't null). Example:

let a = Atomic::new(7); // Atomic pointer to a heap-allocated number 7.
let p = a.load(SeqCst);

But this doesn't work just yet! Epoch-based GC is all about scopes. When working with heap-allocated objects, we don't want other threads to concurrently destroy them under our feet. Let's say we'd like to print the loaded number:

let p = a.load(SeqCst);
println!("{}", *p.as_ref().unwrap());

Between the load and println we really don't want some other thread to free the allocated number 7. There's a way to say to the GC: "please don't destroy any garbage created during the life of this scope before it ends":

epoch::pin(|scope| {
    let p = a.load(SeqCst, scope);
    println!("{}", p.as_ref().unwrap());
})

Now, if another thread decides to free the heap-allocated number at a time between the load and println, freeing will simply be deferred. The garbage collector keeps track of such scopes and will destroy the garbage later, at a safe time.

All Ptrs have a lifetime in their type: Ptr<'scope, T>, which indicates the scope in which they're usable.

We can summarize the lives of Atomics and Ptrs like this:

The sample code still doesn't compile because as_ref() is unsafe. Dereferencing is safe only if you use memory orderings to properly synchronize writes/reads to/from heap-allocated objects. This will compile:

epoch::pin(|scope| {
    let p = a.load(SeqCst, scope);
    println!("{}", unsafe { p.as_ref() }.unwrap());
});

If you're wondering why we're calling unwrap() here, that is because the pointer might be null. Conversion to a reference succeeds only if it is not null.

How do we deallocate heap-allocated objects? This is how:

epoch::pin(|scope| {
    let p = a.load(SeqCst, scope);
    if !p.is_null() {
        // Try "stealing" the object from the atomic pointer.
        if a.compare_and_swap(p, Ptr::null(), SeqCst, scope).is_ok() {
            // Success! Add the object to the garbage collector.
            unsafe { epoch::defer_free(p, scope) };
        }
    }
})

Function defer_free is unsafe because we must guarantee that the object is now unreachable from atomic pointers. Since we "stole" the object by compare-and-swapping the pointer to something else (null), we know that other threads can't obtain it again.

Some threads might still be holding a Ptr to it in their scopes, but as soon as their scopes end, no threads will be able to have a Ptr to it again.

Scopes created by epoch::pin must be short-lived, otherwise the GC might get stalled for a long time and accumulate too much garbage.

Sometimes, however, we'd like to have longer-lived scopes in which we know our thread is the only one accessing atomics. This is true e.g. when destructing a big data structure, or when constructing it from a long iterator. In such cases we don't need to be overprotective because there is no fear of other threads concurrently destroying objects. Then we can do:

unsafe {
    epoch::unprotected(|scope| {
        // Destruct all nodes in a Treiber stack.
        let mut curr = head.load(Relaxed, scope);
        while !curr.is_null() {
            let next = curr.deref().next.load(Relaxed, scope);
            drop(Box::from_raw(curr.as_raw()));
            curr = next;
        }
    })
}

Function unprotected is unsafe because we must promise that no other thread is accessing our Atomics and objects.

Just like with the safe pin function, unprotected use of atomics is enclosed within a scope so that pointers created by it don't leak out or get mixed with pointers from other scopes.

The only remaining question is: how do we store new objects into Atomics? For that, there is type Owned<T>, which is almost identical to Box<T>. In fact, Owned<T> can even be constructed directly from Box<T>.

We can convert an Owned<T> into Ptr<'scope, T>:

let mut owned = Owned::new(Node {
    value: value,
    next: Atomic::null(),
});

epoch::pin(|scope| {
    let node = owned.into_ptr(scope);
    // ...
})

Or, we can use compare-and-swap to install new objects into atomic pointers. Here's how one might push a new owned node onto a stack:

epoch::pin(|scope| {
    let mut head = self.head.load(Acquire, scope);
    loop {
        owned.next.store(head, Relaxed);
        match self.head.compare_and_swap_weak_owned(head, owned, AcqRel, scope) {
            Ok(_) => break,
            Err((h, o)) => {
                head = h;
                owned = o;
            }
        }
    }
})

Complete API

That's all! How does it sound this time?

EDIT:

@Vtec234 suggested renaming epoch::unprotected to epoch::exclusive. I like that.

arthurprs commented 7 years ago

I think this is solid, don't mind my crazy type system suggestions.

:+1: for pin::exclusive and stdlib-like (long) names.

ghost commented 7 years ago

All right, I've updated the RFC with the recent changes. Thanks everyone for the feedback and your inputs! We're getting very close to finalization, but a lot has changed recently so I'd like to let everything sink in a bit, at least for a few days.

It seems that at this point only bikeshedding remains: Should epoch::unprotected be renamed to epoch::exclusive? Or perhaps something else?

I'll try to summarize what everyone thinks on that:

Unless a better suggestion comes along or someone argues against it, it'll be renamed to epoch::exclusive.

@aturon You don't have to read the lengthy discussion, but if you could skim through the RFC, that'd be super appreciated! After all, you designed the original interface and have lots of experience with API design in Rust, so your opinion would be very valuable here. :)

arthurprs commented 7 years ago

One could suggest epoch::local() as well, but I'm ok with epoch::exclusive()

ghost commented 7 years ago

Here's an argument against naming it pin::exclusive...

Yes, the function is useful in situations where we know that our thread is the only one accessing atomics (exclusively). But, it is also useful in situations where we simply know that no thread (including our own) is modifying atomics.

An example can be found in the implementation of deque in Coco. In method push we need to load the pointer to the buffer. Pinning is unnecessary because we know that our thread is the only one that is able to modify the pointer, and we can promise that we won't modify it at this time either.

Therefore, it's not about exclusive access to atomics. It's about exclusive-writer or multiple-reader access. Kind of like a RwLock or RefCell. The function is safe to use only if either of these is true:

  1. There is exactly one writer (our thread) and no readers (nobody else accessing atomics).
  2. There are no writers (and we're just reading atomics).
ghost commented 7 years ago

@aturon

One very minor question: you seem to drop Atomic::new, instead having the user allocate an Owned to initialize. Any particular reason for that?

Good catch! This was not intentional. I'll add Atomic::new to the interface.

ghost commented 7 years ago

@schets Can we merge, or would you like more time to review?

schets commented 7 years ago

It looks good. I have one idea - if we're providing the unprotected epoch function, would it also make sense to provide an unsafe re-pin function which leaves and re-enters the epoch without leaving the scope?

A situation I can think of is where somebody wants to avoid repeatedly re-pinning a thread in a loop, and so encapsulates the entire loop in a pin, or possibly some destructor/cleaner-like function. It would be great in situations where it's hard to maintain a state machine outside of the pinned context.

ghost commented 7 years ago

@schets That's a good point - in some cases repinning might be very handy.

Do you perhaps have a suggestion on how exactly would such a function be used? Here are some quick ideas:

epoch::pin(|scope| {
    // Idea 1:
    unsafe { epoch::repin(); }

    // Idea 2:
    unsafe { epoch::repin(scope); }

    // Idea 3:
    unsafe { scope.repin(); }
});

It's a bit unfortunate that we can call this function even inside an epoch::unprotected scope. We could solve the problem by having different types like PinScope and UnprotectedScope that implement trait Scope, but that is probably too much complexity for too little benefit.

I personally am not sure what's the right interface here. More concrete cases where this is needed would be illuminating.

If we don't figure out the interface now, we can mention this in the "Unresolved questions" section and decide after gaining some experience with the library, or when a concrete need for this feature comes up.

arthurprs commented 7 years ago

I suggest postponing that for now.

ghost commented 7 years ago

Added the problem of re-pinning to unresolved questions, since it's not blocking the current proposal and we can figure it out later.

This RFC is being merged. Great work, everyone!

This week I'll publish a new RFC on epoch-based garbage collection, containing a comprehensive overview of everything related to it. This is a wide design space requiring lots of tradeoffs, and solutions to some problems are not obvious at all.