ghost closed this issue 7 years ago
I'm still thinking about all this, but:
2.1 - Provide an unsafe variant that takes memory ordering (cas_weak, ...) and a safe one that's always seqcst (cas_weak_seq_cst, ...).
I'm a bit hesitant on this, but I thought I'd share anyway. We could use the type system to provide type-safe nullability and taggability for zero-cost safety, something along the lines of:
trait Nullability; trait Taggability;
These traits will provide some methods that will be implemented by
struct Nullable; struct NonNullable; struct Taggable; struct NonTaggable;
and ultimately be used by the structs Ref and Atomic and their methods.
struct Ref<'a, T, X: Nullability, Z: Taggability = NonTaggable> { .. } struct Atomic<T, Z: Taggability = NonTaggable> { .. }
This is not a full description, but I'm hoping to start a conversation at least.
I've seen the type system being used for similar things with good success.
As discussed in https://github.com/stjepang/coco/pull/2, I think:
crossbeam-epoch, and epoch-based memory reclamation schemes in general, resolve the races between reads/writes and deallocations of a location, but not the races among reads/writes. That is the burden of the user of crossbeam-epoch, so I am proposing to mark the load/store/swap/cas methods as unsafe. Yes, I think the dereferencing is where the races are caused, and it should be manually inspected by the users, IMO.
I've not had time to digest these APIs in detail, however I did want to get some clarification about one assertion made in the RFC (I'm not that familiar with crossbeam's API, so I'm looking into it as I go):
By using relaxed orderings we can create a data race within safe code. This is unsound, as Rust must not allow data races in safe code.
The data race here is specifically because the load API offers an Option<Shared<T>>, right? That is, because Shared<T> implements Deref, we are basically asserting that if you get your hands on one, it is safe to deref -- but with the Relaxed store, there is no reason for that to be true. (In contrast, the AtomicPtr API in the standard library gives you back a *mut T, meaning you have to decide for yourself if it is safe to deref.)
@nikomatsakis Exactly. Dereferencing is the crucial operation here.
We can say that dereferencing is simply unsafe and call it a day. Or, as @jeehoonkang suggests, we can simply mark all load/store/swap/cas/cas_weak methods as unsafe. But that sounds like capitulation to me. :)
I believe we can design an API that is both safe and performant for 95% use cases, and would like to push for that. This is Rust's overall philosophy anyway: let's make most of the code easy and safe, and apply unsafe blocks in the few special cases.
2.1 - Provide an unsafe variant that takes memory ordering (cas_weak, ...) and a safe one that's always seqcst (cas_weak_seq_cst, ...).
But Acquire/Release/AcqRel are safe as well. Generally speaking, I think memory orderings for all atomic operations fall into three categories:

1. Relaxed ordering. This is unsafe.
2. Acquire (for load), Release (for store), AcqRel (for swap, cas, and cas_weak). This is safe.
3. SeqCst ordering (just like the 2nd category, but with a guaranteed global total order). This is also safe.

When writing concurrent data structures, I believe that the 2nd and 3rd are equally commonly needed, while the 1st is pretty rare.
For example, to allow full flexibility for the store operation, we could have the following methods:

- store_release
- store_seq_cst
- store_box_release
- store_box_seq_cst
- store_raw (unsafe)

Unsafe methods store_relaxed and store_box_relaxed are not very compelling. Since they're already unsafe (and I believe rare), you can just as well use store_raw.
We could use the type system to provide type-safe nullability and taggabiliy for zero-cost safety, something along the lines of:
This is certainly an interesting approach, although I haven't come up with a satisfying solution in this direction yet.
But Acquire/Release/AcqRel are safe as well.
Sure, but the idea is to limit the number of methods.
I'm not confident it's possible to provide an actual safe interface. It's gonna be really tricky to write a concurrent data structure without unsafe somewhere anyway.
A few more ideas:
Another possibility is to let methods accept an argument that tells whether they are sequentially consistent or not. For example:
// AcqRel when false, SeqCst when true
fn swap<'a>(&self, new: Ptr<'a, T>, seq_cst: bool) -> Ptr<'a, T>;
// Just to avoid confusion: `Ptr` holds a (possibly null) pointer and a tag encoded within its unused bits.
Then, with swap_raw you have full control over precise ordering when you need it:
fn swap_raw<'a>(&self, new: *mut T, tag: usize, order: Ordering) -> Ptr<'a, T>;
Perhaps even macros could help to allow syntax like:
cas!(head, old, new); // SeqCst by default
cas!(head, old, new, AcqRel);
cas!(head, old, Box::new(Node::new()), SeqCst);
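One possible sketch of such a cas! macro, with the ordering argument optional and defaulting to SeqCst (note that std's compare_exchange takes a separate failure ordering, which cannot be Release or AcqRel, so this sketch simply passes Relaxed there):

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical cas! macro: omitting the ordering defaults it to SeqCst.
macro_rules! cas {
    ($atomic:expr, $old:expr, $new:expr) => {
        cas!($atomic, $old, $new, Ordering::SeqCst)
    };
    ($atomic:expr, $old:expr, $new:expr, $ord:expr) => {
        // Failure ordering is fixed to Relaxed for simplicity.
        $atomic.compare_exchange($old, $new, $ord, Ordering::Relaxed)
    };
}

fn main() {
    let x = Box::into_raw(Box::new(1));
    let y = Box::into_raw(Box::new(2));
    let head = AtomicPtr::new(x);

    assert!(cas!(head, x, y).is_ok()); // SeqCst by default
    assert!(cas!(head, y, x, Ordering::AcqRel).is_ok()); // explicit ordering
    assert!(cas!(head, y, x).is_err()); // head is x now, not y

    unsafe {
        drop(Box::from_raw(x));
        drop(Box::from_raw(y));
    }
}
```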
@stjepang oops! I made a mistake in writing the previous comment. I meant to mark dereferences as unsafe. Sorry for the confusion.
I believe we can design an API that is both safe and performant for 95% use cases, and would like to push for that. This is Rust's overall philosophy anyway: let's make most of the code easy and safe, and apply unsafe blocks in the few special cases.
I don't agree with this characterization. In Crossbeam master, the Chase-Lev deque (https://github.com/crossbeam-rs/crossbeam/blob/master/src/sync/chase_lev.rs#L195), MS queue (https://github.com/crossbeam-rs/crossbeam/blob/master/src/sync/ms_queue.rs#L89), segmented queue (https://github.com/crossbeam-rs/crossbeam/blob/master/src/sync/seg_queue.rs#L63), and Treiber stack (https://github.com/crossbeam-rs/crossbeam/blob/master/src/sync/treiber_stack.rs#L35) all have at least one relaxed access. In general, almost every data structure implemented in the C/C++11 memory model I know of uses relaxed accesses.
Also, I think crossbeam-epoch is essentially unsafe: it is well-known that non-blocking data structures are really hard to implement, and the difficulty is beyond what Rust's type system can give us. Yes, I am proposing to "give up", but I think the burden of ensuring race-freedom (except for the races with deallocation) in the data structures is in the user's hands, not ours.
I spent some time with the type system to check if (https://github.com/crossbeam-rs/rfcs/pull/2#issuecomment-298902593) could work, and it appears so: https://gist.github.com/arthurprs/3da8f45f99c6acb663cc09c49888147d
I implemented only a bit of the coco interface, and it needs some typedefs to make it ergonomic.
@stjepang what about a builder-like pattern. Give acquire-release by default, but allow opt-in to other orderings:
ptr.seqcst().swap(...);
unsafe { ptr.relaxed().swap(...); }
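A minimal sketch of how that builder-like pattern could be modeled over std's AtomicPtr: a lightweight view type carries the chosen ordering, with acquire-release as the default and Relaxed behind unsafe. All names here are hypothetical:

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

struct Atomic<T>(AtomicPtr<T>);

// A view that remembers which ordering the caller opted into.
struct View<'a, T> {
    inner: &'a AtomicPtr<T>,
    ord: Ordering,
}

impl<T> Atomic<T> {
    fn new(ptr: *mut T) -> Self {
        Atomic(AtomicPtr::new(ptr))
    }

    // Default choice: acquire-release.
    fn acq_rel(&self) -> View<'_, T> {
        View { inner: &self.0, ord: Ordering::AcqRel }
    }

    // Opt-in to sequential consistency.
    fn seqcst(&self) -> View<'_, T> {
        View { inner: &self.0, ord: Ordering::SeqCst }
    }

    // Relaxed is the only unsafe choice, as in the discussion above.
    unsafe fn relaxed(&self) -> View<'_, T> {
        View { inner: &self.0, ord: Ordering::Relaxed }
    }
}

impl<'a, T> View<'a, T> {
    fn swap(&self, new: *mut T) -> *mut T {
        self.inner.swap(new, self.ord)
    }
}

fn main() {
    let x = Box::into_raw(Box::new(1));
    let y = Box::into_raw(Box::new(2));
    let ptr = Atomic::new(x);

    let old = ptr.seqcst().swap(y);
    assert_eq!(old, x);
    let old = unsafe { ptr.relaxed() }.swap(x);
    assert_eq!(old, y);

    unsafe {
        drop(Box::from_raw(x));
        drop(Box::from_raw(y));
    }
}
```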
I have a comment on the tagged pointers. This may only be relevant in the far future, but...
Suppose the consume ordering is introduced in Rust in the future (possibly very far from now). Then tagged pointers cannot be used with the consume ordering, since they are implemented with AtomicUsize. Note that the dependency for the consume ordering is carried only through pointer types, not integer types such as AtomicUsize.
This may not be an issue for Rust or Crossbeam, since consume is very premature, and the conflict between consume and pointer tagging is recognized by the C/C++ standards committee, which is discussing how to solve it. I am just writing this comment down to note that consume may be relevant to the design choices we are making here.
@jeehoonkang Are you sure you aren't mixed up on the problems with consume ordering? Consume ordering can depend on non-pointer loads; take this code:
int mt_values[];
std::atomic<int> mt_index;
{
    int which_index = mt_index.load(std::memory_order_consume);
    int value = mt_values[which_index]; // address dependency on which_index
}
There are issues like this where the compiler optimizes the dependency away, but I can't find any issues about tagged pointers, working only with integers, or anything in the standard which indicates consume only applies to pointer types. The standard is worded in such a way to imply that any value type loaded with consume can order loads in a dependency chain:
12: An evaluation A is dependency-ordered before an evaluation B if (12.1) — A performs a release operation on an atomic object M, and, in another thread, B performs a consume operation on M and reads a value written by any side effect in the release sequence headed by A, or (12.2) — for some evaluation X, A is dependency-ordered before X and X carries a dependency to B.
@schets Sorry, I should have cited the documents I was implicitly referring to: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0190r3.pdf and http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0462r1.pdf. The standards committee agrees that the current definition of consume is unusable, and is preparing a fix. The document says:
"Another conspicuous change is that this proposal does not require dependencies to be carried by integers, but instead only by pointers. Note that versions v4.2 and later of the Linux kernel avoid carrying dependencies through integer variables [15]. At first glance, it might seem that restricted carrying of dependencies through intptr_t and uintptr_t is needed to tag pointers, but such tagging is not supported by the current C++ standard. This document therefore completely prohibits carrying dependencies through integers."
That isn't part of the standard, nor is it going to be in C++17, so I don't think we should worry about it. I find it absurd that they would prevent bit manipulation of pointers from being carried as a dependency, and expect the proposal will either allow for such behavior or that a standardization proposal for it will be submitted later.
It might be worth thinking about whether the pointer-tagging scheme would cause the compiler to do anything funky, although LLVM and any other respectable C++ compiler could never do that, or else they would break every single program out there. If this is ever expected to support architectures that don't use standard pointers (near/far schemes, 32-bit chars, something exotic?) we could think about that as well.
@schets yeah, the standard committee recognizes pointer tagging is essential, and surely they will submit another memorandum to deal with it. I am leaving a comment just to note that there is a discussion on it. I agree with you that the standard thing is not an important issue for now, and it is more urgent to see if the existing compilers will work fine.
Getting rid of the tag argument in functions is a great change, I can get behind this interface.
It seems to me that optional pointer tagging is a good use case for a type parameter like I showed up-thread. Although in that example I included nulling as well, which I think may not carry its weight.
I've updated the RFC. This is now a real proposal and not a draft anymore. :)
@arthurprs Your idea of providing optional nullability and taggability is interesting. However, I think the right choice here is: Always support nullability and taggability. Having other combinations available is just not worth the hassle, IMO.
@nikomatsakis Builder-like pattern at first looks like a potential solution, but feels a tad too odd, since it's inconsistent with existing conventions (passing ordering as an argument). But more importantly...
@jeehoonkang I think it's best to do what you proposed a long time ago: just mark dereferencing as unsafe and call it a day. An API that allows safe dereferencing is too difficult to design and would probably still be too impractical in the end. So I gave up trying to pursue complete safety. :)
Having other combinations available is just not worth the hassle, IMO.
It doesn't need to be a hassle. From what I learned in that example, tagging as a type parameter in the API you now propose is really simple and makes it zero-cost with regard to tags.
I'll write a sample of how it could look.
Hey, thanks for the RFC! It's great to see crossbeam moving forward to better foundations!
Consider a node in a skiplist. It consists of: key, value, tower. The tower is an array of atomic pointers. To save a level of indirection, it is wise to lay out the entire tower inside the node. This means that nodes are dynamically sized.
A somewhat space-wasting solution is to have a tower array of constant size MAX_LEVELS with some levels empty, or None. It's not too bad, since afaik you can prove that at most (lg n) levels are necessary with high probability.
Yes, I am proposing to "give up", but I think the burden of ensuring race-freedom (except for the races with deallocation) in the data structures is at the user's hand, not ours.
Why this defeatism? :D It generally happens in Rust that the internals of high-performance libraries are unsafe, and so is the case for crossbeam-epoch, which isn't really meant to be widely used anyway. But this does not prevent us from providing safe and easy-to-use structures built on top of epoch, which is really the end goal.
There's a bit of an issue with tagged pointers. To be taggable, a pointer has to have an alignment of at least 2. How would the API behave when a pointer only has alignment 1? Specifically, what does tag(&self) return in such a case? I see two options. One is to make it return Option<usize>, which would be slightly burdensome. The other is to just agree that an untaggable pointer always has tag 0 and mention that in the documentation. ~The same applies to with_tag(self, tag: usize). It could simply ignore the tag for untaggable pointers, but again, we have to document that, or return Option<Self>.~ Coco just panics with "tag too large" and that's fine, but it does accept tag 0. Might be worth documenting.
Additionally, I believe it would be worth it to provide both compare-and-set and compare-and-swap, changing the naming of the appropriate methods in accordance with crossbeam-rs/crossbeam#116 to eliminate confusion about what cas stands for.
Another related issue is crossbeam-rs/crossbeam#113 and the proposal seems to solve it :)
I noticed something more worrying. Since there is no store_owned, storing a new value that we just constructed from e.g. a Box requires using Owned::into_shared, and that requires a pin, even though it's unnecessary, since we're not getting a Shared back! This incurs non-negligible overhead, so I propose adding store_owned back.
Thank you for finishing up the RFC. IMO this is a huge achievement! I just have two comments.
I agree with @Vtec234 on the naming of cas. I think it is convenient to follow the API of Rust std: https://doc.rust-lang.org/std/sync/atomic/struct.AtomicPtr.html. I made a PR on this (https://github.com/crossbeam-rs/crossbeam/pull/128/files).
"By using relaxed orderings we can create a data race within safe code. This is unsound, as Rust must not allow data races in safe code.": In the world of C/C++, a data race on relaxed accesses is safe; a data race on nonatomic accesses is unsafe. It is possible that Rust will just reuse the C/C++ concurrency memory model (which is not clearly decided yet; see more details in https://github.com/nikomatsakis/rust-memory-model). The problem here is that by using relaxed ordering you may read uninitialized memory, which is clearly a severe problem. More concretely, to quote https://github.com/crossbeam-rs/crossbeam/issues/137: "The dereference of a Shared<T> may race with the writer of the shared data of type T. For example, suppose a Box<T> is written to an Atomic<T> with store_ref(Relaxed), and later another thread reads the Atomic<T> with load(Relaxed). Then the Box<T> itself may not have been synchronized, and the dereference of the loaded Shared<T> may invoke undefined behavior."
@stjepang This is close to what I had in mind: https://gist.github.com/arthurprs/3da8f45f99c6acb663cc09c49888147d
The important bit is:
pub struct Atomic<T, Z: Taggability = NonTaggable> {
}
pub struct Shared<'p, T: 'p, Z: Taggability = NonTaggable> {
}
pub mod tagged {
    pub type Atomic<T> = super::Atomic<T, super::Taggable>;
    pub type Shared<'a, T> = super::Shared<'a, T, super::Taggable>;
}
This makes tagging free for modules that don't need it (use crossbeam::atomic::{Atomic, ..}), and users opt in by importing from the tagged submodule (use crossbeam::atomic::tagged::{Atomic, ...}).
So no hassle, no duplicated code, and you don't pay for what you don't use.
I'll write a sample how it could look like.
That'd be helpful! So, the question is: "is it worth splitting taggable and non-taggable atomics at all?" I think not, since the cost of ptr & MASK is really negligible in both code size and runtime cost. :) The other issue is that tagged atomics force proper alignment, but is that a real concern?
By the way, after moving MsQueue and TreiberStack to taggable atomics I didn't notice any difference in performance.
A somewhat space-wasting solution is to have a tower array of constant size MAX_LEVELS with some levels empty, or None. It's not too bad, since afaik you can prove that at most (lg n) levels are necessary with high probability.
The average tower height is 1 + 1/2 + 1/4 + 1/8 + ... = 2. So, if all nodes have a tower of height 32 while the average is 2, that's a lot of wasted space. :) That also contributes to worse cache locality, therefore making the performance substantially worse (I did the benchmarking).
There's a bit of an issue with tagged pointers. To be taggable, a pointer has to have an alignment of at least 2. How would the API behave when a pointer only has alignment 1? Specifically, what does tag(&self) return in such a case?
Well, if a pointer has an alignment of 1, the only value a tag can have is zero. The maximum value a tag can have is alignment - 1. Method with_tag should panic if the tag is too large. We could return an optional value to indicate whether tagging succeeded, but would anyone ever use that? Everyone would just call .unwrap() right after tagging. :)
Additionally, I believe it would be worth it to provide both compare-and-set and compare-and-swap, changing the naming of the appropriate methods in accordance with crossbeam-rs/crossbeam#116 to eliminate confusion about what cas stands for.
The difference between those two is that compare-and-swap does more (it performs a load on failure) and requires a pin/guard. Is there a real scenario where compare-and-set is more desirable? Yes, compare-and-set is more desirable when the thread is not pinned, but why would someone actually want to do CAS on an Atomic when it isn't pinned? A concrete example here would be illuminating.
Compare-and-swap is a very common operation, so I'd strongly prefer having the name short. Especially because it takes four arguments. Arguably, the weak variant is more commonly used than the strong variant, so if we go with longer names, this might be the most common use of CAS in practice:
match self.head.compare_and_swap_weak_owned(head, node, AcqRel, pin) {
Ok(_) => break,
Err((h, n)) => {
head = h;
node = n;
}
}
Alternatives to compare_and_set and compare_and_swap might be cmp_set and cmp_swap, or maybe cas and cmpxchg. The proposed cas is more similar to compare_exchange than to compare_and_swap in libstd anyway. BTW, I find it silly that compare-and-set is not named compare-and-store instead. :)
Also, the currently proposed cas is not really compare-and-swap nor compare-and-set. It swaps on failure, but doesn't swap on success. It's also not compare-exchange, because it returns the new value on success. Naming is hard.
To summarize, I personally don't think we need a split between compare-and-set and compare-and-swap. But if you do, I'd like to ask for the following:
I noticed something more worrying. Since there is no store_owned, storing a new value that we just constructed from e.g. a Box requires using Owned::into_shared, and that requires a pin, even though it's unnecessary, since we're not getting a Shared back! This incurs non-negligible overhead, so I propose adding store_owned back.
This is another method for which I haven't yet found a compelling use case. If you do have any, let me know. :) But I'm not as opposed to this idea - adding store_owned back sounds reasonable to me.
In the world of C/C++, data race on relaxed accesses is safe. Data race on nonatomic accesses is unsafe.
Oh, I guess my statement on data races was worded poorly. Should've been more clear on that accessing unsynchronized non-atomic data is the real problem. I'll try fixing it.
I personally prefer the longer names and compare-and-set, solely because it follows the naming scheme of libstd. What do you think of this aspect?
@arthurprs Thanks for writing this!
It's a really neat trick. I just tried running cargo doc --open on that piece of code, but the added complexity stands out a bit. We can fix some rough edges with type aliases, but some will remain, e.g. signatures like:
fn load<'p>(&self, ord: Ordering, _: &'p Pin) -> Shared<'p, T, Z>;
fn store<'p>(&self, ord: Ordering, new: Shared<'p, T, Z>);
This is not too complex nor too much of a hassle, but still something that adds complexity. Let me try to summarize my stance. It's all a matter of tradeoffs.
Advantages:
Disadvantages:
Perhaps you'll disagree, but for me the disadvantage is greater than the sum of advantages. :) It'd be good to hear what others think as well. @jeehoonkang, @Vtec234 ?
@jeehoonkang
I personally prefer the longer names and compare-and-set, solely because it follows the naming scheme of libstd. What do you think of this aspect?
Well, cas and cas_owned do resemble compare_and_set and compare_exchange, but the differences are subtle:
// Returns current on success, or loads on failure.
fn compare_and_swap(
&self,
current: *mut T,
new: *mut T,
order: Ordering
) -> *mut T;
// Returns current on success, or loads on failure.
fn compare_exchange(
&self,
current: *mut T,
new: *mut T,
success: Ordering,
failure: Ordering
) -> Result<*mut T, *mut T>;
// Returns () on success, or loads on failure.
fn cas<'p>(
&self,
current: Shared<T>,
new: Shared<T>,
ord: Ordering,
_: &'p Pin,
) -> Result<(), Shared<'p, T>>;
// Returns new on success, or loads on failure.
fn cas_owned<'p>(
&self,
current: Shared<T>,
new: Owned<T>,
ord: Ordering,
_: &'p Pin,
) -> Result<Shared<'p, T>, (Shared<'p, T>, Owned<T>)>;
I'd generally lean towards established conventions, but in this case we have small differences, so conflating methods in Crossbeam with those in libstd under the same names is not as appealing. Also, data structures in Crossbeam will heavily use CAS operations, and things like self.head.compare_and_swap_weak_owned(head, node, AcqRel, pin) will be common and look terrible. :)
As a side note, the silent voice in my head doesn't say "compare-and-swap this pointer with that one" but rather "install this pointer if the old is that one". Or just "cas this pointer". It's a common method so I like it terse.
Another issue is that I believe we don't need both compare-and-swap and compare-and-set (explanation at the end of this comment), so being explicit about what particular flavor of CAS is in question is not as important.
@Vtec234 Just to elaborate a bit on your comment regarding store_owned...
I feel okay about the current API, but not great. Some odd aspects stand out, e.g. you can store a value without a pin, but you can't CAS without a pin. Also, load_raw feels out of place.
On a higher level, I think Atomic should:

- Support all commonly used atomic operations.
- Aim to not go overboard with too many methods.
- Aid in correct use of epochs by leveraging lifetimes and pins/guards.
- Allow operations on raw pointers and/or operations that don't require a pin/guard.

The current API succeeds on all points except the last one. Regarding that one, I'd like to avoid having methods like load_raw, store_raw, swap_raw, cas_raw, cas_weak_raw. That adds a lot of new methods. And what about cases where you want to do a swap with a raw pointer, but get back a Shared? Or a CAS where the previous value is a raw pointer, but the new one is a Shared?
Instead of creating special methods aimed at raw pointers, I think converting raw pointers to Shared is the way to go, especially in the presence of pointer tagging. For example:
let a: *mut T = _;
let b: Shared<T> = _;
// Bad:
let c: *mut T = atomic.cas_raw((a, 1), (b.as_raw(), b.tag()), SeqCst).0;
// Better:
let a = Shared::from_raw(a);
let c: *mut T = atomic.cas(a.with_tag(1), b, SeqCst, pin).as_raw();
The only *_raw method in the current proposal is load_raw. And it is odd. So why does it exist? It exists because loading atomics without pinning in destructors (Drop impls) is very useful. But this begs the question: do destructors need raw pointers, or do they need to avoid pinning? Of course, the latter is really true.
All concurrent data structures go through three phases: construction (new and from_iter), normal concurrent operation, and destruction (drop). During construction and destruction we have a &mut to the whole data structure and would like to avoid pinning the current thread. At the moment, Crossbeam pins the current thread during the construction of MsQueue and during the destruction of Deque.
Perhaps it would make sense to create a "fake" pin/guard during construction and destruction in order to pass it to methods on Atomic that require it. Perhaps we shouldn't think of pin/guard as "proof that the current thread is pinned", but rather as "proof that nothing will be destroyed in this scope".
When constructing/destructing, we know that nothing will be concurrently destroyed under our feet. Perhaps it'd make sense to provide an unsafe function that gives a pin/guard, but rather than pinning the current thread, just asserts we have ownership of the whole data structure. Then the destructor for Stack might look like this:
impl<T> Drop for Stack<T> {
fn drop(&mut self) {
let pin = unsafe { &epoch::assert_ownership() };
let mut curr = self.head.load(Relaxed, pin);
while !curr.is_null() {
unsafe {
let next = curr.deref().next.load(Relaxed, pin);
drop(Box::from_raw(curr.as_raw()));
curr = next;
}
}
}
}
If we introduce that function, there'll be no need for store_owned, load_raw, and compare-and-set. It'll also allow us to freely use Shared<T> instead of (*mut T, usize). Thoughts?
The average tower height is 1 + 1/2 + 1/4 + 1/8 + ... = 2. So, if all nodes have a tower of height 32, while the average is 2, that's a lot of wasted space.
~Not that it matters at all for this discussion, but just for factual correctness, I'm pretty sure that this is incorrect. That infinite sum is the probability of the height falling in the range [0, +infty]. If you ignore height 0 (impossible), it's correctly probability 1. The actual average height is (lg n), so for 65536 elements it would be height 16, which is sometimes used as the constant limit. With that limit the wasted space would be n(16-lg(n)), which.. I agree is still pretty bad.~ This is wrong, please ignore.
fn load<'p>(&self, ord: Ordering, _: &'p Pin) -> Shared<'p, T, Z>;
fn store<'p>(&self, ord: Ordering, new: Shared<'p, T, Z>);
It's an interesting idea. On the other hand, the cost of untagging is negligible compared to all the sync operations that structures do. I was originally opposed to merging Atomic and TaggedAtomic, since the resulting API would be really clunky with tags everywhere, but the new design separates them out neatly. Adding trait hackery to separate types would only make it more confusing.
Users can have pointers to misaligned data.
This is one concern whose importance I'm not sure of. I've never seen data being purposefully misaligned for any reason, but if there are exotic cases where it matters, that might be worth considering. If there aren't, I'd agree with @stjepang that the current form, i.e. having just one kind of pointer, is preferable.
impl<'p, T> Shared<'p, T> {
pub unsafe fn deref(&self) -> &'p T;
pub unsafe fn as_ref(&self) -> Option<&'p T>;
}
What does deref do? If it's just as_ref().unwrap(), then it's really unnecessary to have it.
I split the more important point into a post below. ~EDIT: Needs more thought, will post later.~
Not that it matters at all for this discussion, but just for factual correctness, I'm pretty sure that this is incorrect.
Oh, I think we're just talking about different things.
I'm saying: sum(tower_heights) / len(tower_heights) = 2.
You're saying: max(tower_heights) = log(n).
Anyways, the total wasted (unused) memory is (MAX_LEVELS - 2) * len(tower_heights) * size_of::<usize>().
This is one concern I'm not sure the importance of. I've never seen data being purposefully misaligned for any reason, but if there are exotic cases where it matters, that might be worth considering.
I'm not aware of any either. If there are legitimate situations for misaligned data, it sounds like one is doing something so low-level that they'd probably want to avoid Atomics entirely anyway.
What does deref do? If it's just as_ref().unwrap(), then it's really unnecessary to have it.
It is similar to as_ref().unwrap(), but it is also zero-cost: it doesn't panic when the pointer is null.
After digesting this a little, I've come to the conclusion that it would be best to make Owned nullable and remove Shared::null, reinstating the status of Shared as proving the existence of a pinned epoch.
Here is more or less why:
- It used to be possible to use an Atomic without pinning the epoch. Chase-Lev uses this, albeit wrongly. It could be just Atomic::new. Nevertheless, the possibility existed.
- Suppose we add store_owned. But notice that since Owned is not nullable, we have to use store with a Shared to pass a null. So to set Some, we use one method; to set None, we use another. This does not seem right.
- With Owned nullable, Shared::null wouldn't have to exist - Owned::null.into_shared would achieve the same thing, but also require a pin, restoring the status of Shared as a pin guarantee.
- It would also remove redundant Pin arguments where Shared is also an argument.

Since Shared is now nullable, it can be created without a pin by calling Shared::null(). In other words, the existence of a Shared used to be proof that the current thread is pinned. But now it isn't anymore, so methods like cas_and_ref must take a pin/guard.

- Another sign of Shared::null being problematic is that the pin argument on Owned::into_shared is unnecessary! Shared does not assert a pin existing, so it should be possible to create it from anything, not just null. The function should then be Owned::into_shared(self) -> Shared<T>.. but oh wait, lifetimes won't allow it. To me this seems like a strong indicator that we're going in the wrong direction with allowing unpinned Shareds.
s.Perhaps it would make sense to create a "fake" pin/guard during construction and destruction in order to pass it to methods on Atomic that require it. Perhaps we shouldn't think of pin/guard as "proof that the current thread is pinned", but rather "proof that nothing will be destroyed in this scope".
This is an interesting idea, but I find it really confusing. I believe that we should stick to the concepts behind epoch-based reclamation and let our API reflect them as well as possible. A pin asserts a pinned epoch, and a shared pointer is contingent on that pin. The usage of lifetimes in this case really nicely enforces this idea, and we should not diverge from it. Allowing a non-pinned Shared essentially invalidates the whole purpose of the type.
The "fake" pin would also require an internal flag in the Pin telling it not to unpin on destruction, but that's probably simple enough to do, so it's not really an argument against it.
Support all commonly used atomic operations. Aim to not go overboard with too many methods. Aid in correct use of epochs by leveraging lifetimes and pins/guards. Allow operations on raw pointers and/or operations that don't require a pin/guard.
I agree wholeheartedly with these goals and it seems to me that so does the above reasoning.
The changes to the API would be:
impl<T> Atomic<T> {
+ pub fn store_owned(&self, new: Owned<T>, ord: Ordering);
- pub fn swap<'p>(&self, new: Shared<T>, ord: Ordering, _: &'p Pin) -> Shared<'p, T>;
+ pub fn swap<'p>(&self, new: Shared<'p, T>, ord: Ordering) -> Shared<'p, T>;
- pub fn cas<'p>(
- &self,
- current: Shared<T>,
- new: Shared<T>,
- ord: Ordering,
- _: &'p Pin,
- ) -> Result<(), Shared<'p, T>>;
+ pub fn cas<'p>(
+ &self,
+ current: Shared<T>,
+ new: Shared<'p, T>,
+ ord: Ordering,
+ ) -> Result<(), Shared<'p, T>>;
... same for other CASs
}
// having to get rid of these is slightly annoying
// an alternative would be to have Option<Owned> arguments, but as I understand
// we do not want Options everywhere
- impl<T> Deref for Owned<T> { ... }
- impl<T> DerefMut for Owned<T> { ... }
impl<T> Owned<T> {
+ pub fn null() -> Self;
+ pub fn is_null(&self) -> bool;
+ pub fn as_ref(&self) -> Option<&T>;
}
What do you think?
I very much like @stjepang 's idea of faking the guard. It will greatly simplify the interface. FWIW, it is already used in https://github.com/crossbeam-rs/crossbeam/blob/master/src/epoch/participants.rs#L57
On the naming convention, I still strongly feel it would be better to follow the libstd convention. Personally I am not particularly dissatisfied with long names, e.g. compare_and_swap_weak_owned
. Also, the conflation of the library APIs is currently at an acceptable level. We can document the differences among the methods and provide a guideline for choosing the right one. Anyway the file will be less than 300 LoC (i.e. https://github.com/jeehoonkang/crossbeam/blob/cfb4d88387659117281ac8ab4da806ce4920f542/src/epoch/atomic.rs).
For the naming convention, I don't think we can come closer to each other in asynchronous conversation in GitHub issues. I will try my best to keep online in IRC (irc.mozilla.org #crossbeam).. :)
Okay, this time I'd like to take a step back and perhaps try with new terminology. Perhaps a fresh story might sound more coherent.
Forget everything you know about Crossbeam...
So, epoch-based GC has this Atomic<T>
type. It is an atomic pointer to a heap-allocated T
. Loading an Atomic<T>
returns a pointer Ptr<'scope, T>
that can be dereferenced (if it isn't null). Example:
let a = Atomic::new(7); // Atomic pointer to a heap-allocated number 7.
let p = a.load(SeqCst);
But this doesn't work just yet! Epoch-based GC is all about scopes. When working with heap-allocated objects, we don't want other threads to concurrently destroy them under our feet. Let's say we'd like to print the loaded number:
let p = a.load(SeqCst);
println!("{}", *p.as_ref().unwrap());
Between the load
and println
we really don't want some other thread to free the allocated number 7. There's a way to say to the GC: "please don't destroy any garbage created during the life of this scope before it ends":
epoch::pin(|scope| {
let p = a.load(SeqCst, scope);
println!("{}", p.as_ref().unwrap());
})
Now, if another thread decides to free the heap-allocated number at a time between the load
and println
, freeing will simply be deferred. The garbage collector keeps track of such scopes and will destroy the garbage later, at a safe time.
All Ptr
s have a lifetime in their type: Ptr<'scope, T>
, which indicates the scope in which they're usable.
We can summarize the lives of Atomic
s and Ptr
s like this:
- Atomic s are long-lived and usually live on the heap.
- Ptr s are short-lived and live on the stack, within a scope.

The sample code still doesn't compile because as_ref() is unsafe. Dereferencing is safe only if you use memory orderings to properly synchronize writes/reads to/from heap-allocated objects. This will compile:
epoch::pin(|scope| {
let p = a.load(SeqCst, scope);
println!("{}", unsafe { p.as_ref() }.unwrap());
});
If you're wondering why we're calling unwrap()
here, that is because the pointer might be null. Conversion to a reference succeeds only if it is not null.
How do we deallocate heap-allocated objects? This is how:
epoch::pin(|scope| {
let p = a.load(SeqCst, scope);
if !p.is_null() {
// Try "stealing" the object from the atomic pointer.
if a.compare_and_swap(p, Ptr::null(), SeqCst, scope).is_ok() {
// Success! Add the object to the garbage collector.
unsafe { epoch::defer_free(p, scope) };
}
}
})
Function defer_free
is unsafe because we must guarantee that the object is now unreachable from atomic pointers. Since we "stole" the object by compare-and-swapping the pointer to something else (null), we know that other threads can't obtain it again.
Some threads might still be holding a Ptr
to it in their scopes, but as soon as their scopes end, no threads will be able to have a Ptr
to it again.
Scopes created by epoch::pin
must be short-lived; otherwise the GC might get stalled for a long time and keep accumulating too much garbage.
Sometimes, however, we'd like to have longer-lived scopes in which we know our thread is the only one accessing atomics. This is true e.g. when destructing a big data structure, or when constructing it from a long iterator. In such cases we don't need to be overprotective because there is no fear of other threads concurrently destroying objects. Then we can do:
unsafe {
epoch::unprotected(|scope| {
// Destruct all nodes in a Treiber stack.
let mut curr = head.load(Relaxed, scope);
while !curr.is_null() {
let next = curr.deref().next.load(Relaxed, scope);
drop(Box::from_raw(curr));
curr = next;
}
})
}
Function unprotected
is unsafe because we must promise that no other thread is accessing our Atomic
s and objects.
Just like with the safe pin
function, unprotected use of atomics is enclosed within a scope so that pointers created by it don't leak out or get mixed with pointers from other scopes.
The only remaining question is: how do we store new objects into Atomic
s? For that, there is type Owned<T>
, which is almost identical to Box<T>
. In fact, Owned<T>
can even be constructed directly from Box<T>
.
We can convert an Owned<T>
into Ptr<'scope, T>
:
let mut owned = Owned::new(Node {
value: value,
next: Atomic::null(),
});
epoch::pin(|scope| {
let node = owned.into_ptr(scope);
// ...
})
Or, we can use compare-and-swap to install new objects into atomic pointers. Here's how one might push a new owned
node onto a stack:
epoch::pin(|scope| {
let mut head = self.head.load(Acquire, scope);
loop {
owned.next.store(head, Relaxed);
match self.head.compare_and_swap_weak_owned(head, owned, AcqRel, scope) {
Ok(_) => break,
Err((h, o)) => {
head = h;
owned = o;
}
}
}
})
That's all! How does it sound this time?
EDIT:
@Vtec234 suggested renaming epoch::unprotected
to epoch::exclusive
. I like that.
I think this is solid, don't mind my crazy type system suggestions.
:+1: for pin::exclusive
and stdlib like (long) names.
All right, I've updated the RFC with the recent changes. Thanks everyone for the feedback and your inputs! We're getting very close to finalization, but a lot has changed recently so I'd like to let everything sink in a bit, at least for a few days.
It seems that at this point only bikeshedding remains: Should epoch::unprotected
be renamed to epoch::exclusive
? Or perhaps something else?
I'll try to summarize what everyone thinks on that:
- epoch::exclusive is more explanatory, since it asserts that we have exclusive access to the data structure.
- exclusive doesn't capture the fact that there can actually be a concurrent pin or exclusive; unprotected, without_pin, or unsafe_pin would be preferable. In other words, exclusive and pin don't sound like two opposites. But there is no objection to exclusive here, and no strong opinion.
- A vote for pin::exclusive.
- A preference for epoch::exclusive, because it evokes what we're asserting (that we have exclusive access) rather than what is happening in the scope (unprotected use of atomics).

Unless a better suggestion comes along or someone argues against it, it'll be renamed to epoch::exclusive.
@aturon You don't have to read the lengthy discussion, but if you could skim through the RFC, that'd be super appreciated! After all, you designed the original interface and have lots of experience with API design in Rust, so your opinion would be very valuable here. :)
One could suggest epoch::local()
as well, but I'm ok with epoch::exclusive()
Here's an argument against naming it pin::exclusive
...
Yes, the function is useful in situations where we know that our thread is the only one accessing atomics (exclusively). But, it is also useful in situations where we simply know that no thread (including our own) is modifying atomics.
An example can be found in the implementation of deque in Coco. In method push
we need to load the pointer to the buffer. Pinning is unnecessary because we know that our thread is the only one that is able to modify the pointer, and we can promise that we won't modify it at this time either.
Therefore, it's not about exclusive access to atomics. It's about exclusive-writer or multiple-reader access. Kind of like a RwLock
or RefCell
. The function is safe to use only if either of these is true:

- our thread is the only one accessing the atomics (exclusive access), or
- no thread (including our own) is modifying the atomics (read-only access).
@aturon
One very minor question: you seem to drop
Atomic::new
, instead having the user allocate an Owned
to initialize. Any particular reason for that?
Good catch! This was not intentional. I'll add Atomic::new
to the interface.
@schets Can we merge, or would you like more time to review?
It looks good. I have one idea: if we're providing the unprotected epoch function, would it also make sense to provide an unsafe re-pin function which leaves and re-enters the epoch without leaving the scope?
A situation I can think of is where somebody wants to avoid repeatedly re-pinning a thread in a loop, and so encapsulates the entire loop in a single pin, possibly in some destructor/cleaner-like function. It would be great in situations where it's hard to maintain a state machine outside of the pinned context.
@schets That's a good point - in some cases repinning might be very handy.
Do you perhaps have a suggestion on how exactly would such a function be used? Here are some quick ideas:
epoch::pin(|scope| {
// Idea 1:
unsafe { epoch::repin(); }
// Idea 2:
unsafe { epoch::repin(scope); }
// Idea 3:
unsafe { scope.repin(); }
});
It's a bit unfortunate that we can call this function even inside an epoch::unprotected
scope. We could solve the problem by having different types like PinScope
and UnprotectedScope
that implement trait Scope
, but that is probably too much complexity for too little benefit.
I personally am not sure what's the right interface here. More concrete cases where this is needed would be illuminating.
If we don't figure out the interface now, we can mention this in the "Unresolved questions" section and decide after gaining some experience with the library, or when a concrete need for this feature comes up.
I suggest postponing that for now.
Added the problem of re-pinning to unresolved questions, since it's not blocking the current proposal and we can figure it out later.
This RFC is being merged. Great work, everyone!
This week I'll publish a new RFC on epoch-based garbage collection, containing a comprehensive overview of everything related to it. This is a wide design space requiring lots of tradeoffs, and solutions to some problems are not obvious at all.
This is a proposal for a better Atomic API.
Rendered
For reference, here are other similar APIs: