Closed ghost closed 6 years ago
I like this proposal very much! Thanks for writing it up. I have two comments:
I think EpochCell
will depend on crossbeam-epoch
, so it cannot live in crossbeam-utils
. Maybe we can discuss this in the "alternative" section.
One more thing on the SeqCst
memory ordering. Actually, as discussed in this paper, SeqCst
for load/store/rmw is surprisingly subtle! This subtlety may not arise in x86, but it may in weaker architectures e.g. ARM.
In any case, I fully agree with you that most programmers shouldn't need to bother with memory orderings. It is us who shoul ddo the dirty jobs 😉
@Amanieu Since you asked for more details on the proposed atomic primitives, I expanded the RFC.
My thoughts:
I'd really like Atomic<T>
, since it gives us things like Atomic<u16>
on stable Rust, in addition to accepting a wide spectruom of types like Atomic<[u8; 3]>
and Atomic<std::time::Duration>
. That's nice.
AtomicCell<T>
replaces AtomicOption<T>
and is a natural extension of Cell<T>
. This is actually the first thing users should reach for when they need an atomic counter. And it doesn't bother with memory orderings, providing safe defaults. This one is a must.
AtomicRefCell<T>
is also a natural extension of RefCell<T>
. There is already a concrete use case for it in Servo, so there should be no doubt about its usefulness.
HazardCell<T>
and EpochCell<T>
are very similar. These are the only primitives as powerful as AtomicReference
in Java (note that AtomicCell::get
works only with Copy
types). We should definitely have at least one of these, although I'm thinking we should have both.
I don't have a strong opinion on Adder<T>
and Accumulator<T>
. They seem neat, but one can probably survive with AtomicCell
just fine. These would be nice to have, but are not as commonly requested. Their lack is not a big pain point in the current state of atomics in Rust.
What do you think?
Does this RFC now provide sufficient motivation for creating crossbeam-atomic
? If so, let's merge.
LGTM
@Vtec234
To be honest, I'm not a huge fan of creating more crates. Although in this case it does look like this group of primitives needs a separate crate due to the cyclic dependency on epoch it would create, I think we should be wary of proliferating subcrates endlessly, as there already are quite a lot of them.
The motivation for creating crossbeam-atomic
is not avoiding the cyclic dependency on crossbeam-epoch
- that'd simply be an unfortunate side-effect of putting atomics into crossbeam-utils
.
Instead, the purpose of crossbeam-atomic
is to bundle together a coherent set of atomic primitives. I suggest looking at the java.util.concurrent.atomic
package - this is what I primarily had in mind when writing this RFC.
I hope that if a better way to organise things comes up, we can do a breaking release which moves things around, kind of like tokio.
It's interesting to draw parallels with tokio now that it is going through the reform. Here are a few quotes I like from the article you've linked:
In Crossbeam we're following a similar strategy. Library authors can pick individual crates they need, e.g. if you're writing a concurrent data structure you might want to depend only on crossbeam-epoch
. Application authors probably don't care about all the subcrates and would prefer to simply import crossbeam
, which should bundle together all our stable subcrates.
Our subcrates are of varying levels of stability, too. E.g. crossbeam-deque
and crossbeam-epoch
are fairly stable, while crossbeam-skiplist
definitely isn't. In my opinion, we should signal stability level by reexporting reasonably stable subcrates in crossbeam
and leaving everything else out.
This is an important benefit of having separate crates. Iterating experimental crates (like crossbeam-skiplist
) quickly without being constrained by stability guarantees of other crates makes versioning easier to manage.
Suppose both crossbeam-0.6
and crossbeam-0.7
depend on crossbeam-deque-0.2
- this way we minimize the amount of duplicated code throughout the ecosystem. Having subcrates also means there are smaller chances of encountering conflicts by using the same type from different versions of a crate. For example, crossbeam_0_6::deque::Deque
and crossbeam_0_7::deque::Deque
are the exact same type despite coming from different versions of crossbeam
.
All that said, I'd be onboard with consolidating crates and rethinking the whole organization once the dust settles down. I hope we can release crossbeam-0.4.0
that simply reexports other crates within the next two months. And maybe even reach version 1.0
later this year. Let's watch what the tokio/futures folks are doing and how they're dealing with similar problems.
The Atomic type is very low-level and it would even make sense to use it in epoch. But we can't, because... The EpochCell (and maybe other types proposed here, depending on their implementation) depend on epoch, so we have a cycle.
Sure, but maybe that's not an issue if we don't use Atomic
inside crossbeam-epoch
? I understand the concern, but just don't see it as a huge problem that would justify moving it into crossbeam-utils
. But there are tradeoffs both ways, for sure.
I thought a little bit more and it seems to me now that these types don't belong into a single crate. The reason is that they work on very different levels of abstraction. [...] I think the reason for this is that one type is very low-level, while the other has an entire garbage collector below it.
This is a good observation. Atomic
is indeed low-level, while all the others are high-level.
While I find it tempting to separate things into high-level and low-level categories, I actually prefer separating by what they do, or what they are used for.
In that sense, Atomic
and AtomicCell
are very very similar, the main difference being the balance of ergonomics/safety and performance/control. They seem too similar to be separated away. You don't agree?
Then, let's put types that use a particular garbage collection mechanism into crates that provide it - for example EpochCell into epoch and HazardCell into hazard when we do eventually make that crate.
This makes a lot of sense, but... Now I kind of regret naming these primitives that way because Epoch
and Hazard
in their names describe how they're internally implemented, and not what they're for or how they should be used. Rustaceans wanting AtomicReference
from Java shouldn't care or even know much about epochs and hazard pointers.
HazardCell
is a bit special in that it doesn't even need crossbeam-hazard
. I intend to implement a very simple special form of hazard pointers optimized for this type. It doesn't need a full blown GC. Maybe we should name it AtomicReference
or AtomicPointer
? Anyhow, HazardCell
is just an atomic pointer (Box
/Arc
/Weak
) you can load without the fear of a loaded value being concurrently destroyed.
EpochCell
is very similar, but rather than destroying removed objects immediately, it destroys objects lazily by inserting them into the GC. All you need to know about it is: objects get lazily destroyed and keeping guards alive for a long time keeps destruction from kicking in. That's all. Perhaps we should call this type LazyAtomicPointer
because removed objects are lazily destroyed?
In any case, given HazardCell
, I find the utility of EpochCell
kind of questionable. EpochCell
is more limited (requires T: 'static
) and comes with a few caveats (you mustn't hold a guard for too long). The main benefit is performance - updates (replace
, set
, compare_and_set
) are faster because of lazy destruction.
These two primitives are a lot like AtomicReference
in Java. I think of the the Epoch
and Hazard
parts in their names merely as implementation details rather than something that intrinsically binds them to crossbeam-epoch
or crossbeam-hazard
. What really differentiates them is when removed objects get destroyed - one primitive is eager and the other is lazy. That's all a user should care about.
The current proposal groups by abstraction, but doing it by implementation as described above instead would avoid unnecessary problems. Then, we can simply reexport these types together under mod atomic in the main crate, hence preserving a neat interface. Thoughts?
Yes, the main crate should have an atomic
module (or reexport crossbeam-atomic
as atomic
). But I'd still prefer to group crates by abstraction rather than implementation. I mean, EpochCell
will not be deeply ingrained into the epoch GC - it's just an abstraction that uses the epoch GC as an internal implementation detail.
Another, small problem is that naming of Atomic and Pointer conflicts with names already present in epoch. I would rename Pointer here to something like Boxed and epoch::Atomic to something like epoch::Guarded.
There was a recent long discussion on a similar problem here. The question was whether task::Context
should be renamed to TaskContext
since the name Context
might be used very often elsewhere. The conclusion was that the module name serves as a disambiguator already, and task::TaskContext
is just unnecessary long / stuttering.
The same applies in our situation. If you need to disambiguate these structs, using epoch::Atomic
is perfectly fine. I actually like how epoch::Atomic
, epoch::Shared
, and epoch::Owned
read - they're clear and not too long either.
This makes a lot of sense, but... Now I kind of regret naming these primitives that way because Epoch and Hazard in their names describe how they're internally implemented, and not what they're for or how they should be used. Rustaceans wanting AtomicReference from Java shouldn't care or even know much about epochs and hazard pointers.
These two primitives are a lot like AtomicReference in Java. I think of the the Epoch and Hazard parts in their names merely as implementation details rather than something that intrinsically binds them to crossbeam-epoch or crossbeam-hazard. What really differentiates them is when removed objects get destroyed - one primitive is eager and the other is lazy. That's all a user should care about.
I think naming types by their implementations is fine - what it is, really, is a way for people to remember a set of properties under a single name. Imagine that instead of calling a type LinkedList we called it SlowSearchButQuickInsertAndSplitList instead. Even though a user shouldn't have to care about implementation details, it doesn't hurt to have a high-level idea of why a given type has the properties it does.
In any case, the discussion of naming is tangential to this - division into crates and naming are mostly independent, and as you noted module prefixes are an effective way of disambiguating type names. My reasoning for why I think it would be better to implement these types within existing crates is not due to naming, but rather a hope that we can keep similar code together.
In that sense, Atomic and AtomicCell are very very similar, the main difference being the balance of ergonomics/safety and performance/control. They seem too similar to be separated away. You don't agree?
I agree that they are very similar in purpose, but not in implementation. In fact, in that aspect they differ substantially and use varying mechanisms. My hope is that we could group them in this way, e.g. by the underlying GC scheme, so that all code in such a crate can be reasoned about on the same basis. Moreover, this would avoid dependency problems like the one which arose here (although in this case it's not much of a worry, i think similar problems will arise with by-interface grouping).
We could still easy have an interface which expresses the usage clearly by grouping the reexports into a single module, effectively like the Java atomic
package.
I recall that in the case of structures, we decided to have one skiplist
crate which contains both a set and a map based on this implementation, and then also e.g. a hasharray
crate with both, as opposed to one map
crate and one set
crate, which would contain all implementations of their respective interfaces. It seems to me that this is a similar situation.
I agree with @stjepang that the number of crates is not a big deal.
As hopefully explained, I'm not opposed to new crates when they are clearly required, e.g. for new GC schemes, but rather to whether the Atomic
types should necessarily go into a single crate.
I think naming types by their implementations is fine - what it is, really, is a way for people to remember a set of properties under a single name. Imagine that instead of calling a type LinkedList we called it SlowSearchButQuickInsertAndSplitList instead. Even though a user shouldn't have to care about implementation details, it doesn't hurt to have a high-level idea of why a given type has the properties it does.
Good point. I'd like to pull back on my suggestion on the eager vs lazy naming idea. :) I'm not yet sure what to call our types, to be honest. Maybe it becomes clearer once we implement them.
In the end it probably depends on how closely the implementation details are tied to the data structure's properties as seen from the outside. Choosing a very specific name is a commitment: BTreeMap
will never be internally switched to a BST implementation, while slice::sort_unstable
could be switched to a different algorithm. In C++ we have std::map
and std::ordered_map
rather than std::tree_map
and std::hash_map
. But in Rust we generally tend to prefer naming things after implementation details.
I recall that in the case of structures, we decided to have one skiplist crate which contains both a set and a map based on this implementation, and then also e.g. a hasharray crate with both, as opposed to one map crate and one set crate, which would contain all implementations of their respective interfaces. It seems to me that this is a similar situation.
Yes, that is true.
I agree that they are very similar in purpose, but not in implementation. In fact, in that aspect they differ substantially and use varying mechanisms. My hope is that we could group them in this way, e.g. by the underlying GC scheme, so that all code in such a crate can be reasoned about on the same basis. Moreover, this would avoid dependency problems like the one which arose here (although in this case it's not much of a worry, i think similar problems will arise with by-interface grouping).
It seems this is actually the root of our disagreement. You're convinced that all these primitives are very different in implementation and that their implementations are tied to underlying GCs.
My train of thought goes like this:
Atomic
is a thin layer on top of std::sync::atomic::*
. It uses some trickery with spinlocks in order to support all T: Copy
types.
AtomicCell
is also a thin layer on top of std::sync::atomic::*
. It also uses spinlocks in order to support all types. Basically the only difference from Atomic
is the API.
AtomicRefCell
is to RefCell
what AtomicCell
is to Cell
. It's a useful and small single-file utility, so why not have it.
Adder
and Accumulator
are just more scalable versions of AtomicCell
.
HazardCell
is a layer on top of std::sync::atomic::AtomicPtr
. I like thinking of it as a concurrent sister of Cell<Rc<_>>
. Apart from concurrency, the main difference is that HazardCell
has a global array of references, while Rc
has a refcount for each individual object.
EpochCell
is the only one that actually uses a GC. It's similar to HazardCell
, except with different tradeoffs. The implementation is very simple - it's just a std::sync::atomic::AtomicPtr
that uses pinning to make accessing the value safe.
In the end, I can see why you're skeptical. I would be willing to hold off creating a new crate and prototype a set of these primitives so we can make a decision later.
Another crate you might be interested in: https://github.com/Amanieu/seqlock
From your description, Atomic
, Atomic(Ref)Cell
and Adder/Accumulator
are very similar objects that would fit well with other utilities in utils
, while HazardCell
and EpochCell
would fit in hazard
(not yet present) and epoch
. I understand that HazardCell
doesn't use the full-blown hazard pointers mechanism, but it has "global array of references", which is lightweight hazards. It seems to me that it would be easy to understand the code when it's alongside the heavier-weight hazard pointers, since they're a very similar idea, hence the grouping. Maybe others (@Amanieu, @jeehoonkang) have a preference for either option?
@Vtec234 I'm growing more and more sympathetic to your concerns about the complicated dependency graph that is emerging. Putting this RFC on hold.
@stjepang It's still a perfectly fine RFC, I think it'd be good to go if you just changed the proposal from "adding a new crate" to "adding these new types to existing crates", with the APIs here seen as provisional. The full APIs and implementations could then be fleshed out in PRs. What do you think?
Since there wasn't sufficient motivation for creating a new repository (at least not at this moment), let's close this RFC.
There's already a PR for adding AtomicCell
to crossbeam-utils
.
I'm also implementing AtomicBox
/AtomicArc
based on a simple variant of hazard pointers, which should give us a chance to experiment and gain experience before designing a real hazard pointers library.
Create a new crate
crossbeam-atomic
, which will host a collection of atomic utilites that complement those instd::sync::atomic
.No code is proposed as a starting point for the crate - this time we'll simply start with an empty crate.
Rendered