crossbeam-rs / rfcs

RFCs for changes to Crossbeam
Apache License 2.0
146 stars 13 forks source link

A crate for atomic primitives: crossbeam-atomic #28

Closed ghost closed 6 years ago

ghost commented 6 years ago

Create a new crate crossbeam-atomic, which will host a collection of atomic utilites that complement those in std::sync::atomic.

No code is proposed as a starting point for the crate - this time we'll simply start with an empty crate.

Rendered

jeehoonkang commented 6 years ago

I like this proposal very much! Thanks for writing it up. I have two comments:

ghost commented 6 years ago

@Amanieu Since you asked for more details on the proposed atomic primitives, I expanded the RFC.

My thoughts:

What do you think?

ghost commented 6 years ago

Does this RFC now provide sufficient motivation for creating crossbeam-atomic? If so, let's merge.

Amanieu commented 6 years ago

LGTM

ghost commented 6 years ago

@Vtec234

To be honest, I'm not a huge fan of creating more crates. Although in this case it does look like this group of primitives needs a separate crate due to the cyclic dependency on epoch it would create, I think we should be wary of proliferating subcrates endlessly, as there already are quite a lot of them.

The motivation for creating crossbeam-atomic is not avoiding the cyclic dependency on crossbeam-epoch - that'd simply be an unfortunate side-effect of putting atomics into crossbeam-utils.

Instead, the purpose of crossbeam-atomic is to bundle together a coherent set of atomic primitives. I suggest looking at the java.util.concurrent.atomic package - this is what I primarily had in mind when writing this RFC.

I hope that if a better way to organise things comes up, we can do a breaking release which moves things around, kind of like tokio.

It's interesting to draw parallels with tokio now that it is going through the reform. Here are a few quotes I like from the article you've linked:

In Crossbeam we're following a similar strategy. Library authors can pick individual crates they need, e.g. if you're writing a concurrent data structure you might want to depend only on crossbeam-epoch. Application authors probably don't care about all the subcrates and would prefer to simply import crossbeam, which should bundle together all our stable subcrates.

Our subcrates are of varying levels of stability, too. E.g. crossbeam-deque and crossbeam-epoch are fairly stable, while crossbeam-skiplist definitely isn't. In my opinion, we should signal stability level by reexporting reasonably stable subcrates in crossbeam and leaving everything else out.

This is an important benefit of having separate crates. Iterating experimental crates (like crossbeam-skiplist) quickly without being constrained by stability guarantees of other crates makes versioning easier to manage.

Suppose both crossbeam-0.6 and crossbeam-0.7 depend on crossbeam-deque-0.2 - this way we minimize the amount of duplicated code throughout the ecosystem. Having subcrates also means there are smaller chances of encountering conflicts by using the same type from different versions of a crate. For example, crossbeam_0_6::deque::Deque and crossbeam_0_7::deque::Deque are the exact same type despite coming from different versions of crossbeam.

All that said, I'd be onboard with consolidating crates and rethinking the whole organization once the dust settles down. I hope we can release crossbeam-0.4.0 that simply reexports other crates within the next two months. And maybe even reach version 1.0 later this year. Let's watch what the tokio/futures folks are doing and how they're dealing with similar problems.

The Atomic type is very low-level and it would even make sense to use it in epoch. But we can't, because... The EpochCell (and maybe other types proposed here, depending on their implementation) depend on epoch, so we have a cycle.

Sure, but maybe that's not an issue if we don't use Atomic inside crossbeam-epoch? I understand the concern, but just don't see it as a huge problem that would justify moving it into crossbeam-utils. But there are tradeoffs both ways, for sure.

I thought a little bit more and it seems to me now that these types don't belong into a single crate. The reason is that they work on very different levels of abstraction. [...] I think the reason for this is that one type is very low-level, while the other has an entire garbage collector below it.

This is a good observation. Atomic is indeed low-level, while all the others are high-level.

While I find it tempting to separate things into high-level and low-level categories, I actually prefer separating by what they do, or what they are used for.

In that sense, Atomic and AtomicCell are very very similar, the main difference being the balance of ergonomics/safety and performance/control. They seem too similar to be separated away. You don't agree?

Then, let's put types that use a particular garbage collection mechanism into crates that provide it - for example EpochCell into epoch and HazardCell into hazard when we do eventually make that crate.

This makes a lot of sense, but... Now I kind of regret naming these primitives that way because Epoch and Hazard in their names describe how they're internally implemented, and not what they're for or how they should be used. Rustaceans wanting AtomicReference from Java shouldn't care or even know much about epochs and hazard pointers.

HazardCell is a bit special in that it doesn't even need crossbeam-hazard. I intend to implement a very simple special form of hazard pointers optimized for this type. It doesn't need a full blown GC. Maybe we should name it AtomicReference or AtomicPointer? Anyhow, HazardCell is just an atomic pointer (Box/Arc/Weak) you can load without the fear of a loaded value being concurrently destroyed.

EpochCell is very similar, but rather than destroying removed objects immediately, it destroys objects lazily by inserting them into the GC. All you need to know about it is: objects get lazily destroyed and keeping guards alive for a long time keeps destruction from kicking in. That's all. Perhaps we should call this type LazyAtomicPointer because removed objects are lazily destroyed?

In any case, given HazardCell, I find the utility of EpochCell kind of questionable. EpochCell is more limited (requires T: 'static) and comes with a few caveats (you mustn't hold a guard for too long). The main benefit is performance - updates (replace, set, compare_and_set) are faster because of lazy destruction.

These two primitives are a lot like AtomicReference in Java. I think of the the Epoch and Hazard parts in their names merely as implementation details rather than something that intrinsically binds them to crossbeam-epoch or crossbeam-hazard. What really differentiates them is when removed objects get destroyed - one primitive is eager and the other is lazy. That's all a user should care about.

The current proposal groups by abstraction, but doing it by implementation as described above instead would avoid unnecessary problems. Then, we can simply reexport these types together under mod atomic in the main crate, hence preserving a neat interface. Thoughts?

Yes, the main crate should have an atomic module (or reexport crossbeam-atomic as atomic). But I'd still prefer to group crates by abstraction rather than implementation. I mean, EpochCell will not be deeply ingrained into the epoch GC - it's just an abstraction that uses the epoch GC as an internal implementation detail.

Another, small problem is that naming of Atomic and Pointer conflicts with names already present in epoch. I would rename Pointer here to something like Boxed and epoch::Atomic to something like epoch::Guarded.

There was a recent long discussion on a similar problem here. The question was whether task::Context should be renamed to TaskContext since the name Context might be used very often elsewhere. The conclusion was that the module name serves as a disambiguator already, and task::TaskContext is just unnecessary long / stuttering.

The same applies in our situation. If you need to disambiguate these structs, using epoch::Atomic is perfectly fine. I actually like how epoch::Atomic, epoch::Shared, and epoch::Owned read - they're clear and not too long either.

ghost commented 6 years ago

@jeehoonkang Yeah, I'm having doubts about our split into multiple repositories, too.

One thing to consider here is that tokio, rayon, and futures group their subcrates into a single repo. Probably for a good reason.

Vtec234 commented 6 years ago

This makes a lot of sense, but... Now I kind of regret naming these primitives that way because Epoch and Hazard in their names describe how they're internally implemented, and not what they're for or how they should be used. Rustaceans wanting AtomicReference from Java shouldn't care or even know much about epochs and hazard pointers.

These two primitives are a lot like AtomicReference in Java. I think of the the Epoch and Hazard parts in their names merely as implementation details rather than something that intrinsically binds them to crossbeam-epoch or crossbeam-hazard. What really differentiates them is when removed objects get destroyed - one primitive is eager and the other is lazy. That's all a user should care about.

I think naming types by their implementations is fine - what it is, really, is a way for people to remember a set of properties under a single name. Imagine that instead of calling a type LinkedList we called it SlowSearchButQuickInsertAndSplitList instead. Even though a user shouldn't have to care about implementation details, it doesn't hurt to have a high-level idea of why a given type has the properties it does.

In any case, the discussion of naming is tangential to this - division into crates and naming are mostly independent, and as you noted module prefixes are an effective way of disambiguating type names. My reasoning for why I think it would be better to implement these types within existing crates is not due to naming, but rather a hope that we can keep similar code together.

In that sense, Atomic and AtomicCell are very very similar, the main difference being the balance of ergonomics/safety and performance/control. They seem too similar to be separated away. You don't agree?

I agree that they are very similar in purpose, but not in implementation. In fact, in that aspect they differ substantially and use varying mechanisms. My hope is that we could group them in this way, e.g. by the underlying GC scheme, so that all code in such a crate can be reasoned about on the same basis. Moreover, this would avoid dependency problems like the one which arose here (although in this case it's not much of a worry, i think similar problems will arise with by-interface grouping).

We could still easy have an interface which expresses the usage clearly by grouping the reexports into a single module, effectively like the Java atomic package.

I recall that in the case of structures, we decided to have one skiplist crate which contains both a set and a map based on this implementation, and then also e.g. a hasharray crate with both, as opposed to one map crate and one set crate, which would contain all implementations of their respective interfaces. It seems to me that this is a similar situation.

I agree with @stjepang that the number of crates is not a big deal.

As hopefully explained, I'm not opposed to new crates when they are clearly required, e.g. for new GC schemes, but rather to whether the Atomic types should necessarily go into a single crate.

ghost commented 6 years ago

I think naming types by their implementations is fine - what it is, really, is a way for people to remember a set of properties under a single name. Imagine that instead of calling a type LinkedList we called it SlowSearchButQuickInsertAndSplitList instead. Even though a user shouldn't have to care about implementation details, it doesn't hurt to have a high-level idea of why a given type has the properties it does.

Good point. I'd like to pull back on my suggestion on the eager vs lazy naming idea. :) I'm not yet sure what to call our types, to be honest. Maybe it becomes clearer once we implement them.

In the end it probably depends on how closely the implementation details are tied to the data structure's properties as seen from the outside. Choosing a very specific name is a commitment: BTreeMap will never be internally switched to a BST implementation, while slice::sort_unstable could be switched to a different algorithm. In C++ we have std::map and std::ordered_map rather than std::tree_map and std::hash_map. But in Rust we generally tend to prefer naming things after implementation details.

I recall that in the case of structures, we decided to have one skiplist crate which contains both a set and a map based on this implementation, and then also e.g. a hasharray crate with both, as opposed to one map crate and one set crate, which would contain all implementations of their respective interfaces. It seems to me that this is a similar situation.

Yes, that is true.

I agree that they are very similar in purpose, but not in implementation. In fact, in that aspect they differ substantially and use varying mechanisms. My hope is that we could group them in this way, e.g. by the underlying GC scheme, so that all code in such a crate can be reasoned about on the same basis. Moreover, this would avoid dependency problems like the one which arose here (although in this case it's not much of a worry, i think similar problems will arise with by-interface grouping).

It seems this is actually the root of our disagreement. You're convinced that all these primitives are very different in implementation and that their implementations are tied to underlying GCs.

My train of thought goes like this:

In the end, I can see why you're skeptical. I would be willing to hold off creating a new crate and prototype a set of these primitives so we can make a decision later.

Amanieu commented 6 years ago

Another crate you might be interested in: https://github.com/Amanieu/seqlock

Vtec234 commented 6 years ago

From your description, Atomic, Atomic(Ref)Cell and Adder/Accumulator are very similar objects that would fit well with other utilities in utils, while HazardCell and EpochCell would fit in hazard (not yet present) and epoch. I understand that HazardCell doesn't use the full-blown hazard pointers mechanism, but it has "global array of references", which is lightweight hazards. It seems to me that it would be easy to understand the code when it's alongside the heavier-weight hazard pointers, since they're a very similar idea, hence the grouping. Maybe others (@Amanieu, @jeehoonkang) have a preference for either option?

ghost commented 6 years ago

@Vtec234 I'm growing more and more sympathetic to your concerns about the complicated dependency graph that is emerging. Putting this RFC on hold.

Vtec234 commented 6 years ago

@stjepang It's still a perfectly fine RFC, I think it'd be good to go if you just changed the proposal from "adding a new crate" to "adding these new types to existing crates", with the APIs here seen as provisional. The full APIs and implementations could then be fleshed out in PRs. What do you think?

ghost commented 6 years ago

Since there wasn't sufficient motivation for creating a new repository (at least not at this moment), let's close this RFC.

There's already a PR for adding AtomicCell to crossbeam-utils.

I'm also implementing AtomicBox/AtomicArc based on a simple variant of hazard pointers, which should give us a chance to experiment and gain experience before designing a real hazard pointers library.