crossbeam-rs / rfcs

RFCs for changes to Crossbeam
Apache License 2.0

Organization of the crossbeam crate #14

Open ghost opened 6 years ago

ghost commented 6 years ago

The main crossbeam crate is going to be an umbrella crate that brings together the most important pieces of the Crossbeam project and reexports them. I've been thinking about what it should look like. Here are some quick ideas...

First, crossbeam depends on crossbeam-epoch and reexports the crate as:

crossbeam::epoch::* // from crossbeam-epoch

Then we have several atomic types, but I'm unsure if they should live in sync::atomic or just atomic. The former is more consistent with the standard library, though.

crossbeam::sync::atomic::{AtomicBox,AtomicArc,AtomicCell} // from crossbeam-atomic

There's also a bunch of data structures:

crossbeam::sync::Stack // from crossbeam-stack
crossbeam::sync::Queue // from crossbeam-queue
crossbeam::sync::channel::* // from crossbeam-channel
crossbeam::sync::{deque,Worker,Stealer} // from crossbeam-deque

Finally, some utilities:


crossbeam::scoped; // from crossbeam-utils
crossbeam::CachePadded; // from crossbeam-utils

But, instead of just shoving utilities into the crate root, we could organize them into submodules:

crossbeam::thread::scoped; // from crossbeam-utils
crossbeam::utils::CachePadded; // from crossbeam-utils
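
To make the two layouts concrete, here is a minimal sketch that models the umbrella crate as a module. The inner crossbeam_utils module is only a stand-in so the example compiles on its own; its placeholder bodies and the helper layouts_agree are assumptions made for illustration, not the real crossbeam-utils API.

```rust
// Stand-in for the real crossbeam-utils crate so that this sketch compiles
// on its own. Only the names `CachePadded` and `scoped` come from the
// proposal; the bodies are placeholders.
mod crossbeam_utils {
    /// Placeholder: the real type pads its contents to a cache line.
    #[derive(Debug)]
    pub struct CachePadded<T>(pub T);

    /// Placeholder for the scoped-thread entry point.
    pub fn scoped() -> &'static str {
        "scoped"
    }
}

// Hypothetical umbrella crate, written as a module here.
pub mod crossbeam {
    // Option A: flat re-exports in the crate root.
    pub use super::crossbeam_utils::{scoped, CachePadded};

    // Option B: the same items grouped into submodules.
    pub mod thread {
        pub use super::super::crossbeam_utils::scoped;
    }
    pub mod utils {
        pub use super::super::crossbeam_utils::CachePadded;
    }
}

/// Hypothetical helper: checks that both layouts expose the same items.
pub fn layouts_agree() -> bool {
    // Both paths name the same type, so a value built through one path
    // can be bound to the other without any conversion.
    let flat: crossbeam::CachePadded<u32> = crossbeam::CachePadded(7);
    let nested: crossbeam::utils::CachePadded<u32> = flat;
    nested.0 == 7 && crossbeam::scoped() == crossbeam::thread::scoped()
}
```

Since re-exports are only aliases, the choice between the flat and the nested layout affects the paths users type, not the types they end up with.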

So the questions we need to answer are:

  1. What goes inside crossbeam and what needs to be left outside? When should a Rust programmer reach for crossbeam-X instead of crossbeam?
  2. What hierarchy of submodules do we want? Do we closely mimic std or come up with our own?

jeehoonkang commented 6 years ago

Before I can come up with any useful opinion, I think I need to learn more about Rust's module system. I'd like to ask a Rust question. Say a user imported crossbeam_queue and crossbeam_stack, both of which should have Atomic, Ptr, ... in their own namespace. Can the Rust compiler deduce that crossbeam_queue::Atomic and crossbeam_stack::Atomic are the same type?

ghost commented 6 years ago

Both crossbeam-queue and crossbeam-stack would pull in crossbeam-epoch as a dependency. If they pull in the same version of crossbeam-epoch, then the Atomic types are equal. Different versions of the same crate are like totally different crates.
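
A small sketch of that rule, using modules as stand-ins for crates. The module names epoch_a, epoch_b, queue, stack and other_stack are hypothetical: epoch_a and epoch_b model two separately resolved copies of crossbeam-epoch, and the other modules model crates that re-export Atomic from one copy or the other.

```rust
use std::any::TypeId;

// Stand-ins for two copies of crossbeam-epoch that a build might contain
// (for example, two incompatible releases). Names are hypothetical.
mod epoch_a {
    pub struct Atomic;
}
mod epoch_b {
    pub struct Atomic;
}

// Stand-ins for crossbeam-queue and crossbeam-stack sharing one copy.
mod queue {
    pub use super::epoch_a::Atomic;
}
mod stack {
    pub use super::epoch_a::Atomic;
}

// Stand-in for a crate that ended up with the other copy.
mod other_stack {
    pub use super::epoch_b::Atomic;
}

/// Re-exports of the same definition are one and the same type.
pub fn same_copy_is_same_type() -> bool {
    TypeId::of::<queue::Atomic>() == TypeId::of::<stack::Atomic>()
}

/// Identically named definitions from different copies are distinct types.
pub fn different_copy_is_different_type() -> bool {
    TypeId::of::<queue::Atomic>() != TypeId::of::<other_stack::Atomic>()
}
```

The compiler identifies a type by where it is defined, not by the path used to reach it, which is why the re-exports through queue and stack compare equal.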

If this sounds like the chances of having different versions of the same crate are too high, note that Cargo helps a bit here. If you specify a dependency as crossbeam-epoch = "1.2.3", Cargo treats it as a caret requirement: the version pulled in will be the highest available one that is semver-compatible with 1.2.3, i.e. at least 1.2.3 and below 2.0.0.

For example, if crossbeam-queue depends on crossbeam-epoch = "1.2.3", crossbeam-stack depends on crossbeam-epoch = "1.2.6", and the newest 1.X.Y version is 1.2.7, then they will both pull in version 1.2.7 and use the same Atomic type.

But if one crate depends on 1.2.3 and the other on 2.0.1, then they will use different Atomic types. The same holds for pre-1.0 versions that differ in the minor number, such as 0.2.3 and 0.3.5, because Cargo treats those as incompatible.
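
The following is a rough sketch of the compatibility rule Cargo documents for plain version requirements (caret requirements), so it is easy to check which pairs of requirements can share a single version. It is not Cargo's resolver: it ignores pre-release tags and every other requirement syntax, and the function names parse, caret_matches and highest_shared are made up for this example.

```rust
/// A (major, minor, patch) version triple.
type Version = (u64, u64, u64);

/// Parses a plain "x.y.z" string; returns None for anything else.
pub fn parse(s: &str) -> Option<Version> {
    let mut parts = s.split('.').map(|p| p.parse::<u64>().ok());
    let v = (parts.next()??, parts.next()??, parts.next()??);
    if parts.next().is_some() {
        return None;
    }
    Some(v)
}

/// Returns true if `candidate` satisfies the caret requirement `^req`.
pub fn caret_matches(req: Version, candidate: Version) -> bool {
    // Tuples compare lexicographically, which matches version ordering here.
    if candidate < req {
        return false;
    }
    match req {
        // ^0.0.z accepts only that exact version.
        (0, 0, _) => candidate == req,
        // ^0.y.z accepts 0.y.* at or above the requirement.
        (0, minor, _) => candidate.0 == 0 && candidate.1 == minor,
        // ^x.y.z accepts any x.*.* at or above the requirement.
        (major, _, _) => candidate.0 == major,
    }
}

/// Highest available version that satisfies both requirements, if any.
pub fn highest_shared(a: &str, b: &str, available: &[&str]) -> Option<Version> {
    let (a, b) = (parse(a)?, parse(b)?);
    available
        .iter()
        .filter_map(|s| parse(s))
        .filter(|&v| caret_matches(a, v) && caret_matches(b, v))
        .max()
}
```

When highest_shared returns None, the two dependents cannot agree on one copy of the crate, and a build would contain two copies with distinct Atomic types.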

jeehoonkang commented 6 years ago

martinhath commented 6 years ago

For what it's worth, I think I prefer e.g. crossbeam::Queue over crossbeam::sync::Queue (or possibly crossbeam::collections::Queue, but this might be too many modules?). Crossbeam is all about sync, so I feel having a sync module is kind of redundant, especially if collections, channels, etc. are placed in the sync module.

It would be my guess that most users will use the data structures, channels, etc. and I think we should optimize the module layout for that, which, in my head, means having them close to the root module.

That said, I'm a beginner to both concurrent programming and project planning, so what do I know :smile:

Firstyear commented 6 years ago

Hey there,

I think splitting this up is not a good idea.

We require crossbeam to eventually become system-level packaged in an RPM else we are unable to use it in our application. I think that splitting this up into many smaller pieces creates a complexity and a confusion about what pieces are needed, and a barrier for system-level packaging.

I think it's better to have a single, cohesive package of structures and components that are really well tested together, rather than many moving parts. Moving parts make it harder to contribute, understand and follow, whereas a single repository is a nice one-stop place for a contributor or user to go to, and then easy to distribute further.

I hope this helps,

ghost commented 6 years ago

@Firstyear

Crossbeam aims to be the equivalent of java.util.concurrent written in Rust, more or less. As you can see, this Java package assembles a long list of semi-related data structures, and there is a lot of code in it.

We see Crossbeam not as one humongous crate, but instead as a project/organization that focuses on building a variety of tools (data structures, synchronization primitives, etc.) for concurrent/parallel programming. The contain-rs project is structured the same way.

The idea is to have a separate self-contained crate for each tool, or for each group of closely related tools. The most commonly used tools (scoped threads, epoch-based reclamation, channels, and probably a few others) will be collected together into the crossbeam crate.

The whole crossbeam crate will then consist of just:

extern crate crossbeam_epoch;
extern crate crossbeam_channel;
// ...

pub mod epoch {
    pub use crossbeam_epoch::{pin, unprotected, Guard};
    pub use crossbeam_epoch::{Collector, Handle};
    pub use crossbeam_epoch::{Atomic, Owned, Ptr};
    pub use crossbeam_epoch::CompareAndSetOrdering;
}

pub mod channel {
    pub use crossbeam_channel::{bounded, unbounded};
    pub use crossbeam_channel::{Sender, Receiver};
    // ...
}

// ...

If you need a common data structure, you can just use crossbeam. But if you want something more exotic (a Bw-Tree or something like that), you'll have to reach for crossbeam-something-exotic. Or, if you're building a memory allocator (see elfmalloc) and don't want to pull in the whole crate as a dependency, you can choose to depend on crossbeam-epoch only.

> We require crossbeam to eventually become system-level packaged in an RPM else we are unable to use it in our application. I think that splitting this up into many smaller pieces creates a complexity and a confusion about what pieces are needed, and a barrier for system-level packaging.

I don't know what your RPM packaging process is, but what exactly is the barrier for packaging the crossbeam crate? Do you have to manually download all its dependencies and include them in the package? If so, is there not an automatic way to do that?

> I think it's better to have a single, cohesive package of structures and components that are really well tested together, rather than many moving parts.

This is what crossbeam will be. Currently, we still don't have that many moving parts, but next year we'll start going in different directions and have a ton of unrelated (mostly advanced or experimental) crossbeam-* crates. Whenever such a small crate becomes stable enough, we'll bless it and include it in the main crossbeam crate.

> Moving parts make it harder to contribute, understand and follow, whereas a single repository is a nice one-stop place for a contributor or user to go to, and then easy to distribute further.

This is a valid concern, but perhaps we can alleviate the problem by clearly explaining the overall structure of the project in the readme?

jeehoonkang commented 6 years ago

> This is a valid concern, but perhaps we can alleviate the problem by clearly explaining the overall structure of the project in the readme?

In my opinion, it's quite hard to maintain multiple inter-related repos on GitHub, and as a consequence, almost all "big" projects hosted on GitHub have somehow invented a methodology to manage multiple repos [citation needed..?]. Writing guides in README.md is obviously a good start. We already have quite sophisticated project management systems, including the RFC process. Using these tools, I believe we will adequately manage the Crossbeam sub-projects.

On the other hand, I think Crossbeam will not be a "big" project, e.g. one consisting of millions of LOC, and one repo is just enough to host all the Crossbeam subprojects. Each of the monorepo's top-level directories may represent a crate, as done in https://github.com/redox-os/tfs . For this reason, I'm sympathetic to the concerns @Firstyear raised. But we already created several repos :) And I don't see a big benefit in removing all these repos and using a monorepo.

tl;dr: I agree with @stjepang. Let's use multiple repos.

ghost commented 6 years ago

@cuviper I was told you might be interested in this discussion.

Do you have an opinion on whether we should split Crossbeam into multiple smaller crates or have one large one?

cuviper commented 6 years ago

In Fedora, we're packaging at the crate level, as published on crates.io, so having a shared repo or separate repos doesn't change anything. And we have about 200 crates packaged already, so I don't see that it makes much difference whether crossbeam is one crate or a handful. The thing that does cause headaches is if there are circular dependencies, which sometimes arise through dev/build deps -- please avoid this!

More generally, I have experience with num split into multiple crates in a single repo. (rayon too, to a lesser extent.) I know a lot of users jumped on that when it became available, especially to grab just num-traits without worrying about the rest. I don't really know why that's more appealing vs. managing features though... 🤷‍♂️

I find it a little annoying to manage, but that may also be in part because I'm pretty much the only person maintaining it. If your project structure can better separate concerns, and especially if you have different people owning the different parts, then separate crates and repos make a lot of sense to me.

Firstyear commented 6 years ago

So for clarity:

Thanks,

ghost commented 6 years ago

@Firstyear

Firstyear commented 6 years ago

That second point is the important one I think. It really needs to expose all the required parts. Like I think it would be complex to have a crate for 'crossbeam' and 'crossbeam-extras' or something.

So long as it stays as "one crate" in the end, then I'm happy with this :)

However, if it's "one crate" then why do we need to split it up at all if it's "one project". Is there really a measurable benefit at that point?

Thank you!

ghost commented 6 years ago

> However, if it's "one crate" then why do we need to split it up at all if it's "one project". Is there really a measurable benefit at that point?

Firstyear commented 6 years ago

But this comes back to: Do you then have an rpm for crossbeam, and an rpm for crossbeam-epoch? Do they become separate crates? If this happens it creates barriers to adoption and packaging.

Second, is it really worth micro-optimising? We are not talking about a library with hundreds of thousands of lines, but merely a few kb. In fact, it's about 44kb of source code, which means that splitting this would save only a few kb in the compiled library.

Rather than becoming a series of "micro dependencies" like npm, (which is a fragile nightmare IMO), we should have a series of "robust modules", which do a collection of things well. If you want crossbeam epoch, you get crossbeam, and you deal with that.

Consider Python - when you type "import os" so you can get, say, "os.path", you are pulling in a reasonably sized dependency, but it's part of a coherent, well-tested unit that's easy to import and potentially redistribute.

I'm okay with the "many git repos under a single crate" idea, but I just don't want to see this become a mess of crates that people can't distribute in other formats (i.e. rpm).

I hope that helps,

cuviper commented 6 years ago

@Firstyear Sorry to be contrary, but I really don't see why you think having many crates is an issue for rpm. Most crates already have many dependencies, so whether some of these happen to all come from the same crossbeam-rs org is not really relevant.

If some other rpm package wants to use the main crossbeam meta-crate, that's fine -- they'll Require: crate(crossbeam) and all the sub-crates will get pulled in as transitive dependencies.

> Rather than becoming a series of "micro dependencies" like npm, (which is a fragile nightmare IMO), we should have a series of "robust modules", which do a collection of things well.

This feels like you're railing against the crates.io ecosystem as a whole! For better or worse, such single-purpose crates are common. Hopefully it won't get leftpad-bad, although some jokers do exist...