Manishearth / namespacing-rfc

RFC for Packages as Optional Namespaces

Drawback: Names in code not matching names in, say, docs.rs could become required to have a namespace #7

Open carols10cents opened 3 years ago

carols10cents commented 3 years ago

cc @pietroalbini @withoutboats @joshtriplett

I feel like this concern has been brought up in multiple places and I think it's important enough to consolidate to one place.

The concern is that, depending on how/if we choose to implement this RFC, the names in .rs files would not match what someone working on the code would need to use on docs.rs to get the correct documentation.

Example possible today

To be clear, this problem exists today. Here is one example:

You're starting on a new project at work, you check out the repo and pop open src/main.rs:

use serde::Regex;

fn main() {
    let re = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();
}

What?!?!? This doesn't make any sense!! If I go to docs.rs/serde, there's no Regex there!!! What's going on?!?!

The answer is in Cargo.toml, where the package regex from crates.io has been renamed to serde:

[dependencies]
serde = { version = "1.4.2", package = "regex" }

This is currently a choice each repo makes, and each repo has to deal with the consequences of that choice.

Effects of this RFC

If we go the route where we don't make any changes to rustc and instead map some separator like / to _, then a repo would be forced to introduce this confusion in order to use crates that take advantage of this RFC's feature.

Today, a crate renaming one of its dependencies in a confusing way is pretty rare. This would increase the frequency of this confusion.

Example in src/main.rs, assuming we have implemented this feature using / as the separator that maps to _, and assuming serde is using it for the crate currently known as serde_json:

use serde_json::Value;

fn main() { ... }

To find documentation for this Value, do I look at docs.rs/serde_json or docs.rs/serde/json?
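To make the collision concrete, here is a minimal sketch of the mapping discussed above. The function name `crate_ident` is hypothetical, and the rule it implements (turning both / and - into _) is only an assumption about how such a mapping might work, not anything Cargo or rustc defines for namespaces today:

```rust
/// Hypothetical mapping from a crates.io package name to the identifier
/// that would appear in `use` statements, ASSUMING the separator `/`
/// is mapped to `_` (the same way `-` is mapped to `_` today).
fn crate_ident(package: &str) -> String {
    package
        .chars()
        .map(|c| if c == '/' || c == '-' { '_' } else { c })
        .collect()
}

/// Two package names are indistinguishable in source code when they
/// map to the same identifier.
fn collide_in_source(a: &str, b: &str) -> bool {
    a != b && crate_ident(a) == crate_ident(b)
}
```

Under that assumption, `collide_in_source("serde/json", "serde_json")` is true, so a reader who sees `use serde_json::Value;` cannot tell which package's documentation to open without checking Cargo.toml.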

@pietroalbini previously wrote:

We should be careful to distinguish "crates.io package" from "crate", because crates are consumed by rustc and have different naming requirements. There is no reason that a crate name must be globally unique -- only unique within a single [dependencies] section or rustc invocation. And the mapping of crates.io package to crate is ~arbitrary, controlled entirely by the author of the build configuration.

While it's technically true that right now there is no enforced mapping between the crates.io crate name and the name you refer to in the source code, the mapping is still there by convention. I've yet to see a codebase where renamed dependencies are widely used, and when I see a crate name in the source code I implicitly expect it's the same name as the package.

Losing this property would be really bad IMO, as every project would have (for example) http::Request pointing to something different. Having to check the Cargo.toml every time I want to open docs.rs would be far from ideal for me.

What are others' thoughts on this issue, specifically, of source code name corresponding (or not) to crates.io crate name, regardless of exactly how they might differ?

jtgeibel commented 3 years ago

The docs.rs/serde/json URL brings up an interesting point. I don't know how docs.rs parses URLs, but for crates.io we use route-recognizer on the backend. We can specify URLs such as /crates/:crate_id/owners or /crates/:crate_id/:version/yank, where the :crate_id and :version portions of the URL can be easily accessed from the controller. I don't believe there is any way for us to match against /crates/:optional_namespace/:crate_id/:version/yank. I think we would have to duplicate all our route definitions that contain a :crate_id. (And I'm thinking of only the backend here. Frontend routes are handled separately by frontend logic.)

It seems there would also be URLs that we can't disambiguate. Is /crates/foo/versions requesting a list of versions for the top-level crate foo, or the crate information for versions in the foo namespace? (Alternatively, we could keep /crates/:crate_id and add /ns/:namespace/:crate_id, but having two different prefixes is probably confusing to users.)
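As a rough illustration of that ambiguity, the sketch below hand-rolls a tiny matcher rather than using route-recognizer, so it says nothing about how that library actually resolves conflicts. The `Route` enum, the `interpretations` function, and the second pattern (/crates/:namespace/:crate_id) are all hypothetical:

```rust
#[derive(Debug, PartialEq)]
enum Route {
    /// Matches the pattern /crates/:crate_id/versions
    CrateVersions { crate_id: String },
    /// Matches the HYPOTHETICAL pattern /crates/:namespace/:crate_id
    NamespacedCrate { namespace: String, crate_id: String },
}

/// Returns every route pattern that a path could be read as.
/// More than one result means the path is ambiguous.
fn interpretations(path: &str) -> Vec<Route> {
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    let mut found = Vec::new();

    if let ["crates", crate_id, "versions"] = segments.as_slice() {
        found.push(Route::CrateVersions {
            crate_id: (*crate_id).to_owned(),
        });
    }
    if let ["crates", namespace, crate_id] = segments.as_slice() {
        found.push(Route::NamespacedCrate {
            namespace: (*namespace).to_owned(),
            crate_id: (*crate_id).to_owned(),
        });
    }
    found
}
```

Here `interpretations("/crates/foo/versions")` returns both variants, while a path such as /crates/foo/bar only matches the namespaced pattern. A real router would have to pick one reading by priority, which is where the confusion for users would come from.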

For this reason, we might need to find another separator so that the name in Cargo.toml aligns with our URL schemes. I think /crates/foo::versions/versions would work, though I understand that :: introduces some ambiguity in paths within .rs files.

Unfortunately, there are very few unreserved characters in a URL. Ideally we wouldn't need to percent encode the character within URLs either. Most conservatively this would limit us to ~ and . (from section 2.3 of RFC 3986 but excluding - and _, obviously). We definitely can't use ?, %, or #, because they have special meaning in URLs. I think we could get away with some characters like +, ,, and & because I think they are only escaped in the query parameters, though none of these seem desirable.
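For reference, the conservative set can be derived mechanically. This is a small sketch based only on the unreserved set in section 2.3 of RFC 3986 (ALPHA, DIGIT, -, ., _, ~); the function names are made up for illustration, and it does not model what browsers or crates.io actually accept:

```rust
/// True if `c` is in the RFC 3986 section 2.3 "unreserved" set:
/// ALPHA / DIGIT / "-" / "." / "_" / "~".
fn is_unreserved(c: char) -> bool {
    c.is_ascii_alphanumeric() || matches!(c, '-' | '.' | '_' | '~')
}

/// Printable ASCII characters that never need percent-encoding and are
/// not already allowed in crate names (letters, digits, `-`, `_`).
fn conservative_separators() -> Vec<char> {
    ('!'..='~')
        .filter(|&c| is_unreserved(c))
        .filter(|&c| !c.is_ascii_alphanumeric() && c != '-' && c != '_')
        .collect()
}
```

Calling `conservative_separators()` yields only `.` and `~`, which matches the two characters named above. Characters such as `:`, `/`, `+`, `,`, and `&` fall outside the unreserved set, so whether they survive unescaped depends on where they appear in the URL and on the client.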

At the moment this has me leaning towards :: even though that has other complications (mentioned above). It probably isn't technically URL safe, but seems to work as expected for me in Firefox and curl.