dtolnay / request-for-implementation

Crates that don't exist, but should
610 stars 6 forks source link

A crate to handle Windows' "exotic" paths #34

Open Eh2406 opened 5 years ago

Eh2406 commented 5 years ago

Std path does some weird things with UNC paths (\\?\). For one std::fs::canonicalize always returns one whether it is needed or not. Then path.join will just do a string concatenation leading to invalid paths. This leads to bugs in many important parts of the ecosystem. (Cargo, wasm-pack, Rustup)

So we need a library that provides a binding to GetFullPathNameW on windows and uses std::fs::canonicalize otherwize. More ambitiously we need a path "interpreter" that ".. and . while appending. So if you started with \\?\C:\bar and joined ../foo you would iterate the components of ../foo and apply them to the base path, first applying .. to get \\?\C:\ and then applying foo to get \\?\C:\foo."

Ether part of this would be a valuable addition to the ecosystem!

dtolnay commented 5 years ago

Is this something that could be fixed in fs::canonicalize and Path::join? If not, is that because of backward compatibility restrictions or because the current behavior is useful and worth keeping? If the latter, how would the spec for this new library's canonicalize and join functionality be different from what std canonicalize and join claim to do?

Eh2406 commented 5 years ago

So @retep998 knows this better than I.

My impression is that the fs::canonicalize is locked by backward compatibility and adding an alternative like fs::normalize is blocked on the lib team being sure it is a good design (https://github.com/rust-lang/rust/issues/59117) The impression is that a crate could iterate and suggest uplift when polished.

My impression is that the std::Path::join api is technically correct. Technically \\?\C:\bar\../foo is a valid path on windows, it is "on drive C in folder bar in folder ../foo" and so joining \\?\C:\bar\ with ../foo should give you \\?\C:\bar\../foo. However if my Cargo toml claims that foo = { path = "../foo" }, then Cargo crashes if my working directory is \\?\C:\bar\, we need a library that does the same modifications to the path as joining ../foo would on a normal path.

dtolnay commented 5 years ago

Thanks, makes sense. Would something like the following API be sufficient for what Cargo needs? :

pub struct SanePath {...}

impl SanePath {
    pub fn normalize<P: AsRef<Path>>(path: P) -> Self;
    pub fn join<P: AsRef<Path>>(&self, path: P) -> Self;
    pub fn as_std_path(&self) -> &Path;
}

impl AsRef<Path> for SanePath {...}
Eh2406 commented 5 years ago

Yes! Or freestanding functions:

pub fn normalize<P: AsRef<Path>>(path: P) -> PathBuf {...}
pub fn join<P: AsRef<Path>>(base: &mut PathBuf, addition: P) {...}
retep998 commented 5 years ago

fs::canonicalize is not broken, but rather it does a fundamentally different operation than what fs::normalize would do. One is asking the OS for the canonical path to a file that actually exists, the other is just turning a relative path into an absolute path.

basil-cow commented 4 years ago

I'd like to tackle this one, but from this issue I didn't understand whether popping on a .. and doing nothing on a . is enough (there are some edge cases but that the gist of it). This may break symlinks, is this desired? Are there any caveats one should be aware of?

Eh2406 commented 4 years ago

I think it is ok to brake symlinks. \\?\C:\bar/../foo is a valid windows path, but we don't want the folder named bar/../foo in C:\. We want the folder foo in C:\. Dose that makes sense?

basil-cow commented 4 years ago

Are https://crates.io/crates/path-absolutize and https://crates.io/crates/path-dedot what you need?

Eh2406 commented 4 years ago

Thanks for the links. Now that I have had time to read it carefully... I don't know. path-absolutize does not use GetFullPathNameW, witch is grate for cross compat, but makes me nervous that it will differ from the OS behavior. Neither crate has tests for UNC paths or cases of mixtures of / and \. From the examples and skimming the code I don't know how well it handles non-unicode data, but I am not sure how well std does ether.

ajeetdsouza commented 4 years ago

I saw a post related to this on Reddit recently, and I'd like to try implementing this.

@Eh2406 @dtolnay could you have a look at my repository (https://github.com/ajeetdsouza/pathology) and tell me your initial thoughts? I've currently only written an implementation for Windows, but I'm working on one for Linux too.

The only function I've implemented is normalize, which lexically converts the path to its simplest form, without actually querying the filesystem.

On Windows, I'm using GetFullPathNameW for this. Since drive letters are case insensitive, but are written in uppercase by convention, I'm capitalizing it manually (GetFullPathNameW doesn't always do this). Error handling is still a WIP, I'll get that done soon enough.

On Linux, there is no system call to lexically normalize paths, but GNU realpath can do this via realpath -ms. I'm currently rewriting that logic in Rust.

ajeetdsouza commented 4 years ago

Update: I've added normalize on Unix, too.

Eh2406 commented 3 years ago

@ajeetdsouza sorry for the slow reply. That looks like a great start on a answer to the question "I use your_thing(&std::fs::canonicalize(...)) and it broke, what should I use instead?" I look forward to having that in our arsenal. Thank you!

@retep998 may be able to revue the windows api calls better than I.

pksunkara commented 3 years ago

Related note for others, dunce is an alternative to fs::canonicalize.

dylni commented 3 years ago

normpath can now be used to solve this issue. It defines BasePath, which is very similar to SanePath, and PathExt::normalize can be used for normalization.

It would be great if @retep998 could review this crate as well.

dylni commented 3 years ago

@Eh2406 What would be the next steps for integrating this into Cargo?

Eh2406 commented 3 years ago

Last time I had this paged in https://github.com/rust-lang/cargo/issues/6198 was the best link. Looks like we just got a related PR https://github.com/rust-lang/cargo/pull/8874, so coordinating my make sense. Another good place to start would be to grep for canonicalize in the code base. Sorry I don't remember this stuff better.

dylni commented 3 years ago

@Eh2406 Thanks. I'll start looking through where the path handling can be improved