Open Eh2406 opened 5 years ago
Is this something that could be fixed in fs::canonicalize and Path::join? If not, is that because of backward compatibility restrictions or because the current behavior is useful and worth keeping? If the latter, how would the spec for this new library's canonicalize and join functionality be different from what std canonicalize and join claim to do?
So @retep998 knows this better than I.
My impression is that fs::canonicalize is locked by backward compatibility, and adding an alternative like fs::normalize is blocked on the lib team being sure it is a good design (https://github.com/rust-lang/rust/issues/59117). The impression is that a crate could iterate and suggest uplift once polished.
My impression is that the std::path::Path::join API is technically correct. Technically, \\?\C:\bar\../foo is a valid path on Windows: it is "on drive C, in folder bar, in folder ../foo", and so joining \\?\C:\bar\ with ../foo should give you \\?\C:\bar\../foo. However, if my Cargo.toml claims that foo = { path = "../foo" }, then Cargo crashes if my working directory is \\?\C:\bar\. We need a library that makes the same modifications to the path as joining ../foo would on a normal path.
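A minimal sketch of the behavior described above, using only std. The helper names naive_join and has_parent_dir are made up for illustration. It shows that Path::join keeps the .. component instead of resolving it; on an ordinary path the OS resolves it later, whereas, as described in this thread, a \\?\ path is taken literally.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: joins exactly as std does, with no interpretation.
fn naive_join(base: &str, addition: &str) -> PathBuf {
    Path::new(base).join(addition)
}

/// Hypothetical helper: reports whether a `..` component survived the join.
fn has_parent_dir(path: &Path) -> bool {
    path.components().any(|c| c == Component::ParentDir)
}
```

For example, naive_join("/bar", "../foo") yields a path whose components still include .., rather than /foo.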
Thanks, makes sense. Would something like the following API be sufficient for what Cargo needs?
pub struct SanePath {...}
impl SanePath {
    pub fn normalize<P: AsRef<Path>>(path: P) -> Self;
    pub fn join<P: AsRef<Path>>(&self, path: P) -> Self;
    pub fn as_std_path(&self) -> &Path;
}
impl AsRef<Path> for SanePath {...}
Yes! Or freestanding functions:
pub fn normalize<P: AsRef<Path>>(path: P) -> PathBuf {...}
pub fn join<P: AsRef<Path>>(base: &mut PathBuf, addition: P) {...}
fs::canonicalize is not broken; rather, it performs a fundamentally different operation from what fs::normalize would do. One asks the OS for the canonical path to a file that actually exists; the other just turns a relative path into an absolute path.
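To make that distinction concrete, here is a rough sketch. The name absolute_without_fs_lookup is hypothetical and is not the proposed fs::normalize; it only prefixes the current directory and does not collapse .. or . components.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Asks the OS for the canonical path; returns an error unless `path` exists.
fn canonical(path: &Path) -> io::Result<PathBuf> {
    fs::canonicalize(path)
}

/// Hypothetical stand-in for the "other" operation: makes the path absolute
/// by prefixing the current directory, without checking that it exists.
fn absolute_without_fs_lookup(path: &Path) -> io::Result<PathBuf> {
    if path.is_absolute() {
        Ok(path.to_path_buf())
    } else {
        Ok(env::current_dir()?.join(path))
    }
}
```

For a path that does not exist, canonical returns an error, while absolute_without_fs_lookup still produces an absolute path.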
I'd like to tackle this one, but from this issue I couldn't tell whether popping on a .. and doing nothing on a . is enough (there are some edge cases, but that's the gist of it). This may break symlinks; is that desired? Are there any caveats one should be aware of?
I think it is OK to break symlinks. \\?\C:\bar/../foo is a valid Windows path, but we don't want the folder named bar/../foo in C:\. We want the folder foo in C:\. Does that make sense?
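The "pop on .., skip ." idea from this exchange could be sketched as follows. The function name normalize_lexically is hypothetical, the edge-case choices (keeping leading .. on relative paths, ignoring .. at the root) are assumptions, and it never touches the filesystem, so it will not follow symlinks.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical lexical normalization: drops `.`, and lets `..` remove the
/// previous normal component. Never consults the filesystem.
fn normalize_lexically(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            // `.` changes nothing.
            Component::CurDir => {}
            Component::ParentDir => {
                let last_is_normal =
                    matches!(out.components().next_back(), Some(Component::Normal(_)));
                let at_root =
                    matches!(out.components().next_back(), Some(Component::RootDir));
                if last_is_normal {
                    // `bar/..` cancels out.
                    out.pop();
                } else if !at_root {
                    // Nothing to cancel on a relative path: keep the `..`.
                    out.push("..");
                }
                // At the root, `..` is ignored (assumption).
            }
            // Prefix, root, and normal components are copied through.
            other => out.push(other.as_os_str()),
        }
    }
    out
}
```

With this sketch, /bar/../foo becomes /foo, and a/./b/../../../c becomes ../c.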
Are https://crates.io/crates/path-absolutize and https://crates.io/crates/path-dedot what you need?
Thanks for the links. Now that I have had time to read it carefully... I don't know. path-absolutize does not use GetFullPathNameW, which is great for cross-platform compatibility, but makes me nervous that it will differ from the OS behavior. Neither crate has tests for UNC paths or for mixtures of / and \. From the examples and skimming the code, I don't know how well it handles non-Unicode data, but I am not sure how well std does either.
I saw a post related to this on Reddit recently, and I'd like to try implementing this.
@Eh2406 @dtolnay could you have a look at my repository (https://github.com/ajeetdsouza/pathology) and tell me your initial thoughts? I've currently only written an implementation for Windows, but I'm working on one for Linux too.
The only function I've implemented is normalize, which lexically converts the path to its simplest form, without actually querying the filesystem.
On Windows, I'm using GetFullPathNameW for this. Since drive letters are case-insensitive but are written in uppercase by convention, I'm capitalizing the drive letter manually (GetFullPathNameW doesn't always do this). Error handling is still a WIP; I'll get that done soon enough.
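As an illustration of the drive-letter step only, here is a small string-based sketch. The function uppercase_drive_letter is hypothetical and is not taken from the pathology repository; it assumes the drive letter, if present, is the first character followed by a colon.

```rust
/// Hypothetical helper: uppercases a leading ASCII drive letter such as `c:`.
/// Paths without a leading `<letter>:` are returned unchanged.
fn uppercase_drive_letter(path: &str) -> String {
    let bytes = path.as_bytes();
    if bytes.len() >= 2 && bytes[0].is_ascii_lowercase() && bytes[1] == b':' {
        let mut out = String::with_capacity(path.len());
        out.push(bytes[0].to_ascii_uppercase() as char);
        // Slicing at 1 is safe: the first byte is a single ASCII character.
        out.push_str(&path[1..]);
        out
    } else {
        path.to_string()
    }
}
```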
On Linux, there is no system call to lexically normalize paths, but GNU realpath can do this via realpath -ms. I'm currently rewriting that logic in Rust.
Update: I've added normalize on Unix, too.
@ajeetdsouza sorry for the slow reply. That looks like a great start on an answer to the question "I use your_thing(&std::fs::canonicalize(...)) and it broke, what should I use instead?" I look forward to having that in our arsenal. Thank you!
@retep998 may be able to review the Windows API calls better than I.
Related note for others: dunce is an alternative to fs::canonicalize.
normpath can now be used to solve this issue. It defines BasePath, which is very similar to SanePath, and PathExt::normalize can be used for normalization.
It would be great if @retep998 could review this crate as well.
@Eh2406 What would be the next steps for integrating this into Cargo?
Last time I had this paged in, https://github.com/rust-lang/cargo/issues/6198 was the best link. Looks like we just got a related PR, https://github.com/rust-lang/cargo/pull/8874, so coordinating may make sense. Another good place to start would be to grep for canonicalize in the code base. Sorry I don't remember this stuff better.
@Eh2406 Thanks. I'll start looking through where the path handling can be improved.
Std path does some weird things with UNC paths (\\?\). For one, std::fs::canonicalize always returns one whether it is needed or not. Then path.join will just do a string concatenation, leading to invalid paths. This leads to bugs in many important parts of the ecosystem (Cargo, wasm-pack, Rustup).
So we need a library that provides a binding to GetFullPathNameW on Windows and uses std::fs::canonicalize otherwise. More ambitiously, we need a path "interpreter" that handles .. and . while appending. So if you started with \\?\C:\bar and joined ../foo, you would iterate the components of ../foo and apply them to the base path, first applying .. to get \\?\C:\ and then applying foo to get \\?\C:\foo.
Either part of this would be a valuable addition to the ecosystem!
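The "interpreter" idea could look roughly like the sketch below. The name join_interpreting is hypothetical, and the sketch assumes base is already absolute; on a relative base, excess .. components would simply be dropped. It applies each component of the addition to the base one at a time instead of concatenating strings.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical "interpreting" join: applies each component of `addition`
/// to `base` in turn. Assumes `base` is an absolute path.
fn join_interpreting<P: AsRef<Path>>(base: &mut PathBuf, addition: P) {
    for component in addition.as_ref().components() {
        match component {
            // An absolute addition replaces the base, as PathBuf::push does.
            Component::Prefix(_) | Component::RootDir => base.push(component.as_os_str()),
            // `.` leaves the base unchanged.
            Component::CurDir => {}
            // `..` removes the last component; pop() is a no-op at the root.
            Component::ParentDir => {
                base.pop();
            }
            // A normal name is appended.
            Component::Normal(name) => base.push(name),
        }
    }
}
```

Starting from /bar and joining ../foo, the sketch first pops bar and then pushes foo, leaving /foo; the same steps applied to \\?\C:\bar on Windows are what the issue describes.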