Open Ericson2314 opened 11 months ago
We already have CanonPath
, we should probably use that more widely.
@edolstra I would say they are both useful but for different purposes:
CanonPath
is our normalized, portable notion of a path, just like file system objects as opposed to actual per-OS filesystem interfaces.
std::filesystem::path
is native paths for acting with the physical filesystem, with all their craziness (case insensitivity, crazy Windowsism, etc.)
I started slogging away at a branch for this issue, and I made a NarPath
that indeed should just be CanonPath
so thanks for bringing it up! :)
Re: (2) the LLFIO path_view
implementation isn't especially dependent on the rest of LLFIO, and it could be hoisted out without much work.
Re: stability, I don't think LEWG made a single change last meeting which affected the code, not even renaming.
@ned14 Thanks for chiming in here. While yes path_view
part is a relatively small part of your library, I think we would end up using more of it because it seems to be the only game in town for abstracting over file descriptors and windows handles. We have a fair amount of new-school file-descriptor-based system calls, and it would be a shame to give that up for some lowest-common-denominator portability.
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/2023-11-17-nix-team-meeting-minutes-104/35753/1
@Ericson2314 I tried to create symlinks on Windows using MinGW but it fails for files and also directories so instead this worked:
if (CreateSymbolicLinkA(dstd, srcd, SYMBOLIC_LINK_FLAG_DIRECTORY) == FALSE) {
std::cout << "Error creating directory symlink\n" << std::endl;
DWORD error = ::GetLastError();
std::string message = std::system_category().message(error);
std::cout << message << std::endl;
}
And playing with canonPath
and absPath
I've discovered that it converts Windows paths neatly but some of the functions only work if the files are actually there and can't be used to 'normalize' a path soly based on a string representation, i.e. remove the trailing slash for instance.
Path canonPath(PathView path, bool resolveSymlinks) {
assert(path != "");
fs::path p = path;
if (!p.is_absolute())
throw "not an absolute path";
fs::path canonical_path;
try {
canonical_path = fs::weakly_canonical(p);
}
catch (std::filesystem::filesystem_error const &ex) {
throw "weakly_canonical path problem";
}
return Path(canonical_path.make_preferred().lexically_normal().string());
}
Path absPath(PathView path, std::optional <PathView> dir = {}, bool resolveSymlinks = false) {
Path pathAbsolute;
try {
pathAbsolute = fs::absolute(path).make_preferred().lexically_normal().string();
}
catch (std::filesystem::filesystem_error const &ex) {
throw "abs path problem";
}
return Path(canonPath(pathAbsolute, resolveSymlinks));
}
I'll provide a branch with unit tests for inspiration soon.
@qknight Great! Since I am hopefully a GSOC project will be picking up this issue (freeing us to work on other things), starting with a bunch of unit tests is an excellent first step.
In https://github.com/NixOS/nix/commit/508e76212f55d57c5ce84e629b9081c6e3d653e2 i implemented:
which both use std::filesystem and are wrapped in absPath(..)
and canonPath(..)
.
i only tested them using ./libnixutil-tests.exe
on windows 10 natively in powershell, here are the results:
.\libnixutil-tests.exe --gtest_color=yes --gtest_filter=*
...
[ PASSED ] 278 tests.
[ FAILED ] 5 tests, listed below:
[ FAILED ] GitTest.blob_read
[ FAILED ] GitTest.blob_write
[ FAILED ] GitTest.tree_read
[ FAILED ] GitTest.tree_write
[ FAILED ] GitTest.both_roundrip
i would wish for
We should also have tests covering these requirements:
The problem is the filesystem library must support long file names and symlinks (including dangling and recursive). Remember, there is no reliable cp -r on Windows. All software I tested: from built-in copy and xcopy through MinGW bash and coreutils, to 3-rd party robocopy do fail on doing cp -r with real-world directory structure which has both long filenames, recursive symlinks and non-printable characters (such as checked out LLVM and Chromium monorepos), so cp -r, ln -s, … have been written in Perl with some external modules (and even this choice required patches in C code of those modules).
All software I tested: from built-in copy and xcopy through MinGW bash and coreutils, to 3-rd party robocopy do fail on doing cp -r with real-world directory structure which has both long filenames, recursive symlinks and non-printable characters (such as checked out LLVM and Chromium monorepos)
I would agree about copy and xcopy.
I don't agree about robocopy, which was written by an experienced NT kernel developer precisely because all the other tools had issues. I've never found anything possible on a NT filesystem robocopy can't be persuaded to handle, though it may take non-obvious options. It doesn't use the same copying semantics as POSIX cp
, which can produce unpleasant surprise.
Where robocopy does fall down somewhat is large directories - it isn't quick at deleting a 100 million item directory. A LLFIO based tool can clear that in under ten seconds, robocopy can take lots of minutes.
I also really don't like std::tmpfile
at all or indeed anything in the standard library to do with temporary files. They were all very poorly designed. POSIX mkotemp
is much more like it.
We currently have
Path
defined asstd::string
. This is kinda sloppy in general, but especially bad if we ever want to work on Windows.We should instead use
std::filesystem::path
, which is the newish C++ standard Path type, and which does all the right things on all platforms.Unfortunately, there isn't yet a
std::filesystem::path_view
. https://github.com/cplusplus/papers/issues/406 will eventually fix this, but it is not yet stable and won't be for a while. This gives us two choices:std::filesystem::path_view
.PathView
, which can be a small convenience aroundstd::basic_string_view<Path::value_type>
that has the missing implicit coercions.I think (2) is the better first step; we can easily got from (2) to (1) later if we want.
Also per what @edolstra says before
CanonPath
should not change and we should also use it more. It would be for "internal" paths / paths in ourSourcePath
VFS we want almost all file access to go though. That meansstd::filesystem::path
is just used for the lower level glue code with the OS.