NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1
12.19k stars 1.47k forks source link

Use `std::filesystem::path` for `Path` #9205

Open Ericson2314 opened 11 months ago

Ericson2314 commented 11 months ago

We currently have Path defined as std::string. This is kinda sloppy in general, but especially bad if we ever want to work on Windows.

We should instead use std::filesystem::path, which is the newish C++ standard Path type, and which does all the right things on all platforms.

Unfortunately, there isn't yet a std::filesystem::path_view. https://github.com/cplusplus/papers/issues/406 will eventually fix this, but it is not yet stable and won't be for a while. This gives us two choices:

  1. Use https://github.com/ned14/llfio, which is pioneering what will hopefully be standardized as std::filesystem::path_view.
  2. Hand-roll our own PathView, which can be a small convenience around std::basic_string_view<Path::value_type> that has the missing implicit coercions.

I think (2) is the better first step; we can easily got from (2) to (1) later if we want.

Also per what @edolstra says before CanonPath should not change and we should also use it more. It would be for "internal" paths / paths in our SourcePath VFS we want almost all file access to go though. That means std::filesystem::path is just used for the lower level glue code with the OS.

edolstra commented 11 months ago

We already have CanonPath, we should probably use that more widely.

Ericson2314 commented 11 months ago

@edolstra I would say they are both useful but for different purposes:

I started slogging away at a branch for this issue, and I made a NarPath that indeed should just be CanonPath so thanks for bringing it up! :)

ned14 commented 10 months ago

Re: (2) the LLFIO path_view implementation isn't especially dependent on the rest of LLFIO, and it could be hoisted out without much work.

Re: stability, I don't think LEWG made a single change last meeting which affected the code, not even renaming.

Ericson2314 commented 10 months ago

@ned14 Thanks for chiming in here. While yes path_view part is a relatively small part of your library, I think we would end up using more of it because it seems to be the only game in town for abstracting over file descriptors and windows handles. We have a fair amount of new-school file-descriptor-based system calls, and it would be a shame to give that up for some lowest-common-denominator portability.

nixos-discourse commented 10 months ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-11-17-nix-team-meeting-minutes-104/35753/1

qknight commented 4 months ago

@Ericson2314 I tried to create symlinks on Windows using MinGW but it fails for files and also directories so instead this worked:

 if (CreateSymbolicLinkA(dstd, srcd, SYMBOLIC_LINK_FLAG_DIRECTORY) == FALSE) {
        std::cout << "Error creating directory symlink\n" << std::endl;
        DWORD error = ::GetLastError();
        std::string message = std::system_category().message(error);
        std::cout << message << std::endl;
    }

And playing with canonPath and absPath I've discovered that it converts Windows paths neatly but some of the functions only work if the files are actually there and can't be used to 'normalize' a path soly based on a string representation, i.e. remove the trailing slash for instance.

Path canonPath(PathView path, bool resolveSymlinks) {
    assert(path != "");
    fs::path p = path;

    if (!p.is_absolute())
        throw "not an absolute path";

    fs::path canonical_path;
    try {
        canonical_path = fs::weakly_canonical(p);
    }
    catch (std::filesystem::filesystem_error const &ex) {
        throw "weakly_canonical path problem";
    }
    return Path(canonical_path.make_preferred().lexically_normal().string());

}

Path absPath(PathView path, std::optional <PathView> dir = {}, bool resolveSymlinks = false) {
    Path pathAbsolute;
    try {
        pathAbsolute = fs::absolute(path).make_preferred().lexically_normal().string();
    }
    catch (std::filesystem::filesystem_error const &ex) {
        throw "abs path problem";
    }
    return Path(canonPath(pathAbsolute, resolveSymlinks));
}

I'll provide a branch with unit tests for inspiration soon.

Ericson2314 commented 4 months ago

@qknight Great! Since I am hopefully a GSOC project will be picking up this issue (freeing us to work on other things), starting with a bunch of unit tests is an excellent first step.

qknight commented 4 months ago

In https://github.com/NixOS/nix/commit/508e76212f55d57c5ce84e629b9081c6e3d653e2 i implemented:

which both use std::filesystem and are wrapped in absPath(..) and canonPath(..).

i only tested them using ./libnixutil-tests.exe on windows 10 natively in powershell, here are the results:

.\libnixutil-tests.exe --gtest_color=yes --gtest_filter=*
...
[ PASSED ] 278 tests.
[ FAILED ] 5 tests, listed below:
[ FAILED ] GitTest.blob_read
[ FAILED ] GitTest.blob_write
[ FAILED ] GitTest.tree_read
[ FAILED ] GitTest.tree_write
[ FAILED ] GitTest.both_roundrip

future tests

i would wish for

qknight commented 4 months ago

We should also have tests covering these requirements:

The problem is the filesystem library must support long file names and symlinks (including dangling and recursive). Remember, there is no reliable cp -r on Windows. All software I tested: from built-in copy and xcopy through MinGW bash and coreutils, to 3-rd party robocopy do fail on doing cp -r with real-world directory structure which has both long filenames, recursive symlinks and non-printable characters (such as checked out LLVM and Chromium monorepos), so cp -r, ln -s, … have been written in Perl with some external modules (and even this choice required patches in C code of those modules).

ned14 commented 4 months ago

All software I tested: from built-in copy and xcopy through MinGW bash and coreutils, to 3-rd party robocopy do fail on doing cp -r with real-world directory structure which has both long filenames, recursive symlinks and non-printable characters (such as checked out LLVM and Chromium monorepos)

I would agree about copy and xcopy.

I don't agree about robocopy, which was written by an experienced NT kernel developer precisely because all the other tools had issues. I've never found anything possible on a NT filesystem robocopy can't be persuaded to handle, though it may take non-obvious options. It doesn't use the same copying semantics as POSIX cp, which can produce unpleasant surprise.

Where robocopy does fall down somewhat is large directories - it isn't quick at deleting a 100 million item directory. A LLFIO based tool can clear that in under ten seconds, robocopy can take lots of minutes.

I also really don't like std::tmpfile at all or indeed anything in the standard library to do with temporary files. They were all very poorly designed. POSIX mkotemp is much more like it.