NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

staging-next 2023-06-02/coreutils-9.3 broke all builds on nfs for me #244331

Open rski opened 1 year ago

rski commented 1 year ago

(and I'm feeling fine).

I bisected across 16k commits of nixpkgs on a nix store on an nfs drive, where evaluating the sample flake I had at hand took 3+ minutes. Please clap.

This is the flake I used as an example:

{
  inputs = {
    nixpkgs.url = "/home/rski/garage/Code/nixpkgs";
  };

  outputs = { nixpkgs, ... }:
    let pkgs = import nixpkgs { system = "x86_64-linux"; };
    in {
      packages = {
        gopls = (pkgs.buildGo120Module {
          pname = "gopls";
          version = "unstable";
          src = pkgs.buildPackages.fetchFromGitHub {
            owner = "golang";
            repo = "tools";
            rev = "gopls/v0.12.2";
            sha256 = "sha256-mbJ9CzJxhAxYByfNpNux/zOWBGaiH4fvIRIh+BMprMk=";
          };
          vendorSha256 = "sha256-Wx0tXrw3Y3Of3aZNYiD9EVYKFpqA3kqe5tFqppoe0A0=";
          modRoot = "gopls";
          doCheck = false;
          check = false;
          subPackages = [ "." ];
        });
      };
    };
}

I built it with the command `rm -f flake.lock && nix build .#packages.gopls`. (There's probably a better way to update flake.lock to reflect git checkouts in the local nixpkgs repo, but see the part about every nix eval taking 3+ minutes.)

At commit 09720cc41f0 the build fails; at its parent (git checkout 09720cc41f0~1, commit a6c64b2c29b) things work again.
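A bisection like this one can be automated with `git bisect run`, which marks each commit good or bad from a command's exit status. The following is a minimal sketch, assuming it is run from inside the nixpkgs checkout; the helper name `bisect_first_bad` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: find the first bad commit between a known-bad and a
# known-good revision, judging each commit by the exit status of a command.
# Usage: bisect_first_bad <bad-rev> <good-rev> <command> [args...]
bisect_first_bad() {
    local bad="$1" good="$2"
    shift 2
    git bisect start "$bad" "$good" >/dev/null || return 1
    # Exit status 0 marks a commit good, 125 skips it, and any other status
    # from 1 to 127 marks it bad.
    git bisect run "$@" >/dev/null 2>&1
    local status=$?
    # Once the bisection has converged, refs/bisect/bad names the first bad commit.
    git rev-parse refs/bisect/bad
    git bisect reset >/dev/null 2>&1
    return "$status"
}
```

For this report the call would look roughly like `bisect_first_bad 09720cc41f0 a6c64b2c29b sh -c 'cd /path/to/flake && rm -f flake.lock && nix build .#packages.gopls'`, where `/path/to/flake` is a placeholder. Note that any build failure, not only the NFS one, would mark a commit as bad.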

I'm guessing the problem is https://github.com/NixOS/nixpkgs/pull/235556/commits/1a29857b8a93f5259f0c2e919becc0bf9db24f85, but I have not tried it yet.

Steps To Reproduce

Steps to reproduce the behavior:

  1. have /nix on an nfs drive
  2. have any derivation try to build, see sample flake above
  3. failure!

Output logs of the failure ( nix log /nix/store/bvx11f2mbwhn86czn220520qwc3i52ps-gopls-unstable.drv)

@nix { "action": "setPhase", "phase": "unpackPhase" }
unpacking sources
unpacking source archive /nix/store/lpmhj0fh1ww4zby35m4lfvnq0sm60jfd-source
cp: preserving permissions for 'source/.gitignore': Operation not supported
cp: preserving permissions for 'source/README.md': Operation not supported
cp: preserving permissions for 'source/go.mod': Operation not supported
cp: preserving permissions for 'source/txtar/archive_test.go': Operation not supported
cp: preserving permissions for 'source/txtar/archive.go': Operation not supported
cp: preserving permissions for 'source/txtar': Operation not supported
cp: preserving permissions for 'source/PATENTS': Operation not supported
cp: preserving permissions for 'source/cmd/ssadump/main.go': Operation not supported
cp: preserving permissions for 'source/cmd/ssadump': Operation not supported
cp: preserving permissions for 'source/cmd/go-contrib-init/contrib.go': Operation not supported
cp: preserving permissions for 'source/cmd/go-contrib-init/contrib_test.go': Operation not supported
cp: preserving permissions for 'source/cmd/go-contrib-init': Operation not supported
cp: preserving permissions for 'source/cmd/callgraph/main.go': Operation not supported
cp: preserving permissions for 'source/cmd/callgraph/testdata/src/pkg/pkg_test.go': Operation not supported
cp: preserving permissions for 'source/cmd/callgraph/testdata/src/pkg/pkg.go': Operation not supported
cp: preserving permissions for 'source/cmd/callgraph/testdata/src/pkg': Operation not supported
cp: preserving permissions for 'source/cmd/callgraph/testdata/src': Operation not supported
cp: preserving permissions for 'source/cmd/callgraph/testdata': Operation not supported
cp: preserving permissions for 'source/cmd/callgraph/main_test.go': Operation not supported
cp: preserving permissions for 'source/cmd/callgraph': Operation not supported
cp: preserving permissions for 'source/cmd/goimports/doc.go': Operation not supported
cp: preserving permissions for 'source/cmd/goimports/goimports_not_gc.go': Operation not supported
cp: preserving permissions for 'source/cmd/goimports/goimports_gc.go': Operation not supported
cp: preserving permissions for 'source/cmd/goimports/goimports.go': Operation not supported
cp: preserving permissions for 'source/cmd/goimports': Operation not supported
cp: preserving permissions for 'source/cmd/gotype/gotype.go': Operation not supported
[snip]
do not know how to unpack source archive /nix/store/lpmhj0fh1ww4zby35m4lfvnq0sm60jfd-source

At some point, I was also seeing

      > Cannot copy /nix/store/49v7a7ali88cgdkfn1l9i1shz0a9lmxf-source to source: destination already exists!
       > Did you specify two "srcs" with the same "name"?
       > do not know how to unpack source archive /nix/store/49v7a7ali88cgdkfn1l9i1shz0a9lmxf-source

but not at the point of bisection I guess.
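Both messages appear consistent with the directory branch of the default unpacker: it refuses to copy onto an existing destination, and a nonzero exit status from the copy is reported as not knowing how to unpack the source. The following is a simplified sketch of that control flow, not the actual nixpkgs code; the function names and the `CP` override variable are hypothetical and exist only for illustration.

```shell
#!/usr/bin/env bash
# Simplified sketch of the directory branch of an stdenv-style unpacker.
# Names are hypothetical; CP can be overridden to simulate a failing copy.
unpack_dir_sketch() {
    local src="$1" dest="$2"
    if [ -e "$dest" ]; then
        echo "Cannot copy $src to $dest: destination already exists!"
        return 1
    fi
    # -p asks cp to preserve mode, ownership and timestamps. When cp reports
    # "preserving permissions ... Operation not supported" it exits nonzero.
    "${CP:-cp}" -pr -- "$src" "$dest"
}

unpack_file_sketch() {
    local src="$1" dest="$2"
    echo "unpacking source archive $src"
    # Any failure of the copy step surfaces as the generic message below.
    if ! unpack_dir_sketch "$src" "$dest"; then
        echo "do not know how to unpack source archive $src"
        return 1
    fi
}
```

Running `CP=false unpack_file_sketch some-dir source` simulates the failing copy and prints the same final line as the log above.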

I have the exact same config on a NixOS laptop and an ubuntu laptop, where the store is on a not terrible partition, and it works fine there.

Some other things I tried:

I'm a bit focused on the NFS part, because I had trouble with nix on NFS in the past as well, and to me that's the big differentiator here.

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

 - system: `"x86_64-linux"`
 - host os: `Linux 5.10.62-32152435.AroraKernel510.el7.x86_64, CentOS Linux, 7 (Core), nobuild` (arista specific OS, dw about it. Basically CentOS)
 - multi-user?: `no`
 - sandbox: `no`
 - version: `nix-env (Nix) 2.16.1`
 - channels(rski): `"home-manager, nixpkgs"`
 - nixpkgs: `/home/rski/.nix-defexpr/channels/nixpkgs`

I also had Nix 2.11 installed, and that had the same problems.

rski commented 1 year ago

1a29857b8a93f5259f0c2e919becc0bf9db24f85 is not it

rski commented 1 year ago

current bisect status:

git bisect start
# status: waiting for both good and bad commits
# bad: [09720cc41f0dad446f119e3a6259c640d4b33003] Merge #235556: staging-next 2023-06-02
git bisect bad 09720cc41f0dad446f119e3a6259c640d4b33003
# status: waiting for good commit(s), bad commit known
# good: [a6c64b2c29b11b3a9206918a46a37a1c53cdf1a0] Merge pull request #230373 from gregod/photoprism-230506-9de9a3540
git bisect good a6c64b2c29b11b3a9206918a46a37a1c53cdf1a0
# bad: [9c289b427e36f1f317673e5456067de45f8bf2fe] Merge pull request #234994 from layus/autopatchelf-single-files
git bisect bad 9c289b427e36f1f317673e5456067de45f8bf2fe
# good: [9441fc25d1b6af4d2323549221e5eb17bb26f6bd] Merge staging-next into staging
git bisect good 9441fc25d1b6af4d2323549221e5eb17bb26f6bd

rski commented 1 year ago

I'm very certain it's due to the coreutils upgrade. There are even mentions of relevant things in the changelog: https://lists.gnu.org/archive/html/coreutils-announce/2023-04/msg00000.html


  cp --reflink=auto (the default), mv, and install
  will again fall back to a standard copy in more cases.
  Previously copies could fail with permission errors on
  more restricted systems like android or containers etc.
  [bug introduced in coreutils-9.2]

  cp --recursive --backup will again operate correctly.
  Previousy it may have issued "File exists" errors when
  it failed to appropriately rename files being replaced.
  [bug introduced in coreutils-9.2]

rski commented 1 year ago

cc @dasJ

rski commented 1 year ago

semi-related, given the changelog, these might also need fixing:

rski@rski ~/C/n/nixpkgs ((e959b488))> rg "no-clobber"
pkgs/build-support/dotnet/make-nuget-source/default.nix
21:        cp --no-clobber '{}' $out/lib ';'

pkgs/build-support/setup-hooks/move-lib64.sh
17:        mv --no-clobber "$i" $prefix/lib

pkgs/build-support/docker/default.nix
780:    cp -R --no-clobber inputs/*/* image/

nixos/modules/services/networking/znc/default.nix
295:            cp --no-preserve=ownership --no-clobber ${cfg.configFile} ${cfg.dataDir}/configs/znc.conf
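
If the concern with these call sites is that the exit status of `--no-clobber` differs between coreutils releases (an assumption about what "need fixing" means here), one version-independent alternative is an explicit existence check before copying. A minimal sketch follows; `copy_if_absent` is a hypothetical helper, not something nixpkgs provides, and unlike `--no-clobber` the check and the copy are not atomic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a single file into a directory only when no entry
# of that name exists there yet; skipping is treated as success.
# Usage: copy_if_absent <file> <destination-dir>
copy_if_absent() {
    local src="$1" dest_dir="$2"
    local target
    target="$dest_dir/$(basename -- "$src")"
    # -L also catches dangling symlinks, which -e reports as missing.
    if [ -e "$target" ] || [ -L "$target" ]; then
        return 0
    fi
    cp -- "$src" "$target"
}
```

This only covers the single-file case; the recursive `cp -R --no-clobber` call in the docker helper would need a loop over the source tree.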

rski commented 1 year ago

tried adding http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blobdiff_plain;f=src/cp.c;h=00a5cb813711826102e8d3c7d41cf99b4b1b656f;hp=488770a0b6e963e3c876b44e0b5bb2bee0690941;hb=c6b1fe43474b48a6bf5793e11cc1d0d6e895fdf4;hpb=7223651ad194a5868b58c1be6c7452fd3ca2f75a to the coreutils patches, doesn't seem to work

rski commented 1 year ago

i'm out of ideas for now, maybe the coreutils update needs to be reverted?

rski commented 1 year ago

looks like https://lists.gnu.org/archive/html/bug-coreutils/2023-06/msg00010.html

Artturin commented 1 year ago

https://github.com/NixOS/nixpkgs/issues/244331 9.4, which may fix the issue; the related bug threads are hard to follow.

GeoffreyFrogeye commented 1 year ago

Same here, trying to switch to 23.11. Can't build anything on company desktops. Thanks for the investigation work.

I tried using an overlay with an older version of coreutils, but then nothing comes from the binary cache (and compiling everything from scratch runs into other issues I haven't investigated).

Did you manage to find a workaround? Or even work out which patch from the bug threads actually fixes the issue (I can't stay on 9.1 forever)?

aelbarkani commented 2 months ago

I have the same problem on OpenShift, using this Nix image https://hub.docker.com/layers/nixos/nix/2.24.7/images/sha256-2bf4f7ad8306dc40fda7a1f8f40717fbdbb606b2425bd24c4d52cf4214588657?context=explore. @rski @GeoffreyFrogeye did you find a solution for this?

GeoffreyFrogeye commented 2 months ago

I worked around the issue by doing the cardinal sin of modifying the nix store directly so the version of coreutils used is actually an older one. This needs to be re-applied on every coreutils upgrade.

Which coreutils version are you using? I thought it would have been fixed in 9.4. I can't really test anymore since my nix-on-NFS use case disappeared.

rski commented 2 months ago

I tried with 9.4 and the problem persists. I ended up doing a hacky thing where I build everything on persistent storage and then copy it over to NFS.

aelbarkani commented 2 months ago

Using 9.5 on my end.

rski commented 3 days ago

one possible solution is[^1]:

https://github.com/rski/nixpkgs/commit/daabbf44c5ab54371873ca673014c86bf3fb86ca, making defaultUnpack do:

    cp -r --reflink=auto -- "$fn" "$destination"

instead of cp -rp, perhaps even adding

   --preserve=mode,timestamps

I'm not sure what would break if I changed these flags though, and I'd rather not be responsible for melting down the entire nix ecosystem

[^1] I haven't tested it because building on nfs is horribly slow, but I think it should work.
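
Before changing the flags, it may help to check which preserved attribute the filesystem actually rejects. The sketch below tries each attribute on its own with GNU cp's `--preserve=` option; `probe_preserve` is a hypothetical helper name, and the scratch directory is assumed to already exist on the filesystem under test (for example the NFS mount).

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: for each attribute that `cp -p` preserves, copy a
# directory into a scratch location and report whether GNU cp succeeded.
# Usage: probe_preserve <source-dir> <scratch-dir-on-filesystem-under-test>
probe_preserve() {
    local src="$1" scratch="$2" attr
    for attr in mode ownership timestamps; do
        if cp -r --preserve="$attr" -- "$src" "$scratch/probe-$attr" 2>/dev/null; then
            echo "$attr: ok"
        else
            echo "$attr: FAILED"
        fi
    done
}
```

An attribute reported as FAILED is a candidate for dropping from the `--preserve=` list; whether dropping it is safe for the rest of nixpkgs is a separate question that this probe does not answer.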

aelbarkani commented 1 day ago

If it works it would be really nice!

rski commented 1 day ago

It seems like mode is the issue, not ownership. Here is a more minimal repro that doesn't depend on buildGoModule, taken from https://github.com/samdroid-apps/nix-articles/blob/master/04-proper-mkderivation.md:

{
  inputs = {
    # has the fix, disabling mode on cp
    nixpkgs.url = "github:rski/nixpkgs/fdf6a2c96f5865215b185cdd81bcc94dde9c7778";
    nixpkgs2.url = "github:rski/nixpkgs/bdac0fa35d69d2ea454deeb71b0c826aef53886c";
  };

  outputs =
    { nixpkgs, nixpkgs2, ... }:
    let
      pkgs = import nixpkgs { system = "x86_64-linux"; };
      pkgs2 = import nixpkgs2 { system = "x86_64-linux"; };
    in
    {

      packages = {
        mytest = pkgs.stdenv.mkDerivation {
          name = "example-website-content";
          src = pkgs.fetchFromGitHub {
            owner = "jekyll";
            repo = "example";
            rev = "5eb1b902ca3bda6f4b50d4cfcdc7bc0097bac4b7";
            sha256 = "1jw35hmgx2gsaj2ad5f9d9ks4yh601wsxwnb17pmb9j02hl3vgdm";
          };
          installPhase = ''
            # Build the site to the $out directory
            export JEKYLL_ENV=production
          '';
        };
      };
    };
}