NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

Remote builds don't lock paths from being garbage collected #1970

Open dezgeg opened 6 years ago

dezgeg commented 6 years ago

While a garbage collection was running on the build slave (the host jetson here), remote builds failed because paths they depended on were garbage-collected out from under them:

waiting for the big garbage collector lock...
copying 1 paths...
copying path '/nix/store/66kvwqwg7xka2myc884w0947q66b059q-nixpkgs' to 'ssh://root@jetson'...
error: build of '/nix/store/8x9p44p4hi4zqkcrva0dyz4dwynnxqyx-linux-config-4.15.9.drv' on 'ssh://root@jetson' failed: dependency of '/nix/store/9krlzvny65gdc8s7kpb6lkx8cd02c25b-default-builder.sh' of '/nix/store/8x9p44p4hi4zqkcrva0dyz4dwynnxqyx-linux-config-4.15.9.drv' does not exist, and substitution is disabled
builder for '/nix/store/8x9p44p4hi4zqkcrva0dyz4dwynnxqyx-linux-config-4.15.9.drv' failed with exit code 1

The GC log confirms that the path was deleted by the collector:

deleting '/nix/store/9krlzvny65gdc8s7kpb6lkx8cd02c25b-default-builder.sh'

Both sides have nix (Nix) 2.0.

dtzWill commented 6 years ago

Is that what's going on?? Eek! I managed to hit a problem where Nix copied over only the direct dependencies and not the transitive ones (when using a diverted store, IIRC), and got bizarre errors as a result.

Anyway, my builders also run a GC every few hours, so I bet I hit this as well.

7c6f434c commented 6 years ago

@dtzWill wait, so it is possible to end up with a store that is not dependency-closed without any obviously brute-force measures? That's probably worse than a failure to GC-pin. If you can reproduce it, it is worth a separate issue, in my opinion.

dtzWill commented 6 years ago

I think it was because my ssh user wasn't trusted? There was no error, but that might have been the cause. But yes, I was rather surprised as well. Nix spoils me/us by requiring closure at all times, so I kinda forgot that it was possible to not have all references xD.

It'd be nice to have remote builds run via nixbld* users and such. I'll poke at this some more and file an issue if I can reproduce. Good call.

edolstra commented 6 years ago

@7c6f434c The remote build protocol uses the BasicDerivation mechanism, which doesn't actually store the derivation on disk, so there is no referrer keeping the inputs alive.
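To make the point about referrers concrete, here is a toy reachability model in Python. It is only a sketch of the general idea (a path survives if it is reachable from a root by following references), not Nix's actual collector. The short path names and the `live_paths` and `collect` helpers are hypothetical, and the first case assumes the .drv file is itself reachable from a root.

```python
from typing import Dict, Set


def live_paths(roots: Set[str], refs: Dict[str, Set[str]]) -> Set[str]:
    """Return every path reachable from the roots by following references."""
    live: Set[str] = set()
    stack = list(roots)
    while stack:
        path = stack.pop()
        if path in live:
            continue
        live.add(path)
        stack.extend(refs.get(path, set()))
    return live


def collect(store: Set[str], roots: Set[str], refs: Dict[str, Set[str]]) -> Set[str]:
    """Return the paths a toy collector would delete."""
    return store - live_paths(roots, refs)


# Case 1 (assumption): the derivation exists as a file in the store and is
# rooted, so it acts as a referrer for its inputs.
store_with_drv = {"linux-config.drv", "default-builder.sh"}
refs_with_drv = {"linux-config.drv": {"default-builder.sh"}}
deleted_with_drv = collect(store_with_drv, {"linux-config.drv"}, refs_with_drv)

# Case 2: the derivation only exists in memory (as described for
# BasicDerivation), so nothing in the store refers to the input.
store_without_drv = {"default-builder.sh"}
deleted_without_drv = collect(store_without_drv, set(), {})

print(deleted_with_drv)
print(deleted_without_drv)
```

In the second case the input has no referrer and no root, so the toy collector is free to delete it, which matches the failure in the original report.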

stale[bot] commented 3 years ago

I marked this as stale due to inactivity.

Gabriella439 commented 2 years ago

We were affected by this issue recently. A garbage collection in the middle of a remote build caused the remote build to fail.

Gabriella439 commented 2 years ago

For what it's worth, this issue happened when the remote builder was using a single-user Nix installation. I'm only mentioning this because it sounds from the above discussion that a multi-user Nix installation might not be affected (since if all builds go through the nix-daemon then it can track which paths are still live due to a remote build).

thufschmitt commented 2 years ago

> For what it's worth, this issue happened when the remote builder was using a single-user Nix installation. I'm only mentioning this because it sounds from the above discussion that a multi-user Nix installation might not be affected (since if all builds go through the nix-daemon then it can track which paths are still live due to a remote build).

Nah, that shouldn't change anything: the daemon forks itself for each nix-* call, so no more state is shared when using the daemon than when not using it. But I guess there's a missing call to addTempRoot (or it's called too late) in the code path that Nix follows on the remote builder.
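The "called too late" hypothesis can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not Nix's code: `ToyStore`, its methods, and `run` are hypothetical names, and the only thing borrowed from the discussion is the idea that a temporary root registered before the collector runs protects a path, while one registered afterwards does not.

```python
class ToyStore:
    """Hypothetical in-memory store with temporary GC roots."""

    def __init__(self, paths):
        self.paths = set(paths)
        self.temp_roots = set()

    def add_temp_root(self, path):
        # Stand-in for the idea behind addTempRoot: mark a path as in use.
        self.temp_roots.add(path)

    def collect_garbage(self):
        # Delete everything that is not protected by a temporary root.
        deleted = self.paths - self.temp_roots
        self.paths -= deleted
        return deleted

    def build(self, inputs):
        missing = [p for p in inputs if p not in self.paths]
        if missing:
            raise RuntimeError(f"dependency {missing[0]} does not exist")
        return "ok"


def run(register_before_gc: bool) -> str:
    """Copy an input, let a GC run, then build; vary when the root is added."""
    inputs = ["default-builder.sh"]
    store = ToyStore(inputs)
    if register_before_gc:
        for path in inputs:
            store.add_temp_root(path)
    store.collect_garbage()  # a GC that happens between copy and build
    if not register_before_gc:
        for path in inputs:
            store.add_temp_root(path)  # too late: the path is already gone
    try:
        return store.build(inputs)
    except RuntimeError as err:
        return f"failed: {err}"


print(run(register_before_gc=True))
print(run(register_before_gc=False))
```

The first call succeeds and the second fails, mirroring the window in which a collection on the builder can remove inputs that were copied over but are not yet pinned.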

stale[bot] commented 2 years ago

I marked this as stale due to inactivity.