NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

.lock files incorrectly block the build process #10897

Closed chayleaf closed 13 hours ago

chayleaf commented 3 weeks ago

Describe the bug

I only have this issue on one of my Arm machines that I use as a remote builder.

Whenever I'm building something, Nix will at times print "copying 0 paths...". When that happens, the build stalls until I run the following commands on the remote builder:

to_remove="$(find /persist/nix/store/*.lock -mtime 0)"
echo "Removing $to_remove..."
rm $to_remove

(/persist is the path that backs impermanence; I have to use that path here because /nix is read-only)
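
For anyone trying the same workaround, a more defensive variant would list the candidate lock files before deleting anything. The sketch below is an illustration only: the helper name `recent_locks` is hypothetical and not part of the report, and whether it is safe to delete lock files under a live store is an open assumption.

```shell
# Hypothetical helper (not from the report): list *.lock files directly
# inside a store directory that were modified within the last 24 hours.
recent_locks() {
    store_dir="$1"
    find "$store_dir" -maxdepth 1 -type f -name '*.lock' -mtime 0
}

# Dry run: review the list before removing anything.
#   recent_locks /persist/nix/store
# Delete after review (assumption: no running build still needs these locks).
#   recent_locks /persist/nix/store | while IFS= read -r f; do rm -- "$f"; done
```

Unlike the unquoted `rm $to_remove`, reading one path per line avoids word-splitting problems and does nothing when no lock file matches.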

This is the case in Nix 2.18, and has been the case on prior Nix versions as well, since at least November (perhaps it has been an issue long before November).

Steps To Reproduce

It's unclear what exactly causes the issue, but in my specific conditions it happens 100% of the time when "copying 0 paths..." is printed.

Expected behavior

Build continuing as usual.

nix-env --version output

nix-env (Nix) 2.18.2

Additional context

Perhaps the fact I'm using bcachefs with no ACLs could cause the issue here? I have no idea. I'm willing to work on fixing this (in fact I suppose it's a necessity given nobody else complained about it), but I have no idea where to begin.

roberth commented 2 weeks ago

Do you have remote builders for your remote builders? If not, we can rule out https://github.com/NixOS/nix/issues/10740. Could you try with -vvvvv? If that does not reveal a potential cause, could you attach GDB and print stack traces for the client and the remote nix-daemon's worker process? You can find the latter's pid in the process tree under an sshd process. Directly under ssh you might only find a dumb proxy, in which case we'll probably need stack traces from the corresponding nix-daemon worker process. Those workers are started with the client's pid as an argument, precisely so that they can be correlated when debugging.
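
One way to capture the requested stack traces is sketched below. It assumes `gdb` is installed on the machine and that the relevant pid has already been located by hand (for example with `pstree -p` or `ps`). The function name `dump_backtraces` and the `GDB` override variable are hypothetical, added only so the command can be previewed without attaching to anything.

```shell
# Hypothetical helper: attach to a running process, print a backtrace for
# every thread, then detach. Attaching usually requires root or the same
# user as the target process.
dump_backtraces() {
    pid="$1"
    # GDB can be overridden (e.g. GDB=echo) to preview the command line.
    ${GDB:-gdb} -p "$pid" -batch -ex "thread apply all bt"
}

# Example (pids are placeholders):
#   dump_backtraces 12345 > client-bt.txt      # on the client
#   dump_backtraces 67890 > worker-bt.txt      # on the remote builder
```

Running it once for the client and once for the remote nix-daemon worker should give the two traces asked for above.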

chayleaf commented 2 weeks ago

I do have remote builders for my remote builders. The scheme is as follows: x86_64 workstation <-> aarch64 server, the server uses my workstation for x86_64 build jobs, the workstation uses the server for aarch64 build jobs. Should this issue be closed in favor of #10740?

roberth commented 13 hours ago

> Should this issue be closed in favor of #10740?

Yeah, that seems to be the same underlying issue then. Thanks for confirming!

You could subscribe to that issue if you haven't already.