Open christian-marie opened 6 years ago
I ran into something similar after running out of disk space, and clearing /nix/var/nix/current-load/
on the local machine seems to fix it.
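For anyone wanting to try the workaround above with a dry run first, here is a minimal sketch. The helper name `clear_current_load` is hypothetical, and it assumes the directory holds only plain files that are safe to delete while no builds are running, which this thread does not confirm.

```python
from pathlib import Path


def clear_current_load(directory="/nix/var/nix/current-load", dry_run=True):
    """Return the regular files in `directory`; delete them unless dry_run is set.

    Assumption: entries are plain files that can be removed while no builds run.
    """
    root = Path(directory)
    if not root.is_dir():
        return []  # nothing to do if the directory does not exist
    entries = sorted(p for p in root.iterdir() if p.is_file())
    if not dry_run:
        for entry in entries:
            entry.unlink()
    return entries


if __name__ == "__main__":
    # Dry run: only report what would be removed.
    for path in clear_current_load():
        print("would remove", path)
```

Calling `clear_current_load(dry_run=False)` performs the actual removal and likely needs root.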
@HackerFoo :+1: had this problem too because my /boot ran out of space
I marked this as stale due to inactivity.
I've been running a Hydra server for a few months now, and it's been running fine with a single x86_64-linux remote build slave. After an indeterminate period, remote building starts hanging until timeout on the hydra-queue-runner side. The slave node will just sit there with N nix-store --serve --write commands, where N = the number of concurrent builds configured.
The breakage doesn't seem to be related to any configuration change, as a reprovision of the hydra-master and slave nodes has fixed this issue twice now. Just rebuilding the slave doesn't. I'm getting sick of reprovisioning the build server, so now I'm digging.
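To check whether a slave is in the state described above, one option is to count the lingering `nix-store --serve --write` processes. This is a rough sketch that scans a Linux `/proc`. The function name `find_processes` and the `proc_root` parameter are made up for illustration and are not part of Nix or Hydra.

```python
from pathlib import Path


def find_processes(needle, proc_root="/proc"):
    """Return {pid: cmdline} for processes whose command line contains `needle`."""
    matches = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # only numeric entries are process directories
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited meanwhile, or not readable
        # Arguments in /proc/<pid>/cmdline are NUL-separated.
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if needle in cmdline:
            matches[int(entry.name)] = cmdline
    return matches


if __name__ == "__main__":
    stuck = find_processes("nix-store --serve --write")
    print(len(stuck), "matching processes:", sorted(stuck))
```

If the count stays equal to the configured number of concurrent builds while nothing makes progress, that matches the symptom reported here.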
The hydra-master machine is stuck in this state now, and I can reproduce the issue by triggering any remote build at all while the server is in this state.
I found via strace that the hydra-master build waits on a select syscall (waiting for children here: https://github.com/NixOS/nix/blob/master/src/libstore/build.cc#L4170), whilst nix-store --serve --write waits on a read(0,..).
Now the weird bit: that blocking read appears to happen here: https://github.com/NixOS/nix/blob/master/src/nix-store/nix-store.cc#L790
I came to that conclusion by the uniqueness of readNum in that function, as per the stack trace below.
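The shape of the hang described above (one side parked in select, the other in a read on stdin) can be reproduced in isolation. The following is only an illustration of that pattern with a throwaway child process. It is not Nix code, and it says nothing about why the real processes end up there. The name `both_sides_waiting` is made up, and the sketch assumes a POSIX system where select works on pipes.

```python
import select
import subprocess
import sys

# Child: block until 8 bytes (or EOF) arrive on stdin, then answer on stdout.
CHILD_CODE = "import sys; sys.stdin.buffer.read(8); sys.stdout.write('done')"


def both_sides_waiting(timeout=1.0):
    """Return True if the parent's select times out while the child waits on stdin."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    try:
        # Parent: wait for the child to say something, as the master side does.
        readable, _, _ = select.select([child.stdout], [], [], timeout)
        return readable == []
    finally:
        child.stdin.close()  # EOF lets the child's read return so it can exit
        child.wait()
        child.stdout.close()


if __name__ == "__main__":
    print("both sides waiting:", both_sides_waiting())
```

Neither process is misbehaving on its own here: each is waiting for the other to send first, which is why a protocol mismatch is a plausible suspect.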
I don't know how this is meant to work, or if my analysis is correct, but it seems like some kind of protocol mismatch? I'm now building nix from source to test any suggestions. I should be ready for that in around 12 hours. I'll probably timebox the search for a solution until this time tomorrow, and just rebuild again otherwise.
Perhaps you make something else of these stack traces:
hydra-slave1 nix-store --serve --write, blocked on read syscall of stdin:
hydra-master nix-env process, blocked on select:
And finally, the tail of debug output from nix-env: