NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

`nix-copy-closure` gets stuck in `locking path` #5304

Open nh2 opened 2 years ago

nh2 commented 2 years ago

niklas:~/ $ nix-copy-closure --from root@myhost /nix/store/aigbjy9w1qcpjj12rz88m77bxf8a2flz-mypackage-0.1.0.0 -vvvvvvv
...
debug1: Sending command: nix-store --serve --write
locking this thread to CPU 0
copying 1 paths...
starting pool of 1 threads
querying remote host 'root@myhost' for info on '/nix/store/aigbjy9w1qcpjj12rz88m77bxf8a2flz-mypackage-0.1.0.0'
copying path '/nix/store/aigbjy9w1qcpjj12rz88m77bxf8a2flz-mypackage-0.1.0.0' from 'ssh://root@myhost'...
acquiring global GC lock '/nix/var/nix/gc.lock'
acquiring read lock on '/nix/var/nix/temproots/11537'
acquiring write lock on '/nix/var/nix/temproots/11537'
downgrading to read lock on '/nix/var/nix/temproots/11537'
locking path '/nix/store/aigbjy9w1qcpjj12rz88m77bxf8a2flz-mypackage-0.1.0.0'

After this, there is no further output, and nix-copy-closure hangs forever.
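One way to narrow down a hang at `locking path` is to check whether some other process still holds the lock. The sketch below is a diagnostic aid, not part of Nix. It assumes that Nix guards a store path with an advisory `flock()` on a sibling file named `<store path>.lock`; that naming and locking mechanism are assumptions about Nix internals and should be checked against the installed version. The function name `lock_is_held` is hypothetical.

```python
import fcntl
import os


def lock_is_held(lock_path: str) -> bool:
    """Return True if another open file description holds a flock() on lock_path."""
    fd = os.open(lock_path, os.O_RDONLY)
    try:
        try:
            # LOCK_NB makes flock() fail immediately instead of blocking.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Somebody else holds a conflicting lock.
            return True
        # We acquired the lock, so nobody held it; release it right away.
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```

Called as `lock_is_held("/nix/store/aigbjy9w1qcpjj12rz88m77bxf8a2flz-mypackage-0.1.0.0.lock")` (path assumed from the log above, and likely requiring root), a `True` result would point to a lingering lock holder rather than a problem on the remote side. Note that the probe briefly takes the lock itself when it is free.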

Steps To Reproduce

Unsure; this is the first time I have seen it.

Expected behavior

nix-env (Nix) 2.3.15 on NixOS 21.05

nh2 commented 2 years ago

journalctl -fu nix-daemon.service shows:

Sep 28 17:18:29 t25 nix-daemon[328803]: accepted connection from pid 10271, user niklas (trusted)

And then no further output.

Pressing Ctrl+C in nix-copy-closure then produces this daemon output:

Sep 28 17:20:57 t25 nix-daemon[10276]: 5 operations
Sep 28 17:20:57 t25 nix-daemon[10276]: unexpected Nix daemon error: writing to file: Broken pipe

nh2 commented 2 years ago

systemctl restart nix-daemon.service did not help.

Perhaps relevant:

# journalctl -eu nix-daemon.service
...
Sep 28 17:21:09 t25 systemd[1]: Stopping Nix Daemon...
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Succeeded.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Unit process 4173552 (nix-daemon) remains running after unit stopped.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Unit process 4173570 (ssh) remains running after unit stopped.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Unit process 4353 (nix-daemon) remains running after unit stopped.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Unit process 4364 (ssh) remains running after unit stopped.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Unit process 8841 (nix-daemon) remains running after unit stopped.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Unit process 8879 (ssh) remains running after unit stopped.
Sep 28 17:21:09 t25 systemd[1]: Stopped Nix Daemon.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Consumed 48.865s CPU time, no IO, received 970.3M IP traffic, sent 19.7M IP traffic.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Found left-over process 4173552 (nix-daemon) in control group while starting unit. Ignoring.
Sep 28 17:21:09 t25 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Found left-over process 4173570 (ssh) in control group while starting unit. Ignoring.
Sep 28 17:21:09 t25 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Found left-over process 4353 (nix-daemon) in control group while starting unit. Ignoring.
Sep 28 17:21:09 t25 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Found left-over process 4364 (ssh) in control group while starting unit. Ignoring.
Sep 28 17:21:09 t25 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Found left-over process 8841 (nix-daemon) in control group while starting unit. Ignoring.
Sep 28 17:21:09 t25 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 28 17:21:09 t25 systemd[1]: nix-daemon.service: Found left-over process 8879 (ssh) in control group while starting unit. Ignoring.
Sep 28 17:21:09 t25 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 28 17:21:09 t25 systemd[1]: Started Nix Daemon.

Rebooting the machine helped.

After the reboot, Nix works as normal:

locking path ...
lock acquired on ...

Thus I suspect something got stuck in a way that systemd could not kill.
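To follow up on that suspicion, the leftover PIDs can be pulled out of the journal excerpt earlier in this comment and their kernel state inspected. The sketch below is only an illustration: the helper names `leftover_processes` and `process_state` are hypothetical, the regular expression is written against the exact message wording shown in the log, and reading `/proc/<pid>/status` assumes Linux. A state beginning with `D` (uninterruptible sleep) would be consistent with a process that ignores signals until its pending I/O completes, but whether that happened here is not known.

```python
import re
from typing import List, Optional, Tuple

# Matches systemd messages such as:
#   "... Unit process 4353 (nix-daemon) remains running after unit stopped."
LEFTOVER = re.compile(
    r"Unit process (\d+) \(([^)]+)\) remains running after unit stopped"
)


def leftover_processes(journal_text: str) -> List[Tuple[int, str]]:
    """Return (pid, command name) pairs that systemd reported as surviving the stop."""
    return [(int(pid), name) for pid, name in LEFTOVER.findall(journal_text)]


def process_state(pid: int) -> Optional[str]:
    """Return the 'State:' field of /proc/<pid>/status, or None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("State:"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        # Process already gone, or /proc not available on this system.
        return None
    return None
```

Feeding the output of `journalctl -eu nix-daemon.service` to `leftover_processes` and then calling `process_state` on each PID shows which of the surviving `nix-daemon` and `ssh` processes are still alive and what they are waiting in.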

stale[bot] commented 2 years ago

I marked this as stale due to inactivity.

nrdxp commented 1 year ago

possibly a duplicate of #3017

squalus commented 1 month ago

I just hit this on Nix 2.18.5. I had to manually delete the lock files in the Nix store.
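Before deleting anything by hand, it may help to list the candidate files first. The sketch below only reads the directory and removes nothing. It assumes that the lock files in question sit directly inside the store directory and end in `.lock`; that is an assumption about Nix's layout, and the function name `list_lock_files` is hypothetical.

```python
import os
from typing import List


def list_lock_files(store_dir: str = "/nix/store") -> List[str]:
    """Return sorted paths of regular files ending in '.lock' directly inside store_dir."""
    found = []
    with os.scandir(store_dir) as entries:
        for entry in entries:
            # Skip directories and symlinks; only plain files are candidates.
            if entry.name.endswith(".lock") and entry.is_file(follow_symlinks=False):
                found.append(entry.path)
    return sorted(found)
```

The result needs manual review: a legitimate store object whose name happens to end in `.lock` would match as well, and a lock file that a running build still holds should be left alone.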