Open chkno opened 5 years ago
+1, having the same issue right now.
Any other info I should extract from this stuck process before killing it?
$ beenrunning 11439
(1272390.24 - 8627974/100)s = 13.728131 days
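The beenrunning script itself was not posted, so the following is only a guess at what it does. The output above looks like system uptime minus the process start time (field 22 of /proc/PID/stat, in clock ticks) divided by the tick rate. The function names here are hypothetical, and the helper is Linux-only because it reads /proc.

```shell
#!/bin/sh
# Hypothetical reconstruction of a "beenrunning"-style helper.
# The original script was not shared in this thread.

# elapsed_days UPTIME_SECONDS START_TICKS TICKS_PER_SECOND
# Prints how long a process has been running, in days.
elapsed_days() {
    awk -v up="$1" -v start="$2" -v hz="$3" \
        'BEGIN { printf "%.6f\n", (up - start / hz) / 86400 }'
}

# beenrunning PID
# Reads the inputs for elapsed_days from /proc (Linux only).
beenrunning() {
    pid=$1
    read -r up idle_unused < /proc/uptime
    stat=$(cat "/proc/$pid/stat") || return 1
    # Drop "pid (comm) " so a command name containing spaces cannot shift fields.
    rest=${stat##*) }
    # starttime is field 22 of the full line, i.e. field 20 of what remains.
    set -- $rest
    elapsed_days "$up" "${20}" "$(getconf CLK_TCK)"
}
```

With the numbers from the output above, `elapsed_days 1272390.24 8627974 100` prints 13.728131.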
While this is a minor problem for interactive use where an impatient user can hit ^C, it's a more serious problem for mechanisms like system.autoUpgrade.enable where this happens in the background unnoticed and updates just stop indefinitely.
@chkno do you know how to work around the problem and get the upgrade to work again? For me it's a blocker: I cannot run nixos-rebuild --upgrade switch because of it.
After running kill 11439 (the PID of the stuck nix-channel --update), I was able to update channels again.
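For machines where this happens unattended, the manual workaround above could be scripted roughly as follows. This is a sketch, not a fix: it assumes procps-style pgrep and ps (for the etimes column), the function names are made up for illustration, and the one-hour limit is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: find "nix-channel --update" processes that have been running
# longer than a limit, so they can be killed as described above.

# older_than LIMIT_SECONDS
# Reads "PID ELAPSED_SECONDS" lines on stdin, prints PIDs over the limit.
older_than() {
    awk -v limit="$1" '$2 + 0 > limit + 0 { print $1 }'
}

# list_stuck_updates LIMIT_SECONDS
list_stuck_updates() {
    for pid in $(pgrep -f 'nix-channel --update'); do
        # etimes= prints the elapsed time in seconds with no header (procps ps).
        echo "$pid $(ps -o etimes= -p "$pid")"
    done | older_than "$1"
}

# Example (as root; xargs -r is a GNU extension):
#   list_stuck_updates 3600 | xargs -r kill
```

Listing the PIDs separately from killing them makes it easy to check what would be affected before running the kill.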
Upon being killed, the process that had been stuck for two weeks printed this message (the same message it prints if you run it and immediately press control-C):
warning: warning: download of 'https://nixos.org/channels/nixos-19.09' was interrupted; using cached result
error: interrupted by the user
I removed the file it was waiting for:
/root/.cache/nix/tarballs/0s9swzqhys99sk5ndz23z2w13mbw9wd14ff58nfxyhss7a986jn9-file
I restarted and it works now. I don't know whether it was the restart or removing the file that fixed it.
Same problem here. It has already happened twice, silently blocking automatic updates.
So far we encountered this problem on two machines, which both had in common that they ran on BTRFS, while our other machines run on different file systems.
@ktor, @chkno, may I ask which file system you are using?
@timor ext4
ext4
Ok, thanks.
It helps to kill the nix-channel --update nixos process.
Right now I think the reason for this problem on my machine is network-related. It seems to happen when I lose/don't have network connection.
Possibly related: #3338. Possible workaround: set the nix option download-attempts=1
We'll see if it'll be fixed with curl 7.67+ as Elco suggested in #3338
I marked this as stale due to inactivity.
Any idea how to solve that issue? During a nixos-rebuild I get a similar error:
waiting for lock on '/nix/store/h8il05l5iazsvb2cahk3iybq3iyg1ja0-scout-runtime-0.20210527.0.tar.gz'...
I tried to kill the process, but that doesn't work, and I can't remove or chmod this file:
sudo rm -rf /nix/store/h8il05l5iazsvb2cahk3iybq3iyg1ja0-scout-runtime-0.20210527.0.tar.gz.lock
rm: cannot remove '/nix/store/h8il05l5iazsvb2cahk3iybq3iyg1ja0-scout-runtime-0.20210527.0.tar.gz.lock': Read-only file system
I'd also love to avoid a nix-collect-garbage because I already spent a day downloading stuff...
EDIT: the error magically vanished after a failed update...
I had a similar issue, but after a reboot (I don't know whether the reboot was needed) and running:
❯ nix-store --delete /nix/store/srxygvvwgpm5w7lp7kpjqg4l2n7v814p-unit-nix-daemon.service.lock
finding garbage collector roots...
deleting '/nix/store/srxygvvwgpm5w7lp7kpjqg4l2n7v814p-unit-nix-daemon.service.lock'
deleting unused links...
note: currently hard linking saves 36985.18 MiB
1 store paths deleted, 0.00 MiB freed
on the lock files, they were removed.
I think I am seeing something similar but in the https://github.com/nix-community/nix-eval-jobs repository. Downloads seem to hang when using fetch in primops.cc. There is one qualifying factor here, which is that we are still using nix@2.3. I tracked down the issue in gdb. It seems as though the enqueuing thread properly mutates the download thread's incoming queue but the downloading thread never sees the queue update. Specifically, the enqueuing thread pushes the DownloadItem onto the queue in nix::CurlDownloader::enqueueItem and nix::Downloader::workerThreadMain never sees the item in the queue.
I am mentioning it because I am a little at a loss as to why that would be.
I think the hang I was seeing in nix-eval-jobs was due to my misunderstanding of the address space of the forks. I no longer think that hang is the same as this one; only the symptoms are similar.
yes
I have a nix-channel --update that's been stuck for three days.
It appears to have printed this before hanging:
Stack trace of the main thread (full trace of all threads):
This is the only process that has that file open, and it has the lock (the "W" for write-lock in "9uW"):
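The "9uW" value is in the style of the FD column printed by lsof: a descriptor number, an access mode character (here u, read and write), and a lock character (here W). As a rough illustration of how to read that field, the helper below decodes the lock character. The function name describe_fd is made up for this sketch, and only the read and write lock characters are handled; lsof(1) documents the rest.

```shell
#!/bin/sh
# Sketch: decode the lock character of an lsof-style FD field such as "9uW".

# describe_fd FD_FIELD
describe_fd() {
    field=$1
    # Strip the leading descriptor number, leaving mode and lock characters.
    rest=${field#"${field%%[!0-9]*}"}
    # The first remaining character is the access mode; what follows is the lock.
    lock=${rest#?}
    case $lock in
        W)  echo "write lock on the entire file" ;;
        w)  echo "write lock on part of the file" ;;
        R)  echo "read lock on the entire file" ;;
        r)  echo "read lock on part of the file" ;;
        "") echo "no lock reported" ;;
        *)  echo "other lock type, see lsof(1)" ;;
    esac
}

# Example: describe_fd 9uW
# To get the field for a real file, run lsof on its path and read the FD column.
```

A field such as "9r" has a mode character but no lock character, which is why the mode is stripped before the lock is examined.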
This stuck process blocks other invocations of nix-channel --update as well, causing them to emit waiting for lock on '/root/.cache/nix/tarballs/...-file'... and then hang indefinitely, but with a different backtrace:
The list of channels it was trying to fetch:
It looks like the one that got stuck is the 19.09 channel metadata:
The file looks like it was successfully fetched. It's an html file with this inside: