Open Echaleon opened 1 week ago
Could you see if system.swtich.enableNG = false
fixes the issue? We recently replaced switch-to-configuration
with a Rust reimplementation and there might be bugs.
Also, does this usually happen during boot?
I am on 24.05 (only importing unstable for some specific packages), so system.switch.enable = false is the default, but I also tried with it enabled, and it still hung with the same logs and nearly the same systemctl status output:
│ ├─nixos-rebuild-switch-to-configuration.service
│ │ └─110006 /nix/store/cxi72wfnfg11g6zylrv0a5kkxm0ppvlc-nixos-system-argent-24.05.20241007.1bfbbbe/bin/switch-to-configuration switch
│ ├─nixos-upgrade.service
│ │ ├─109741 /nix/store/516kai7nl5dxr792c0nzq0jp8m4zvxpi-bash-5.2p32/bin/bash /nix/store/gr8i06n09q7p5x58vfdiq3wx1x0lpkq6-unit-script-nixos-upgrade-start/bin/nixos-upgrade-start
│ │ ├─109967 /nix/store/516kai7nl5dxr792c0nzq0jp8m4zvxpi-bash-5.2p32/bin/bash /nix/store/sdg1fzgzwikr36p2zd36vfqmh281h1hl-nixos-rebuild/bin/nixos-rebuild switch --recreate-lock-file --commit-lock-file --flake /etc/nixos
│ │ └─110005 systemd-run -E LOCALE_ARCHIVE -E NIXOS_INSTALL_BOOTLOADER= --collect --no-ask-password --pipe --quiet --same-dir --service-type=exec --unit=nixos-rebuild-switch-to-configuration --wait /nix/store/cxi72wfnfg11g6zylrv0a5kkxm0ppvlc-nixos-system-argent-24.05.20241007.1bfbbbe/bin/switch-to-configuration switch
Interestingly, with the old version I caught its initial execution in systemctl status, and it seemingly spawns two instances of the command, which might be the issue?
│ ├─nixos-upgrade.service
│ │ ├─101737 /nix/store/516kai7nl5dxr792c0nzq0jp8m4zvxpi-bash-5.2p32/bin/bash /nix/store/gr8i06n09q7p5x58vfdiq3wx1x0lpkq6-unit-script-nixos-upgrade-start/bin/nixos-upgrade-start
│ │ ├─101738 /nix/store/516kai7nl5dxr792c0nzq0jp8m4zvxpi-bash-5.2p32/bin/bash /nix/store/sdg1fzgzwikr36p2zd36vfqmh281h1hl-nixos-rebuild/bin/nixos-rebuild boot --recreate-lock-file --commit-lock-file --flake /etc/nixos --upgrade
│ │ ├─102022 /nix/store/516kai7nl5dxr792c0nzq0jp8m4zvxpi-bash-5.2p32/bin/bash /nix/store/sdg1fzgzwikr36p2zd36vfqmh281h1hl-nixos-rebuild/bin/nixos-rebuild boot --recreate-lock-file --commit-lock-file --flake /etc/nixos --upgrade
│ │ ├─102023 nix --extra-experimental-features "nix-command flakes" build "/etc/nixos#nixosConfigurations.\"argent\".config.system.build.toplevel" --recreate-lock-file --commit-lock-file --out-link /tmp/nixos-rebuild.87mkCn/result
│ │ └─103601 nix --extra-experimental-features "nix-command flakes" build "/etc/nixos#nixosConfigurations.\"argent\".config.system.build.toplevel" --recreate-lock-file --commit-lock-file --out-link /tmp/nixos-rebuild.87mkCn/result
For reference, this is the nixos-upgrade.service that is being generated:
# /etc/systemd/system/nixos-upgrade.service
[Unit]
After=network-online.target
Description=NixOS Upgrade
Wants=network-online.target
X-StopOnRemoval=false
[Service]
Environment="HOME=/root"
Environment="LOCALE_ARCHIVE=/nix/store/c0v6ayqhwap6g8rdzibk9qqcljff1dji-glibc-locales-2.39-52/lib/locale/locale-archive"
Environment="NIX_PATH=nixpkgs=flake:nixpkgs:/nix/var/nix/profiles/per-user/root/channels"
Environment="PATH=/nix/store/mr63za5vkxj0yip6wj3j9lya2frdm3zc-coreutils-9.5/bin:/nix/store/wqbq6prmrhgm19qqdqm3ijjbap9x74cn-gnutar-1.35/bin:/nix/store/kpabg6z89nfry6frc3p5m3zfmk94zyqn-xz-5.4.7-bin/bin:/nix/store/jpqm5igl1gmahp7lxx8j1dy874zvirgm-gzip-1.13/bin:/nix/store/qi21rq4si1x72ab1wc06h0mrp485i6r6-git-minimal-2.44.1/bin:/nix/store/x6b4rr799djkf8a2abwf59fadcbyasc1-nix-2.18.8/bin:/nix/store/1m888byzaqaig6azrrfpmjdyhg>...
Environment="TZDIR=/nix/store/b7ipdvixma2jn8xv50kl2i55pb7ccb7q-tzdata-2024a/share/zoneinfo"
X-RestartIfChanged=false
ExecStart=/nix/store/gr8i06n09q7p5x58vfdiq3wx1x0lpkq6-unit-script-nixos-upgrade-start/bin/nixos-upgrade-start
Type=oneshot
And the script:
#!/nix/store/516kai7nl5dxr792c0nzq0jp8m4zvxpi-bash-5.2p32/bin/bash
set -e
/nix/store/sdg1fzgzwikr36p2zd36vfqmh281h1hl-nixos-rebuild/bin/nixos-rebuild boot --recreate-lock-file --commit-lock-file --flake /etc/nixos --upgrade
booted="$(/nix/store/mr63za5vkxj0yip6wj3j9lya2frdm3zc-coreutils-9.5/bin/readlink /run/booted-system/{initrd,kernel,kernel-modules})"
built="$(/nix/store/mr63za5vkxj0yip6wj3j9lya2frdm3zc-coreutils-9.5/bin/readlink /nix/var/nix/profiles/system/{initrd,kernel,kernel-modules})"
current_time="$(/nix/store/mr63za5vkxj0yip6wj3j9lya2frdm3zc-coreutils-9.5/bin/date +%H:%M)"
lower="05:00"
upper="07:30"
if [[ "${lower}" < "${upper}" ]]; then
  if [[ "${current_time}" > "${lower}" ]] && \
     [[ "${current_time}" < "${upper}" ]]; then
    do_reboot="true"
  else
    do_reboot="false"
  fi
else
  # lower > upper, so we are crossing midnight (e.g. lower=23h, upper=6h)
  # we want to reboot if cur > 23h or cur < 6h
  if [[ "${current_time}" < "${upper}" ]] || \
     [[ "${current_time}" > "${lower}" ]]; then
    do_reboot="true"
  else
    do_reboot="false"
  fi
fi
if [ "${booted}" = "${built}" ]; then
  /nix/store/sdg1fzgzwikr36p2zd36vfqmh281h1hl-nixos-rebuild/bin/nixos-rebuild switch --recreate-lock-file --commit-lock-file --flake /etc/nixos
elif [ "${do_reboot}" != true ]; then
  echo "Outside of configured reboot window, skipping."
else
  /nix/store/nswmyag3qi9ars0mxw5lp8zm0wv5zxld-systemd-255.9/bin/shutdown -r +1
fi
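The reboot-window check in the script above relies on zero-padded HH:MM strings sorting in time order under plain string comparison. A minimal sketch of that logic as a standalone function follows; the name in_reboot_window is hypothetical and not part of the generated script.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the window check in the generated script.
# Zero-padded HH:MM strings sort lexicographically in time order, so the
# string comparison operators of [[ ]] are sufficient.
in_reboot_window() {
  local cur="$1" lower="$2" upper="$3"
  if [[ "$lower" < "$upper" ]]; then
    # Window lies within a single day, e.g. 05:00 to 07:30.
    [[ "$cur" > "$lower" && "$cur" < "$upper" ]]
  else
    # Window crosses midnight, e.g. 23:00 to 06:00.
    [[ "$cur" > "$lower" || "$cur" < "$upper" ]]
  fi
}

# Usage:
#   in_reboot_window "$(date +%H:%M)" "05:00" "07:30" && echo "inside window"
```

Both comparisons are strict, so a check made at exactly 05:00 or exactly 07:30 counts as outside the window.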
There might be a config issue, but aside from auto upgrade, my systems are working perfectly on 24.05, and I can freely rebuild and upgrade the system by running the command manually.
Also does this usually happen during boot?
It doesn't hang boot, but if the timer dictates that the auto-upgrade needs to run, it can happen during boot. I am still able to log in, however, and see the upgrade service hung. It happens more frequently when it runs off its timer at 5am, though.
I have my own service that checks the git commit signature and then runs nixos-rebuild switch. It started hanging, too. So it hangs even on a clean nixos-rebuild call without hardening or anything else. The module can be found here.
If I restart it manually, then it exits as expected. But after boot it hangs.
Note that "hanging" can often just be waiting for a Type=oneshot unit that is being restarted. If you are restarting a cron job that takes a few minutes to finish, nixos-rebuild switch will take a few minutes to finish. Systemd waits synchronously for Type=oneshot,dbus,notify,notify-reload,forking units.
I've often seen people complain it's hanging only for there to be an active job in systemctl list-jobs
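The check suggested above can be scripted. This is only a sketch: the helper name pending_jobs is hypothetical, and it assumes the four-column JOB UNIT TYPE STATE layout that systemctl list-jobs prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `systemctl list-jobs --no-legend` output on stdin
# and print "unit type state" for every job row.
# Rows with fewer than four fields (e.g. a "No jobs running." notice) are skipped.
pending_jobs() {
  awk 'NF >= 4 { print $2, $3, $4 }'
}

# Usage on a live system (requires systemd), run while the switch appears stuck:
#   systemctl list-jobs --no-legend | pending_jobs
```

If a unit shows up here in state running while the switch seems stuck, the switch is probably waiting for that unit's start job rather than being hung itself.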
Well, to be clear, my update service hangs for hours. It will hang for longer if I don't kill it.
And it used to work just fine until one of the flake updates.
# systemctl status autoupdate.service
● autoupdate.service - Signed system auto-update.
Loaded: loaded (/etc/systemd/system/autoupdate.service; linked; preset: ignored)
Active: activating (start) since Mon 2024-10-14 12:56:46 MSK; 3h 45min ago
Invocation: 9bd116cc97e44e579cc57821633d8301
TriggeredBy: ● autoupdate.timer
Main PID: 2274 (autoupdate-star)
IP: 2.1G in, 44.8M out
IO: 451M read, 86.6M written
Tasks: 5 (limit: 18399)
Memory: 14M (peak: 3.5G swap: 44K swap peak: 44K)
CPU: 1min 11.433s
CGroup: /system.slice/autoupdate.service
├─2274 /nix/store/izpf49b74i15pcr9708s3xdwyqs4jxwl-bash-5.2p32/bin/bash /nix/store/8daz9p2adkf851vd5wq82ainppfq52fp-unit-script-autoupdate-start/bin/autoupdate-start
├─2296 make switch
├─2300 /nix/store/516kai7nl5dxr792c0nzq0jp8m4zvxpi-bash-5.2p32/bin/bash /nix/store/sdg1fzgzwikr36p2zd36vfqmh281h1hl-nixos-rebuild/bin/nixos-rebuild switch --option eval-cache false --fallback --print-build-logs --verbose --flake .
├─2586 "ssh: /root/.ssh/nixbuilder@10.0.0.1:22143.socket [mux]"
└─4501 systemd-run -E LOCALE_ARCHIVE -E NIXOS_INSTALL_BOOTLOADER= --collect --no-ask-password --pipe --quiet --same-dir --service-type=exec --unit=nixos-rebuild-switch-to-configuration --wait /nix/store/8av5wjkcli5x3d0dbpc00zw6qda4a49i-nixos-system-laptop-24.11.20241006.c31898a/bin/switch-to-configuration switch
Oct 14 12:57:41 laptop autoupdate-start[2327]: linking '/nix/store/8av5wjkcli5x3d0dbpc00zw6qda4a49i-nixos-system-laptop-24.11.20241006.c31898a/initrd' to '/nix/store/.links/0r3rrfjr157lb0v7sx200pj9fls7d0xnvawg8f1fi2mnwhk6iwbm'
Oct 14 12:57:41 laptop autoupdate-start[2327]: linking '/nix/store/8av5wjkcli5x3d0dbpc00zw6qda4a49i-nixos-system-laptop-24.11.20241006.c31898a/append-initrd-secrets' to '/nix/store/.links/1pwmzwfn2qxijka08fmys8rdpsixprg9a50padl594j27hv7dsqs'
Oct 14 12:57:41 laptop autoupdate-start[2327]: linking '/nix/store/8av5wjkcli5x3d0dbpc00zw6qda4a49i-nixos-system-laptop-24.11.20241006.c31898a/systemd' to '/nix/store/.links/104ahjvj74pq1hbvsyllwxh68x9mgd0qr9r6dfpj0mw9q68h9wrx'
Oct 14 12:57:41 laptop autoupdate-start[2327]: linking '/nix/store/8av5wjkcli5x3d0dbpc00zw6qda4a49i-nixos-system-laptop-24.11.20241006.c31898a/extra-dependencies' to '/nix/store/.links/0ip26j2h11n1kgkz36rl4akv694yz65hr72q4kv4b3lxcbi65b3p'
Oct 14 12:57:41 laptop autoupdate-start[2327]: linking '/nix/store/8av5wjkcli5x3d0dbpc00zw6qda4a49i-nixos-system-laptop-24.11.20241006.c31898a/kernel-modules' to '/nix/store/.links/1kavw6l9awd7zbb73gkjq581fh7z6rds1rd2chk3jmp3z45y0ig9'
Oct 14 12:57:41 laptop autoupdate-start[2327]: linking '/nix/store/8av5wjkcli5x3d0dbpc00zw6qda4a49i-nixos-system-laptop-24.11.20241006.c31898a/kernel' to '/nix/store/.links/1viq6adh9qcc5zqnnz65p0v0w21z42n49wka5xn7khp3s0nfkvz5'
Oct 14 12:57:41 laptop autoupdate-start[2300]: $ nix-env -p /nix/var/nix/profiles/system --set /nix/store/8av5wjkcli5x3d0dbpc00zw6qda4a49i-nixos-system-laptop-24.11.20241006.c31898a
Oct 14 12:57:41 laptop autoupdate-start[2300]: $ systemd-run -E LOCALE_ARCHIVE -E NIXOS_INSTALL_BOOTLOADER= --collect --no-ask-password --pipe --quiet --same-dir --service-type=exec --unit=nixos-rebuild-switch-to-configuration --wait true
Oct 14 12:57:41 laptop autoupdate-start[2300]: Using systemd-run to switch configuration.
Oct 14 12:57:41 laptop autoupdate-start[2300]: $ systemd-run -E LOCALE_ARCHIVE -E NIXOS_INSTALL_BOOTLOADER= --collect --no-ask-password --pipe --quiet --same-dir --service-type=exec --unit=nixos-rebuild-switch-to-configuration --wait /nix/store/8av5wjkcli5x3d0dbpc00zw6qda4a49i-nixos-system-laptop-24.11.20241006.c31898a/bin/switch-to-configuration switch
As you can see, it has been "activating" for almost 4 hours now. Manually restarting it solves the issue immediately.
What's noticeable, though, is that memory usage drops from 3 GB to just 14 MB. So I guess Nix is done by that point.
@arianvp I get the error: The option 'system.swtich' does not exist. Definition values:
- system: `"x86_64-linux"`
- host os: `Linux 6.6.54, NixOS, 24.11 (Vicuna), 24.11.20241006.c31898a`
- multi-user?: `yes`
- sandbox: `yes`
- version: `nix-env (Nix) 2.18.8`
- nixpkgs: `/nix/store/rs4fjbnw4qx7ns2hzzrz2iz52va7vs5z-source`
I think I've encountered the same issue myself. On a flake-based system config, I cannot complete a test rebuild because the transient nixos-rebuild-switch-to-configuration service does not exit, so stuff like Getty doesn't start and the system is in an unusable state. I am on unstable.
Describe the bug
On all three of my flake-enabled systems, which are all running almost identical configs, I frequently log into the systems in the morning and run systemctl status to see both nixos-upgrade.service and nixos-rebuild-switch-to-configuration.service hung, and there is no related log output beyond the minute or two after they started. It only seems to hang when it actually updates an input and has to change something.
Killing the upgrade with systemctl stop nixos-upgrade.service clears the hung services, and doing a rebuild immediately after works with no issues. It has never seemed to hang when an update is done manually and inputs are changed with nixos-rebuild switch --recreate-lock-file --commit-lock-file --flake /etc/nixos
Steps To Reproduce
Steps to reproduce the behavior:
Expected behavior
The system is updated fully and the services are no longer running.
Additional context
Snippet of relevant systemctl status output:
Relevant logs output:
Config of system.autoUpgrade
Metadata