NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License
18.35k stars 14.31k forks source link

libvirtd service sometimes fail immediately after system update #9266

Closed bjornfor closed 8 years ago

bjornfor commented 9 years ago

I've seen several times that nixos-rebuild switch --upgrade (if the upgrade is "big enough") can cause libvirtd to fail.

Here is one from today (updating on the 14.12 branch):

$ sudo nixos-rebuild switch --upgrade
...
stopping the following units: accounts-daemon.service, alsa-store.service, apcupsd.service, avahi-daemon.service, collectd.service, cpufreq.service, cron.service, cupsd.service, dbus-org.bluez.service, fail2ban.service, idmapd.service, kmod-static-nodes.service, libvirtd.service, lighttpd.service, lttng-sessiond.service, mountd.service, munin-node.service, mysql.service, network-addresses-vboxnet0.service, network-link-vboxnet0.service, network-local-commands.service, network-manager.service, network-setup.service, nfsd.service, nix-daemon.service, nscd.service, ntpd.service, phpfpm.service, postfix.service, rngd.service, rpcbind.service, rtkit-daemon.service, samba-nmbd.service, samba-smbd.service, samba-winbindd.service, statd.service, systemd-fsck@dev-disk-by\x2dlabel-3tb.service, systemd-logind.service, systemd-modules-load.service, systemd-sysctl.service, systemd-tmpfiles-clean.timer, systemd-tmpfiles-setup-dev.service, systemd-udev-settle.service, systemd-udev-trigger.service, systemd-udevd.service, systemd-vconsole-setup.service, transmission.service, udisks2.service, upower.service, vboxnet0.service
Warning: Stopping nix-daemon.service, but it can still be activated by:
  nix-daemon.socket
Warning: Stopping systemd-udevd.service, but it can still be activated by:
  systemd-udevd-control.socket
  systemd-udevd-kernel.socket
NOT restarting the following changed units: display-manager.service, docker.service, getty@tty1.service, libvirt-guests.service, systemd-backlight@backlight:acpi_video0.service, systemd-journal-flush.service, systemd-journald.service, systemd-random-seed.service, systemd-remount-fs.service, systemd-rfkill@rfkill0.service, systemd-rfkill@rfkill1.service, systemd-tmpfiles-setup.service, systemd-update-utmp.service, systemd-user-sessions.service, user@1000.service
activating the configuration...
setting up /etc...
regenerating udev hardware database...
restarting systemd...
reloading the following units: dbus.service, dev-hugepages.mount, dev-mqueue.mount, sys-fs-fuse-connections.mount, sys-kernel-debug.mount
restarting the following units: polkit.service, sshd.service
starting the following units: accounts-daemon.service, alsa-store.service, apcupsd.service, avahi-daemon.service, collectd.service, cpufreq.service, cron.service, cupsd.service, dbus-org.bluez.service, fail2ban.service, idmapd.service, kmod-static-nodes.service, libvirtd.service, lighttpd.service, lttng-sessiond.service, mountd.service, munin-node.service, mysql.service, network-addresses-vboxnet0.service, network-link-vboxnet0.service, network-local-commands.service, network-manager.service, network-setup.service, nfsd.service, nix-daemon.service, nscd.service, ntpd.service, phpfpm.service, postfix.service, rngd.service, rpcbind.service, rtkit-daemon.service, samba-nmbd.service, samba-smbd.service, samba-winbindd.service, statd.service, systemd-fsck@dev-disk-by\x2dlabel-3tb.service, systemd-logind.service, systemd-modules-load.service, systemd-sysctl.service, systemd-tmpfiles-clean.timer, systemd-tmpfiles-setup-dev.service, systemd-udev-settle.service, systemd-udev-trigger.service, systemd-udevd.service, systemd-vconsole-setup.service, transmission.service, udisks2.service, upower.service, vboxnet0.service
Job for libvirtd.service failed. See "systemctl status libvirtd.service" and "journalctl -xe" for details.
warning: the following units failed: libvirtd.service

● libvirtd.service - Libvirt Virtual Machine Management Daemon
   Loaded: loaded (/nix/store/ki79vzinr574xapxaqcnd651jsrbaxsq-unit-libvirtd.service/libvirtd.service)
   Active: failed (Result: exit-code) since Sat 2015-08-15 22:22:41 CEST; 21ms ago
  Process: 5234 ExecStartPost=/nix/store/xsza8xnzsvq3cf39a10pgzy3wdp60r71-unit-script/bin/libvirtd-post-start (code=exited, status=1/FAILURE)
  Process: 4992 ExecStart=libvirtd --config /nix/store/gp47p1sd95xsqz1yswymm99zrazphf6g-libvirtd.conf --daemon (code=exited, status=0/SUCCESS)
  Process: 4455 ExecStartPre=/nix/store/3dqlp3smmqs4jdjnm6na9wq92kc39n2f-unit-script/bin/libvirtd-pre-start (code=exited, status=0/SUCCESS)
 Main PID: 1984 (code=exited, status=0/SUCCESS)

Aug 15 22:22:39 mini libvirtd-post-start[5234]: error: failed to connect to the hypervisor
Aug 15 22:22:39 mini libvirtd-post-start[5234]: error: no valid connection
Aug 15 22:22:39 mini libvirtd-post-start[5234]: error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
Aug 15 22:22:40 mini libvirtd-post-start[5234]: error: failed to connect to the hypervisor
Aug 15 22:22:40 mini libvirtd-post-start[5234]: error: no valid connection
Aug 15 22:22:40 mini libvirtd-post-start[5234]: error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
Aug 15 22:22:41 mini systemd[1]: libvirtd.service: control process exited, code=exited status=1
Aug 15 22:22:41 mini systemd[1]: Failed to start Libvirt Virtual Machine Management Daemon.
Aug 15 22:22:41 mini systemd[1]: Unit libvirtd.service entered failed state.
Aug 15 22:22:41 mini systemd[1]: libvirtd.service failed.
warning: error(s) occured while switching to the new configuration

This one fixed itself with one additional sudo systemctl restart libvirtd. But I've also had this error where this manual workaround did not work, and I've simply rebooted the machine :-(

lucabrunox commented 9 years ago

It's possible that libvirtd requires some additional time for cleaning up after stop, hence not even your manual restart worked because it was too fast. That's just supposition though.

bjornfor commented 9 years ago

It might be. But normal service restarts are generally quick (I can do sudo systemctl restart libvirtd in <1 second). I think the issue only appears when libvirtd starts running from a new/different store path.

bjornfor commented 9 years ago

But I'll keep an eye on it (the time).

lucabrunox commented 9 years ago

On Sunday, August 16, 2015, Bjørn Forsman notifications@github.com wrote:

It might be. But normal service restarts are generally quick (I can do sudo systemctl restart libvirtd in <1 second).

Systemd is quite async. Doing systemcltl start may return immediately but still do something in the background for several seconds.

NixOS Linux http://nixos.org

bjornfor commented 9 years ago

I concidered that. But there is a --no-block option for systemctl, indicating that the default is to block. Using --no-block reduces a systemctl restart libvirtd operation from ~0.5 seconds down to 20 ms. Further, the journal output for libvirtd service while restarting it shows that the whole thing completes within one second:

Aug 16 11:03:33 mini systemd[1]: Stopping Libvirt Virtual Machine Management Daemon...
Aug 16 11:03:33 mini systemd[1]: Starting Libvirt Virtual Machine Management Daemon...
Aug 16 11:03:33 mini dnsmasq[5287]: read /etc/hosts - 13862 addresses
Aug 16 11:03:33 mini dnsmasq[5287]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Aug 16 11:03:33 mini dnsmasq-dhcp[5287]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Aug 16 11:03:33 mini systemd[1]: Started Libvirt Virtual Machine Management Daemon.

But I do find it a bit odd that restarting libvirtd takes less than one second. I would have guessed it took longer.

Wait... there is also the "libvirt-guests" service involved here. (That service probably takes longer to start.) Looking at the "nixos-rebuild switch" output from above, I see NOT restarting the following changed units: ... libvirt-guests.service. That might be a clue.

Profpatsch commented 8 years ago

(triage) does this still happen?

bjornfor commented 8 years ago

Don't know, haven't paid much attention to the libvirtd service lately. I'll reopen if I notice it again.