astro / microvm.nix

NixOS MicroVMs
https://astro.github.io/microvm.nix/
MIT License

microvm@%i.service restart loop with cloud-hypervisor #265

Closed: vdbe closed this issue 4 days ago

vdbe commented 1 month ago

When using cloud-hypervisor, microvm@%i.service keeps restarting. The VM boots and works normally when started via microvm -r or current/bin/microvm-run (this requires restarting microvm-virtiofsd when using virtiofsd).

I tested the same VMs with qemu as the hypervisor and no other changes, and that works as expected.

microvm@%i.service seems to be stuck in activating (start) ("Active: activating (start) since Sat 2024-08-10 15:48:56 UTC; 48s ago"), but I don't see anything obviously wrong with the sockets.
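To check whether a unit is really stuck waiting in activating (start) and how often systemd has restarted it, one can query its properties with systemctl show. The following is a minimal Python sketch, assuming a host running systemd; the helper names parse_systemctl_show, stuck_waiting_for_ready and query_unit are hypothetical, and the "stuck" test is only a heuristic, since every Type=notify unit passes through activating (start) briefly during a normal start.

```python
import subprocess

# Properties to query; NRestarts counts how often systemd has restarted
# the service, which makes a restart loop visible.
PROPS = ("Type", "ActiveState", "SubState", "NRestarts")


def parse_systemctl_show(output: str) -> dict[str, str]:
    """Parse `systemctl show` output, which prints one KEY=VALUE per line."""
    props: dict[str, str] = {}
    for line in output.splitlines():
        # Split on the first '=' only; values may themselves contain '='.
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            props[key] = value
    return props


def stuck_waiting_for_ready(props: dict[str, str]) -> bool:
    """Heuristic: True if a Type=notify unit is in 'activating (start)'.

    Only meaningful if the state persists well beyond a normal start.
    """
    return (
        props.get("Type") == "notify"
        and props.get("ActiveState") == "activating"
        and props.get("SubState") == "start"
    )


def query_unit(unit: str) -> dict[str, str]:
    """Run `systemctl show UNIT -p ...` (requires systemd on the host)."""
    args = ["systemctl", "show", unit]
    for prop in PROPS:
        args += ["-p", prop]
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return parse_systemctl_show(result.stdout)
```

For example, query_unit("microvm@cloud-hypervisor-default.service") returns a dict whose NRestarts value should keep growing if the unit is caught in the loop described here.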

I made a flake to isolate the issue but can't find the problem. I tested both on hardware and with nixos-rebuild build-vm ..., with the same result in both cases: https://github.com/vdbe/microvm-example

astro commented 1 month ago

Are there any error messages in journalctl -eu microvm@*? Try boot.kernelParams = [ "verbose" ];

Are you able to git bisect the breaking change in microvm.nix?

vdbe commented 1 month ago

I did not use git bisect, but commit a439229a1af9e0fae3b3b21619c1983901a41bf7 is the first commit that breaks it (9d3cc92a8e2f0a36c767042b484cc8b8c6f371d3 works). I did not see any relevant error messages in journalctl -eu microvm@*.

Output from journalctl -b 0 -eu microvm@cloud-hypervisor-default with boot.kernelParams = [ "verbose" ]; on both host and guest. The .force.log file was produced with kernelParams = mkForce [ "verbose" ]; on the host and kernelParams = mkForce [ "root=fstab" "verbose" ]; on the guest, to get rid of "loglevel=4":

Dan-Theriault commented 1 month ago

I'm having the same issue (cloud-hypervisor looping mysteriously, fixed by switching to QEMU). My VMs were also unable to boot with crosvm (unsure if related; it's just what I tried before QEMU).

vdbe commented 1 month ago

crosvm doesn't have this issue for me; only cloud-hypervisor does (I couldn't build/test alioth).

vdbe commented 1 week ago

After seeing #268 and d52082cc2668b8cd788e3133526c8693ee71f6a5, I tested again with nixos-24.05, which has systemd 255.9, and it worked perfectly.

However, d52082cc2668b8cd788e3133526c8693ee71f6a5 does not fix the issue (which I think was its goal), because the systemd service still has Type=notify.

The relevant code is at https://github.com/astro/microvm.nix/blob/d52082cc2668b8cd788e3133526c8693ee71f6a5/nixos-modules/host/default.nix#L108-L111

I guess the line at https://github.com/astro/microvm.nix/blob/d52082cc2668b8cd788e3133526c8693ee71f6a5/lib/runners/cloud-hypervisor.nix#L122 needs to become supportsNotifySocket = doNotify.
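Why Type=notify matters here: for such a unit, systemd considers the start job finished only when the service (by default its main process) sends READY=1 to the socket named in the NOTIFY_SOCKET environment variable. Until then the unit stays in activating (start); if TimeoutStartSec expires, the start fails, and if the unit has a Restart= policy systemd starts it again, which would fit the loop reported in this issue. The Python sketch below shows the sending side of that protocol. It is a generic illustration of the sd_notify datagram protocol, not code from microvm.nix or cloud-hypervisor, and sd_notify here is a hypothetical helper name; it only handles filesystem and abstract-namespace socket addresses.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send a notification such as "READY=1" to systemd.

    Returns False when NOTIFY_SOCKET is unset, i.e. the process is not
    running under a unit that listens for notifications.
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    # A leading '@' marks a Linux abstract-namespace socket; the kernel
    # expects a NUL byte in its place.
    if addr.startswith("@"):
        addr = "\0" + addr[1:]
    # The protocol is a single datagram on an AF_UNIX socket.
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(addr)
        sock.send(state.encode())
    return True
```

A runner that never calls the equivalent of sd_notify("READY=1") while its unit is declared Type=notify will leave systemd waiting, which is why the unit type and the runner's notification behaviour need to agree.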

astro commented 4 days ago

Sorry for that. You're right. 0fb06e0629 fixes it.