astro / microvm.nix

NixOS MicroVMs
https://astro.github.io/microvm.nix/
MIT License
1.42k stars 103 forks source link

Starting Guests fails due to Problems with virtiofsd #260

Closed keks closed 3 months ago

keks commented 3 months ago

Somehow I am having trouble getting microvm.nix to work. My config is here: https://github.com/keks/broken-nixos-microvm

The flake has three system configurations:

On both protoserver and nucsrv I have a clone of the flake's git repo checked out at /root/nix-systems.

The reason I switched from protoserver to nucsrv was that I believed this was some nested virtualization problem, so I switched to using a real computer. However, the exact same problem persists. Here is an excerpt from journalctl:

journalctl --since today ``` ... Jul 30 00:00:12 nucsrv systemd[1]: microvm@guest0.service: Scheduled restart job, restart counter is at 47947. Jul 30 00:00:12 nucsrv systemd[1]: Stopped MicroVM 'guest0'. Jul 30 00:00:12 nucsrv systemd[1]: Setup MicroVM 'guest0' MACVTAP interfaces was skipped because of an unmet condition check (ConditionPathExists=/var/lib/microvms/guest0/current/share/microvm/macvtap-interfaces). Jul 30 00:00:12 nucsrv systemd[1]: Setup MicroVM 'guest0' devices for passthrough was skipped because of an unmet condition check (ConditionPathExists=/var/lib/microvms/guest0/current/share/microvm/pci-devices). Jul 30 00:00:12 nucsrv systemd[1]: Setup MicroVM 'guest0' TAP interfaces was skipped because of an unmet condition check (ConditionPathExists=/var/lib/microvms/guest0/current/share/microvm/tap-interfaces). Jul 30 00:00:12 nucsrv systemd[1]: Stopping VirtioFS daemons for MicroVM 'guest0'... Jul 30 00:00:12 nucsrv microvm-virtiofsd@guest0[1536879]: 2024-07-30 00:00:12,662 WARN received SIGTERM indicating exit request Jul 30 00:00:12 nucsrv microvm-virtiofsd@guest0[1536879]: 2024-07-30 00:00:12,663 INFO waiting for virtiofsd-ro-store to die Jul 30 00:00:12 nucsrv microvm-virtiofsd@guest0[1536879]: 2024-07-30 00:00:12,663 WARN stopped: virtiofsd-ro-store (exit status 1) Jul 30 00:00:12 nucsrv systemd[1]: microvm-virtiofsd@guest0.service: Deactivated successfully. Jul 30 00:00:12 nucsrv systemd[1]: Stopped VirtioFS daemons for MicroVM 'guest0'. Jul 30 00:00:12 nucsrv systemd[1]: Started VirtioFS daemons for MicroVM 'guest0'. Jul 30 00:00:12 nucsrv systemd[1]: Starting MicroVM 'guest0'... Jul 30 00:00:12 nucsrv systemd[1]: Started MicroVM 'guest0'. Jul 30 00:00:12 nucsrv microvm@guest0[1536922]: microvm@guest0: -chardev socket,id=fs0,path=guest0-virtiofs-ro-store.sock: Failed to connect to 'guest0-virtiofs-ro-store.sock': Connection refused Jul 30 00:00:12 nucsrv systemd[1]: microvm@guest0.service: Main process exited, code=exited, status=1/FAILURE Jul 30 00:00:12 nucsrv systemd[1]: microvm@guest0.service: Failed with result 'exit-code'. Jul 30 00:00:12 nucsrv microvm-virtiofsd@guest0[1536910]: 2024-07-30 00:00:12,878 INFO Set uid to user 0 succeeded Jul 30 00:00:12 nucsrv microvm-virtiofsd@guest0[1536910]: 2024-07-30 00:00:12,880 INFO supervisord started with pid 1536910 Jul 30 00:00:13 nucsrv microvm-virtiofsd@guest0[1536910]: 2024-07-30 00:00:13,883 INFO spawned: 'virtiofsd-ro-store' with pid 1536928 Jul 30 00:00:14 nucsrv microvm-virtiofsd@guest0[1536910]: 2024-07-30 00:00:14,891 INFO success: virtiofsd-ro-store entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) Jul 30 00:00:17 nucsrv systemd[1]: microvm@guest0.service: Scheduled restart job, restart counter is at 47948. Jul 30 00:00:17 nucsrv systemd[1]: Stopped MicroVM 'guest0'. Jul 30 00:00:17 nucsrv systemd[1]: Setup MicroVM 'guest0' MACVTAP interfaces was skipped because of an unmet condition check (ConditionPathExists=/var/lib/microvms/guest0/current/share/microvm/macvtap-interfaces). Jul 30 00:00:17 nucsrv systemd[1]: Setup MicroVM 'guest0' devices for passthrough was skipped because of an unmet condition check (ConditionPathExists=/var/lib/microvms/guest0/current/share/microvm/pci-devices). Jul 30 00:00:17 nucsrv systemd[1]: Setup MicroVM 'guest0' TAP interfaces was skipped because of an unmet condition check (ConditionPathExists=/var/lib/microvms/guest0/current/share/microvm/tap-interfaces). Jul 30 00:00:17 nucsrv systemd[1]: Stopping VirtioFS daemons for MicroVM 'guest0'... Jul 30 00:00:17 nucsrv microvm-virtiofsd@guest0[1536910]: 2024-07-30 00:00:17,895 WARN received SIGTERM indicating exit request Jul 30 00:00:17 nucsrv microvm-virtiofsd@guest0[1536910]: 2024-07-30 00:00:17,895 INFO waiting for virtiofsd-ro-store to die Jul 30 00:00:17 nucsrv microvm-virtiofsd@guest0[1536910]: 2024-07-30 00:00:17,896 WARN stopped: virtiofsd-ro-store (exit status 1) Jul 30 00:00:17 nucsrv systemd[1]: microvm-virtiofsd@guest0.service: Deactivated successfully. Jul 30 00:00:17 nucsrv systemd[1]: Stopped VirtioFS daemons for MicroVM 'guest0'. Jul 30 00:00:17 nucsrv systemd[1]: Started VirtioFS daemons for MicroVM 'guest0'. Jul 30 00:00:17 nucsrv systemd[1]: Starting MicroVM 'guest0'... Jul 30 00:00:17 nucsrv systemd[1]: Started MicroVM 'guest0'. Jul 30 00:00:17 nucsrv microvm@guest0[1536953]: microvm@guest0: -chardev socket,id=fs0,path=guest0-virtiofs-ro-store.sock: Failed to connect to 'guest0-virtiofs-ro-store.sock': Connection refused Jul 30 00:00:17 nucsrv systemd[1]: microvm@guest0.service: Main process exited, code=exited, status=1/FAILURE Jul 30 00:00:18 nucsrv systemd[1]: microvm@guest0.service: Failed with result 'exit-code'. Jul 30 00:00:18 nucsrv microvm-virtiofsd@guest0[1536941]: 2024-07-30 00:00:18,110 INFO Set uid to user 0 succeeded Jul 30 00:00:18 nucsrv microvm-virtiofsd@guest0[1536941]: 2024-07-30 00:00:18,112 INFO supervisord started with pid 1536941 Jul 30 00:00:19 nucsrv microvm-virtiofsd@guest0[1536941]: 2024-07-30 00:00:19,115 INFO spawned: 'virtiofsd-ro-store' with pid 1536959 Jul 30 00:00:20 nucsrv microvm-virtiofsd@guest0[1536941]: 2024-07-30 00:00:20,123 INFO success: virtiofsd-ro-store entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) Jul 30 00:00:23 nucsrv systemd[1]: microvm@guest0.service: Scheduled restart job, restart counter is at 47949. Jul 30 00:00:23 nucsrv systemd[1]: Stopped MicroVM 'guest0'. Jul 30 00:00:23 nucsrv systemd[1]: Setup MicroVM 'guest0' MACVTAP interfaces was skipped because of an unmet condition check (ConditionPathExists=/var/lib/microvms/guest0/current/share/microvm/macvtap-interfaces). Jul 30 00:00:23 nucsrv systemd[1]: Setup MicroVM 'guest0' devices for passthrough was skipped because of an unmet condition check (ConditionPathExists=/var/lib/microvms/guest0/current/share/microvm/pci-devices). Jul 30 00:00:23 nucsrv systemd[1]: Setup MicroVM 'guest0' TAP interfaces was skipped because of an unmet condition check (ConditionPathExists=/var/lib/microvms/guest0/current/share/microvm/tap-interfaces). Jul 30 00:00:23 nucsrv systemd[1]: Stopping VirtioFS daemons for MicroVM 'guest0'... Jul 30 00:00:24 nucsrv microvm-virtiofsd@guest0[1536941]: 2024-07-30 00:00:24,129 WARN received SIGTERM indicating exit request Jul 30 00:00:24 nucsrv microvm-virtiofsd@guest0[1536941]: 2024-07-30 00:00:24,129 INFO waiting for virtiofsd-ro-store to die Jul 30 00:00:24 nucsrv microvm-virtiofsd@guest0[1536941]: 2024-07-30 00:00:24,130 WARN stopped: virtiofsd-ro-store (exit status 1) Jul 30 00:00:24 nucsrv systemd[1]: microvm-virtiofsd@guest0.service: Deactivated successfully. Jul 30 00:00:24 nucsrv systemd[1]: Stopped VirtioFS daemons for MicroVM 'guest0'. Jul 30 00:00:24 nucsrv systemd[1]: Started VirtioFS daemons for MicroVM 'guest0'. Jul 30 00:00:24 nucsrv systemd[1]: Starting MicroVM 'guest0'... Jul 30 00:00:24 nucsrv systemd[1]: Started MicroVM 'guest0'. Jul 30 00:00:24 nucsrv microvm@guest0[1536984]: microvm@guest0: -chardev socket,id=fs0,path=guest0-virtiofs-ro-store.sock: Failed to connect to 'guest0-virtiofs-ro-store.sock': Connection refused Jul 30 00:00:24 nucsrv systemd[1]: microvm@guest0.service: Main process exited, code=exited, status=1/FAILURE Jul 30 00:00:24 nucsrv systemd[1]: microvm@guest0.service: Failed with result 'exit-code'. Jul 30 00:00:24 nucsrv microvm-virtiofsd@guest0[1536972]: 2024-07-30 00:00:24,347 INFO Set uid to user 0 succeeded Jul 30 00:00:24 nucsrv microvm-virtiofsd@guest0[1536972]: 2024-07-30 00:00:24,349 INFO supervisord started with pid 1536972 Jul 30 00:00:25 nucsrv microvm-virtiofsd@guest0[1536972]: 2024-07-30 00:00:25,352 INFO spawned: 'virtiofsd-ro-store' with pid 1536990 Jul 30 00:00:26 nucsrv microvm-virtiofsd@guest0[1536972]: 2024-07-30 00:00:26,360 INFO success: virtiofsd-ro-store entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) Jul 30 00:00:29 nucsrv systemd[1]: microvm@guest0.service: Scheduled restart job, restart counter is at 47950. ... ``` And so forth.

I really don't get what the problem is, is there something wrong with the configuration? I don't see any real difference to what is shown in the guide.

Any help would be much appreciated!

Bert-Proesmans commented 3 months ago

what's the output of command ls -la /var/lib/microvms/guest0 on you hypervisor host? There might be no files with name format guest0-virtiofs-*.sock. If it's (somehow) a permission issue you could remove the unix socket and restart your vm; sudo rm -f /var/lib/microvms/guest0/guest0-virtiofs-ro-store.sock

keks commented 3 months ago
# ls -l /var/lib/microvms/guest0/
total 5803
lrwxrwxrwx 1 microvm kvm        63 Jul 25 22:59 current -> /nix/store/9c556p94mgdcsg7f6jd5j2rd0b3bfbkp-microvm-qemu-guest0
-rw-r--r-- 1 microvm kvm        18 Jul 25 22:59 flake
srwxr-xr-x 1 microvm kvm         0 Jul 31 10:13 guest0.sock
srwxrwx--- 1 root    kvm         0 Jul 31 10:13 guest0-virtiofs-ro-store.sock
-rw------- 1 root    root        8 Jul 31 10:13 guest0-virtiofs-ro-store.sock.pid
-rw-r--r-- 1 root    root 37768769 Jul 31 10:13 supervisord.log
-rw-r--r-- 1 root    root        8 Jul 31 10:13 supervisord.pid

One thing to note is that in the logs it seems like something is SIGTERMing virtiofsd, which would explain why we get the connection refused: The server process doesn't listen anymore, it's terminated. Every time I checked for the pid in /var/lib/microvm/guest0/guest0-virtiofs-ro-store.sock.pid, no such process existed. However, when I stopped the microvm@guest0 service and started the virtiofsd@guest0 service, it stayed up, and there was a process with that pid.

I don't know what makes this happen. I provided the config to make this issue reproducible, in case that helps.

keks commented 3 months ago

Oh, and regarding the question whether it may be a permissions issue: I don't think so. man 7 unix says

ERRORS
        (snip)

        ECONNREFUSED
              The  remote address specified by connect(2) was not a listening socket.  This error can
              also occur if the target pathname is not a socket.
astro commented 3 months ago

From the journalctl output it seems to me that the virtiofsd come up only after the VMM is started.

Could you try adding the following to your host's NixOS config?

systemd.services."microvm@".after = [ "microvm-virtiofsd@%i.service" ];

Sorry, systemd dependencies are hard. If this solves your problem, I'll add it to the host module.

keks commented 3 months ago

I think that fixed it, thank you very much!

keks commented 3 months ago

Oops, maybe you want to keep this open until the PR is in? Sorry!

astro commented 3 months ago

Thank you for finding and testing.