astro / microvm.nix

NixOS MicroVMs
https://astro.github.io/microvm.nix/
MIT License
1.24k stars 93 forks source link

Virtiofs: long delays in restarting/stopping a VM which uses virtiofs #262

Open brianmcgillion opened 1 month ago

brianmcgillion commented 1 month ago

These commits introduce a substantial delay in rebooting or shutting down a VM. I am not sure commit exactly which, as the first one fails to start the virtiofs with a permission denied error. so tested together.

https://github.com/astro/microvm.nix/commit/36e261aa2c15a4020663cef87905f436b15305f7 https://github.com/astro/microvm.nix/commit/a50d1bf0d4a2f23ce2ffc503d65c9423a8a268e1

latest working commit is here: https://github.com/astro/microvm.nix/commit/bb0c6d867bae270288197e01449bba74ac0d381e

As can be seen in the below log for a reboot of the vm sudo systemctl restart microvm@admin-vm.service the shutdown takes 1.5 minutes before the graceful shutdown eventually times out.


...
Aug 05 19:28:50 ghaf-host microvm@admin-vm[1460]: [  OK  ] Reached target System Reboot.
Aug 05 19:28:50 ghaf-host microvm@admin-vm[1460]: microvm@admin-vm: terminating on signal 15 from pid 1 (<unknown process>)
Aug 05 19:28:50 ghaf-host supervisord[1189]: virtiofsd-log-store [2024-08-05T15:28:50Z INFO  virtiofsd] Client disconnected, shutting down
Aug 05 19:28:50 ghaf-host supervisord[1189]: virtiofsd-ro-store [2024-08-05T15:28:50Z INFO  virtiofsd] Client disconnected, shutting down
Aug 05 19:28:50 ghaf-host microvm-virtiofsd@admin-vm[1189]: 2024-08-05 19:28:50,856 INFO exited: virtiofsd-log-store (exit status 0; expected)
Aug 05 19:28:50 ghaf-host systemd[1]: microvm@admin-vm.service: Deactivated successfully.
Aug 05 19:28:50 ghaf-host systemd[1]: Stopping VirtioFS daemons for MicroVM 'admin-vm'...
Aug 05 19:30:21 ghaf-host systemd[1]: microvm-virtiofsd@admin-vm.service: Stopping timed out. Terminating.
Aug 05 19:30:21 ghaf-host microvm-virtiofsd@admin-vm[1189]: 2024-08-05 19:30:21,068 WARN exited: notify (terminated by SIGTERM; not expected)
Aug 05 19:30:21 ghaf-host microvm-virtiofsd@admin-vm[1189]: 2024-08-05 19:30:21,069 WARN received SIGTERM indicating exit request
Aug 05 19:30:21 ghaf-host systemd[1]: microvm-virtiofsd@admin-vm.service: Control process exited, code=killed, status=15/TERM
Aug 05 19:30:21 ghaf-host systemd[1]: microvm-virtiofsd@admin-vm.service: Failed with result 'timeout'.
Aug 05 19:30:21 ghaf-host systemd[1]: Stopped VirtioFS daemons for MicroVM 'admin-vm'.
Aug 05 19:30:21 ghaf-host systemd[1]: microvm-virtiofsd@admin-vm.service: Consumed 12.210s CPU time, 49.8M memory peak.

after which it starts to boot again as normal.

Aug 05 19:30:21 ghaf-host systemd[1]: Starting VirtioFS daemons for MicroVM 'admin-vm'...
Aug 05 19:30:21 ghaf-host microvm-virtiofsd@admin-vm[1845]: 2024-08-05 19:30:21,294 INFO Set uid to user 0 succeeded
Aug 05 19:30:21 ghaf-host microvm-virtiofsd@admin-vm[1845]: 2024-08-05 19:30:21,295 INFO supervisord started with pid 1845
Aug 05 19:30:22 ghaf-host microvm-virtiofsd@admin-vm[1845]: 2024-08-05 19:30:22,299 INFO spawned: 'notify' with pid 1860
Aug 05 19:30:22 ghaf-host microvm-virtiofsd@admin-vm[1845]: 2024-08-05 19:30:22,301 INFO spawned: 'virtiofsd-log-store' with pid 1861
Aug 05 19:30:22 ghaf-host microvm-virtiofsd@admin-vm[1845]: 2024-08-05 19:30:22,303 INFO spawned: 'virtiofsd-ro-store' with pid 1862
Aug 05 19:30:22 ghaf-host supervisord[1845]: virtiofsd-log-store [2024-08-05T15:30:22Z INFO  virtiofsd] Waiting for vhost-user socket connection...
Aug 05 19:30:22 ghaf-host supervisord[1845]: virtiofsd-ro-store [2024-08-05T15:30:22Z INFO  virtiofsd] Waiting for vhost-user socket connection...
Aug 05 19:30:23 ghaf-host microvm-virtiofsd@admin-vm[1845]: 2024-08-05 19:30:23,341 INFO success: notify entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Aug 05 19:30:23 ghaf-host microvm-virtiofsd@admin-vm[1845]: 2024-08-05 19:30:23,342 INFO success: virtiofsd-log-store entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Aug 05 19:30:23 ghaf-host microvm-virtiofsd@admin-vm[1845]: 2024-08-05 19:30:23,342 INFO success: virtiofsd-ro-store entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Aug 05 19:30:23 ghaf-host systemd[1]: Started VirtioFS daemons for MicroVM 'admin-vm'.
astro commented 1 month ago

@SuperSandro2000 Any ideas?

SuperSandro2000 commented 1 month ago

Maybe supervisord is not stopping after all processes it supervisors where told to stop. Maybe we can convince it with another signal?

brianmcgillion commented 1 month ago

Doing some more investigations and logging, I can see the following failures are occurring in the shutdown phase between supervisord and the virtiofsd processes

Aug 09 11:10:45 ghaf-host supervisord[1226]: virtiofsd-log-store [2024-08-09T07:10:45Z INFO  virtiofsd] Client disconnected, shutting down
Aug 09 11:10:45 ghaf-host supervisord[1226]: virtiofsd-ro-store [2024-08-09T07:10:45Z INFO  virtiofsd] Client disconnected, shutting down
Aug 09 11:10:45 ghaf-host microvm-virtiofsd@admin-vm[1226]: 2024-08-09 11:10:45,931 INFO exited: virtiofsd-log-store (exit status 0; expected)
Aug 09 11:10:45 ghaf-host systemd[1]: microvm@admin-vm.service: Deactivated successfully.
Aug 09 11:10:45 ghaf-host systemd[1]: Stopped MicroVM 'admin-vm'.
Aug 09 11:10:45 ghaf-host systemd[1]: microvm@admin-vm.service: Consumed 24.634s CPU time, 478.8M memory peak.

Aug 09 11:10:46 ghaf-host systemd[1]: Stopping VirtioFS daemons for MicroVM 'admin-vm'...
Aug 09 11:10:46 ghaf-host microvm-virtiofsd@admin-vm[1783]: ERROR: http://localhost:9001 refused connection (already shut down?)
Aug 09 11:10:46 ghaf-host systemd[1]: microvm-virtiofsd@admin-vm.service: Control process exited, code=exited, status=1/FAILURE
Aug 09 11:10:46 ghaf-host microvm-virtiofsd@admin-vm[1226]: 2024-08-09 11:10:46,166 WARN exited: notify (terminated by SIGTERM; not expected)
Aug 09 11:10:46 ghaf-host microvm-virtiofsd@admin-vm[1226]: 2024-08-09 11:10:46,166 WARN exited: virtiofsd-ro-store (exit status 1; not expected)
Aug 09 11:10:46 ghaf-host microvm-virtiofsd@admin-vm[1226]: 2024-08-09 11:10:46,166 WARN received SIGTERM indicating exit request
Aug 09 11:10:46 ghaf-host systemd[1]: microvm-virtiofsd@admin-vm.service: Failed with result 'exit-code'.
Aug 09 11:10:46 ghaf-host systemd[1]: Stopped VirtioFS daemons for MicroVM 'admin-vm'.