astro / microvm.nix

NixOS MicroVMs
https://astro.github.io/microvm.nix/
MIT License

Failed to connect to 'journal.sock': No such file or directory #200

Closed by pizzapim 8 months ago

pizzapim commented 9 months ago

I am trying to follow the FAQ entry on how to centralize logging with journald, but I am running into the following error:

qemu-system-x86_64: -chardev socket,id=fs1,path=journal.sock: Failed to connect to 'journal.sock': No such file or directory

This happens when I use the following configuration from the FAQ:

microvm.shares = [{
  source = "/var/lib/microvms/${config.networking.hostName}/journal";
  mountPoint = "/var/log/journal";
  tag = "journal";
  proto = "virtiofs";
  socket = "journal.sock";
}];

The problem seems to be that journal.sock is not present in /var/lib/microvms/maestro2/current/share/microvm/virtiofs. Curiously, I am also using the ro-store configuration from the documentation, which works and does show up there:

[root@lewis:/var/lib/microvms/maestro2/current/share/microvm/virtiofs]# ls
ro-store

The socket is, however, created here:

[root@lewis:/var/lib/microvms/maestro2]# ls -alh journal.sock
srwxrwx--- 1 root kvm 0  5 feb 20:20 journal.sock

astro commented 9 months ago

Are you certain that you rebuilt the microvm into /var/lib/microvms/maestro2/current? Because share/microvm/virtiofs/journal/ ought to exist with that config.
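
A quick way to check this could look like the sketch below. It is only an illustration: `has_virtiofs_share` is a hypothetical helper name, and it assumes each virtiofs share shows up as an entry named after its tag under share/microvm/virtiofs/ (as ro-store does in the listing above).

```shell
# Hypothetical helper (not part of microvm.nix): succeeds if the microvm built
# into <vm_dir>/current has an entry for the given virtiofs share tag.
# Assumption: shares appear as share/microvm/virtiofs/<tag>.
has_virtiofs_share() {
  vm_dir=$1   # e.g. /var/lib/microvms/maestro2
  tag=$2      # e.g. journal
  [ -e "$vm_dir/current/share/microvm/virtiofs/$tag" ]
}

# Usage, with the paths from this thread:
#   has_virtiofs_share /var/lib/microvms/maestro2 journal \
#     && echo "journal share is built in" \
#     || echo "current generation has no journal share"
```

If the check fails, the microvm in current was most likely built from a generation that predates the share.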

pizzapim commented 9 months ago

I debugged further after disabling the automatic rollback of my deployment tool, deploy-rs. The systemd service still crashes with the same error, but it works after restarting! So people who don't use a rollback system may simply never have noticed this behaviour.

pizzapim commented 9 months ago

Here is some more context from the logs when switching to a new generation:

feb 05 21:18:58 lewis microvm@maestro2[356226]: [  OK  ] Stopped User Login Management.
feb 05 21:18:58 lewis microvm@maestro2[356226]: [142B blob data]
feb 05 21:18:58 lewis microvm@maestro2[356226]: [
feb 05 21:18:58 lewis systemd[1]: microvm@maestro2.service: Deactivated successfully.
feb 05 21:18:58 lewis systemd[1]: Stopped MicroVM 'maestro2'.
feb 05 21:18:58 lewis systemd[1]: microvm@maestro2.service: Consumed 1min 19.289s CPU time, no IP traffic.
feb 05 21:19:00 lewis systemd[1]: Starting MicroVM 'maestro2'...
feb 05 21:19:01 lewis systemd[1]: Started MicroVM 'maestro2'.
feb 05 21:19:01 lewis microvm@maestro2[357297]: qemu-system-x86_64: -chardev socket,id=fs1,path=journal.sock: Failed to connect to 'journal.sock': Connection refused
feb 05 21:19:01 lewis systemd[1]: microvm@maestro2.service: Main process exited, code=exited, status=1/FAILURE
feb 05 21:19:01 lewis systemd[1]: microvm@maestro2.service: Failed with result 'exit-code'.
feb 05 21:19:06 lewis systemd[1]: microvm@maestro2.service: Scheduled restart job, restart counter is at 1.
feb 05 21:19:06 lewis systemd[1]: Stopped MicroVM 'maestro2'.
feb 05 21:19:06 lewis systemd[1]: Starting MicroVM 'maestro2'...
feb 05 21:19:06 lewis systemd[1]: Started MicroVM 'maestro2'.
feb 05 21:19:06 lewis microvm@maestro2[357522]: [73B blob data]

The service is restarted and fails, then restarts again, this time succeeding.

pizzapim commented 9 months ago

After successfully deploying the generation with the share as above and then deploying a new generation, the error no longer occurs. So the error only occurs when deploying a generation with the share onto a system that is currently running without it. Temporarily disabling the rollback feature is therefore a workaround. I might debug further tomorrow.

astro commented 8 months ago

Is there anything related to deploy-rs that we should change in microvm.nix, or add to the documentation?

pizzapim commented 8 months ago

For now, when I encounter this issue, I deploy with rollback disabled:

nix run nixpkgs#deploy-rs -- -k --targets .#<host> --auto-rollback false --magic-rollback false

Maybe you can add this as a caveat about deploy-rs to the documentation?
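
For repeated use, the command above could be wrapped in a small shell function. This is just a sketch: the function name `deploy_no_rollback` and the `RUN` switch are made up for illustration, and only the flags already shown above are used.

```shell
# Hypothetical wrapper around the deploy-rs invocation from this thread.
# By default it only prints the command; set RUN=1 to actually execute it.
deploy_no_rollback() {
  host=$1   # flake target name, i.e. the <host> placeholder above
  set -- nix run nixpkgs#deploy-rs -- -k --targets ".#$host" \
    --auto-rollback false --magic-rollback false
  if [ "${RUN:-0}" = 1 ]; then
    "$@"          # run the deployment with both rollback mechanisms disabled
  else
    echo "$*"     # dry run: show what would be executed
  fi
}

# Usage:
#   deploy_no_rollback myhost          # print the command
#   RUN=1 deploy_no_rollback myhost    # deploy without rollback
```

Printing by default keeps the rollback-free deployment a deliberate choice, since it is only needed for the first deployment that introduces the share.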

astro commented 8 months ago

Great that it works for you now.