ComputeCanada / puppet-magic_castle

Puppet Environment repo for Magic Castle - https://github.com/ComputeCanada/magic_castle
MIT License

Missing folder on restart in release 13.5.0 #368

Open poquirion opened 4 months ago

poquirion commented 4 months ago

/var/run/munge/ and /var/lock/subsys/ were missing after a soft restart was triggered from the OpenStack interface, preventing iptables.service and munge.service from restarting properly.

poquirion commented 4 months ago

I have restarted two machines today and on one of them all was good and on the other one, /var/lock/subsys/ was missing but /var/run/munge/ was there. It really looks like a race condition.

cmd-ntrf commented 4 months ago

I researched this a bit.

/var/lock/subsys is considered a legacy temporary folder. It is created by systemd via systemd-tmpfiles-setup.service. The instructions to create the folder are in /usr/lib/tmpfiles.d/legacy.conf.

Next time it happens, if you could look at the journal of the tmpfiles service and paste its content here, that would be helpful:

journalctl -u systemd-tmpfiles-setup.service

Also provide the journal of iptables, so we can look at the timestamp and determine if systemd-tmpfiles ran after iptables tried to start.
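
To line up the timelines, something like the following might help. It is only a sketch: it assumes the units are named iptables.service and munge.service on the node, and the function name is made up for illustration. `-b` limits the output to the current boot and `-o short-precise` prints microsecond timestamps.

```shell
# Hypothetical helper: print the current-boot journal of the tmpfiles
# service together with the two services that failed, merged by time.
show_boot_order() {
    journalctl -b -o short-precise --no-pager \
        -u systemd-tmpfiles-setup.service \
        -u iptables.service \
        -u munge.service
}
```

Running `show_boot_order` right after a bad restart should show whether the tmpfiles entries come before or after the first iptables and munge entries.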

cmd-ntrf commented 4 months ago

The creation of /var/run/munge/ is also the responsibility of systemd-tmpfiles-setup.service. The folder to create is defined in /usr/lib/tmpfiles.d/munge.conf.

The munge service file does not explicitly list systemd-tmpfiles-setup.service as a unit that must be started before munge:

Before=multi-user.target shutdown.target
After=system.slice systemd-journald.socket sysinit.target basic.target time-sync.target network.target

So there is a potential race condition as you stated.
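
Until the ordering is understood, a possible stopgap after a bad restart is to re-run systemd-tmpfiles on the two configuration files named in this thread and then restart the services. This is an untested sketch, not a fix: the function name is hypothetical, the service names are assumed to match the node, and it has to run as root.

```shell
# Hypothetical stopgap: recreate the folders declared in the two tmpfiles.d
# configs named in this thread, then restart the services that need them.
recreate_runtime_dirs() {
    systemd-tmpfiles --create \
        /usr/lib/tmpfiles.d/legacy.conf \
        /usr/lib/tmpfiles.d/munge.conf || return 1
    systemctl restart iptables.service munge.service
}
```

This only recovers a node that is already in the broken state; it does not address why the folders were missing in the first place.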

cmd-ntrf commented 4 months ago

sysinit.target has the following dependency:

After=proc-sys-fs-binfmt_misc.automount [...] systemd-tmpfiles-setup-dev.service [...]dev-mqueue.mount

Both munge and iptables depend on sysinit.target, so in theory /var/run/munge and /var/lock/subsys have to exist before sysinit.target is reached.
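
One way to check that assumption on an affected node is to ask systemd whether a unit is directly ordered after systemd-tmpfiles-setup.service. The excerpt above elides part of the list, and the unit it does show, systemd-tmpfiles-setup-dev.service, is a different one. A minimal sketch, with a hypothetical function name:

```shell
# Hypothetical helper: succeed if unit $1 lists systemd-tmpfiles-setup.service
# in its After= property. Only direct ordering is checked, not ordering
# inherited through another unit such as sysinit.target.
ordered_after_tmpfiles() {
    systemctl show -p After "$1" \
        | sed 's/^After=//' \
        | tr ' ' '\n' \
        | grep -Fxq 'systemd-tmpfiles-setup.service'
}
```

For example, `ordered_after_tmpfiles sysinit.target && echo ordered` prints `ordered` only if sysinit.target itself waits for the tmpfiles service. The same check can be run against munge.service and iptables.service.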