Open poquirion opened 4 months ago
I restarted two machines today. On one of them everything was fine; on the other, /var/lock/subsys/ was missing but /var/run/munge/ was there. It really looks like a race condition.
I researched this a bit.
/var/lock/subsys is considered a legacy temporary folder. It is created by systemd via systemd-tmpfiles-setup.service. The instructions to create the folder are in /usr/lib/tmpfiles.d/legacy.conf.
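To see which entry is responsible, something like the following can help. This is only a sketch, and the helper name tmpfiles_entries is made up for this comment. One caveat: in the upstream systemd legacy.conf that I know of, the directory is declared as /run/lock/subsys and /var/lock is a symlink to /run/lock. I am assuming the same holds here, so searching for "subsys" is safer than searching for the full /var/lock path.

```shell
#!/bin/sh
# Print the lines (with line numbers) of a tmpfiles.d config file
# that contain a given fixed string.
#   $1: path to the tmpfiles.d config file
#   $2: string to look for, e.g. "subsys"
tmpfiles_entries() {
    grep -n -F -- "$2" "$1"
}
```

Usage would be: tmpfiles_entries /usr/lib/tmpfiles.d/legacy.conf subsys. If the folder is missing, running systemd-tmpfiles --create /usr/lib/tmpfiles.d/legacy.conf as root should recreate it by hand, although that only works around the symptom.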
Next time it happens, if you could look at the journal of the tmpfiles service and paste its content here, that would be helpful:
journalctl -u systemd-tmpfiles-setup.service
Also provide the journal of iptables, so we can compare the timestamps and determine whether systemd-tmpfiles ran after iptables tried to start.
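To make the timestamp comparison easier, both journals can be pulled into one chronological stream. A minimal sketch, assuming it is run during the boot where the problem occurred (the -b flag limits output to the current boot); the function name merged_journal is just illustrative.

```shell
#!/bin/sh
# Print the current boot's journal for several units, interleaved in
# time order, with microsecond-precision timestamps.
merged_journal() {
    args=""
    for unit in "$@"; do
        args="$args -u $unit"
    done
    # -b: current boot only; -o short-precise: microsecond timestamps.
    # $args is left unquoted on purpose so it splits into separate flags.
    journalctl -b -o short-precise --no-pager $args
}
```

For example: merged_journal systemd-tmpfiles-setup.service iptables.service. If the iptables failure shows up before the tmpfiles lines, that would support the race condition theory.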
The creation of /var/run/munge/ is also the responsibility of systemd-tmpfiles-setup.service. The folder to create is defined in /usr/lib/tmpfiles.d/munge.conf.
The munge service file does not explicitly list systemd-tmpfiles-setup as a service that must be started before munge:
Before=multi-user.target shutdown.target
After=system.slice systemd-journald.socket sysinit.target basic.target time-sync.target network.target
So there is a potential race condition as you stated.
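If the ordering does turn out to be the problem, one possible workaround is a drop-in that orders the unit after systemd-tmpfiles-setup.service explicitly. This is an untested sketch, not something confirmed in this thread: the function name write_ordering_dropin and the file name 10-after-tmpfiles.conf are made up, and the second argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Write a drop-in that orders a unit after systemd-tmpfiles-setup.service.
#   $1: unit name, e.g. munge.service
#   $2: systemd config root (defaults to /etc/systemd/system)
write_ordering_dropin() {
    unit=$1
    root=${2:-/etc/systemd/system}
    dir="$root/$unit.d"
    mkdir -p "$dir"
    # The drop-in file name is arbitrary (hypothetical choice).
    cat > "$dir/10-after-tmpfiles.conf" <<'EOF'
[Unit]
After=systemd-tmpfiles-setup.service
EOF
}
```

As root, this would be write_ordering_dropin munge.service followed by systemctl daemon-reload, and the same for iptables.service. After= only affects ordering, so it should be harmless if the dependency was already implied through sysinit.target.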
sysinit.target has the following dependencies:
After=proc-sys-fs-binfmt_misc.automount [...] systemd-tmpfiles-setup-dev.service [...] dev-mqueue.mount
Both munge and iptables depend on sysinit.target, so in theory /var/run/munge and /var/lock/subsys have to exist before sysinit.target is reached.
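The After= list quoted above is abbreviated and only shows systemd-tmpfiles-setup-dev.service, so it is worth confirming on the affected machine that systemd-tmpfiles-setup.service itself is in there. A small sketch (the function name ordered_after is illustrative) that relies on systemctl show -p After printing a single "After=..." line:

```shell
#!/bin/sh
# Succeed (exit 0) if unit $1 lists unit $2 in its After= property.
ordered_after() {
    unit=$1
    dep=$2
    # Strip the "After=" prefix, put one unit per line, match exactly.
    systemctl show -p After "$unit" |
        sed 's/^After=//' | tr ' ' '\n' | grep -qxF -- "$dep"
}
```

For example: ordered_after sysinit.target systemd-tmpfiles-setup.service && echo "ordered". If that prints nothing, the "in theory" above does not hold on that machine.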
/var/run/munge/ and /var/lock/subsys/ were missing after a soft restart was triggered from the OpenStack interface, preventing iptables.service and munge.service from restarting properly.
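A quick check that could be run right after a reboot to see whether a machine is affected. This is just a sketch; the function name missing_dirs is made up.

```shell
#!/bin/sh
# Print every argument that is not an existing directory, one per line.
missing_dirs() {
    for d in "$@"; do
        [ -d "$d" ] || printf '%s\n' "$d"
    done
}
```

For example: missing_dirs /var/run/munge /var/lock/subsys. Empty output means both folders exist.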