coreycothrum / meta-mender-const-conf

MIT License

question: journald persistent logs not working #3

Closed. skinny closed this issue 9 months ago

skinny commented 9 months ago

Hi again,

Using this layer I expected to be able to save journald logs onto my writeable data partition ;-)

First I made sure that my machine-id is stable, and I can see that /var/log/journal is symlinked to /data/log/journal (as expected). The journal in that folder is actually written to, but after a reboot the folder is somehow cleared and a new journal containing only the entries from the current boot is created.

Do I have to enable or disable something else to get this functionality working?

Thanks again for your great work on this!

skinny commented 9 months ago

After some more testing, it looks like I have some kind of dependency conflict or startup-sequence issue between mender-client-systemd-machine-id.service and systemd-tmpfiles-setup.service.

Commenting out the cleanup line in /usr/lib/systemd/system/systemd-tmpfiles-setup.service.d/systemd-tmpfiles-setup.service.conf makes the logging work as intended. I noticed that this service is created with Before=mender-client-systemd-machine-id.service, which sounds odd to me, but after changing it to After= the system doesn't boot correctly.

Preliminary conclusion: the cleanup line in systemd-tmpfiles-setup.service is executed before /etc/machine-id is properly populated by mender-client-systemd-machine-id.service.

Relevant part of my systemd tree: (image not included)

skinny commented 9 months ago

[Service]
ExecStartPost=/usr/bin/mkdir      -p      /data/log/journal
ExecStartPost=/usr/bin/ln         -s      /data/log/journal /var/log/journal
ExecStartPost=/usr/bin/cat /etc/machine-id
ExecStartPost=/usr/bin/find                    /data/log/journal -mindepth 1 -maxdepth 1 -type d -not -name $(/usr/bin/cat /etc/machine-id)
ExecStartPost=/usr/bin/ls /data/log/journal -l
ExecStartPost=/usr/bin/find                    /data/log/journal -mindepth 1 -maxdepth 1 -type d -not -name $(/usr/bin/cat /etc/machine-id) -exec /usr/bin/rm -rf {} \;
ExecStartPost=/usr/bin/ls /data/log/journal -l
ExecStartPost=/usr/bin/journalctl --flush
ExecStartPost=/usr/bin/ls /data/log/journal -l

Jan 19 09:13:11 localhost cat[279]: dc367aebc01e4ef9a3a023658be5ae87
Jan 19 09:13:11 localhost find[280]: /data/log/journal/dc367aebc01e4ef9a3a023658be5ae87
Jan 19 09:13:11 localhost ls[281]: total 1
Jan 19 09:13:11 localhost ls[281]: drwxr-xr-x 2 root root 1024 Jan 19 08:57 dc367aebc01e4ef9a3a023658be5ae87
Jan 19 09:13:11 localhost ls[284]: total 0
Jan 19 09:13:11 localhost ls[287]: total 1
Jan 19 09:13:11 localhost ls[287]: drwxr-xr-x 2 root root 1024 Jan 19 09:13 dc367aebc01e4ef9a3a023658be5ae87
Jan 19 09:13:11 localhost systemd[1]: Finished Create Volatile Files and Directories.

The machine-id is actually present and valid, but the folder is included in the find results nonetheless. Running the find command manually afterwards does not produce any output.

coreycothrum commented 9 months ago

Let me know if this is correct:

  1. If you remove the find command from the systemd file, it should work as expected (persistently). (Note: this find/cleanup is really only useful for systems that have run without the persistent machine-id.)
  2. After system is fully up, manually running the find command does nothing (i.e. doesn't delete that directory).

Am I following that OK?

Are you still suspicious it's an ordering issue?

skinny commented 9 months ago

No, it's just weird that executing the find command during boot lists the directory named after the current, valid machine-id as "to be removed", while executing the same find command manually does not.

I added the extra commands to the service to see what values are present at that point, and everything looks correct. Maybe the $(cat /etc/machine-id) is not executed properly from within a systemd unit?

(I just want to understand this behaviour ;-) )
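
On the open question about $(cat /etc/machine-id): systemd does not pass ExecStart*= command lines through a shell, so shell command substitution is not performed there. The script below is a minimal sketch of the consequence for find, not a trace of the exact arguments systemd handed to find during the boot above. The scratch directory layout and the names tmp, with_shell and without_shell are made up for illustration; the single-quoted pattern is only a stand-in for "text that was never expanded by a shell".

```shell
#!/bin/sh
# Scratch reproduction: everything lives under a temporary directory
# (hypothetical layout, no real journal paths are touched).
tmp=$(mktemp -d)
id=dc367aebc01e4ef9a3a023658be5ae87   # machine-id taken from the log above
mkdir -p "$tmp/journal/$id"
printf '%s\n' "$id" > "$tmp/machine-id"

# Case 1: a shell performs the command substitution, so the pattern is the
# real machine-id and the matching directory is excluded from the results.
with_shell=$(find "$tmp/journal" -mindepth 1 -maxdepth 1 -type d \
    -not -name "$(cat "$tmp/machine-id")")

# Case 2: the pattern reaches find as unexpanded text (the single quotes
# stand in for "no shell was involved"), so no directory is excluded.
without_shell=$(find "$tmp/journal" -mindepth 1 -maxdepth 1 -type d \
    -not -name '$(cat machine-id)')

echo "with shell:    '${with_shell}'"
echo "without shell: '${without_shell}'"

rm -rf "$tmp"
```

If this is what happens in the unit, the usual way to get shell semantics is to run the cleanup through /bin/sh -c '...' in the ExecStartPost= line, writing $$ wherever a literal $ should reach the shell, since systemd applies its own variable expansion to a single $.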