design journal persistence for early provisioning

cgwalters commented 3 years ago

A general philosophy of Ignition is that we have the system fully configured before switching into the real root and running user code. For example, using Ignition kernel argument support we run systemctl reboot from the initramfs.

However, this means that all logs from this early provisioning time are lost because systemd-journal-flush.service only runs when successfully switching to the real root.

This is likely something to take to upstream systemd, but basically I'd propose as a strawman that we configure journald to persist to /boot/journal or so during this early provisioning phase, and then have it switch over.

jlebon commented 3 years ago

Is this RFE driven by a specific instance you have in mind where the Ignition kargs reboot nuked system logs you cared about?

cgwalters commented 3 years ago

> Is this RFE driven by a specific instance you have in mind where the Ignition kargs reboot nuked system logs you cared about?

Not me specifically, but from an internal chat where another team member was wondering how to find the logs from coreos-kargs-reboot.service.

dustymabe commented 3 years ago

Sounds reasonable to me. Do you want to discuss this at the weekly meeting?

HuijingHei commented 3 years ago

> Is this RFE driven by a specific instance you have in mind where the Ignition kargs reboot nuked system logs you cared about?
>
> Not me specifically, but from an internal chat where another team member was wondering how to find the logs from coreos-kargs-reboot.service.

Thanks! Actually, I am the one who asked the question. I want to add an automated test case that checks the rhcos-afterburn-checkin.service dependency After=coreos-kargs-reboot.service (https://github.com/openshift/os/commit/e0363da044c598d38b1f62a4dfbe0e43ccdaf0e3, refer to #BZ1980679), but I cannot find any coreos-kargs-reboot.service logs in journalctl.

Reason from @cgwalters: the journal isn't persisted here because we reboot before switching root, which is when systemd saves it.

travier commented 3 years ago

Generally agree that this would be useful, but most probably only for debugging some very specific cases.

We also have to be careful about loading "untrusted" content from /boot into the journal, as well as about storing unencrypted logs in /boot, as that may not be OK for some use cases where users expect everything to be stored encrypted on the disk at all times (except /boot content; maybe also in PXE boot use cases?).

This also has implications for measured boot in FCOS (which does not exist/work yet, but this could make it harder).

jlebon commented 3 years ago

We discussed this in today's community meeting. Some things raised:

- Timothée's points above
- Doubt about usefulness of those logs vs implementation complexity
  - If something breaks on first boot, then we wouldn't reboot anyway. It seems unlikely we'd have an error where something in the first boot causes breakage in the second boot, since very few things don't rerun.
- Log gathering for troubleshooting is still possible via serial console

cgwalters commented 3 years ago

Maybe instead of the whole journal, we just write a tiny bit of information into /boot such as the fact that we did an early reboot, and then have code that runs in the real root that logs journal messages from that? journalctl --list-boots would still lie, but at least we'd be able to see something there.
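
To make the idea concrete, here is a minimal sketch of the marker approach, assuming a small JSON file under /boot. The function names, the JSON fields, and the marker path are all hypothetical and not part of any existing CoreOS or Ignition code; the initramfs side would call the writer just before rebooting, and a unit in the real root would call the replay function.

```python
import json
import os
import tempfile
import time


def write_early_reboot_marker(path, reason):
    """Initramfs side: record that provisioning rebooted before switch-root."""
    record = {"reason": reason, "timestamp": time.time()}
    directory = os.path.dirname(path) or "."
    # Write to a temporary file and rename it into place, so an interrupted
    # write never leaves a partial marker behind.
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(record, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def replay_early_reboot_marker(path, log=print):
    """Real-root side: log the marker once, then delete it.

    Returns the parsed record, or None if no marker exists.
    """
    try:
        with open(path) as f:
            record = json.load(f)
    except FileNotFoundError:
        return None
    log(f"early reboot during provisioning: {record['reason']} "
        f"(at {record['timestamp']})")
    os.unlink(path)
    return record
```

In a real implementation `log` would presumably write to the journal rather than stdout, and the marker content would need to be validated before use, given the concern raised above about loading untrusted content from /boot.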

HuijingHei commented 3 years ago

Agree with @cgwalters. Maybe we can output simple logs that include coreos-kargs-reboot.service, so that we know it actually triggered the reboot. Does this make sense?

bgilbert commented 3 years ago

@HuijingHei If the goal is just to demonstrate that the system rebooted after adding kernel arguments, you don't need logs for that, since you'll be able to see the new arguments in /proc/cmdline. We already have such a test here.
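
As a rough illustration of that kind of check (not the existing test itself), the sketch below reports which expected arguments are absent from the kernel command line. The helper names are hypothetical, and the whitespace split is a simplifying assumption that ignores quoted arguments containing spaces.

```python
def read_cmdline(path="/proc/cmdline"):
    """Return the kernel command line of the running system."""
    with open(path) as f:
        return f.read()


def missing_kargs(expected, cmdline):
    """Return the expected kernel arguments that are not in cmdline."""
    # Naive whitespace tokenization; good enough for simple key=value args.
    present = set(cmdline.split())
    return [arg for arg in expected if arg not in present]
```

A test would then assert that `missing_kargs(["some.karg=1"], read_cmdline())` is empty after the first boot completes, where `some.karg=1` stands in for whatever argument the Ignition config added.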

HuijingHei commented 3 years ago

> @HuijingHei If the goal is just to demonstrate that the system rebooted after adding kernel arguments, you don't need logs for that, since you'll be able to see the new arguments in /proc/cmdline. We already have such a test here.

In that case, the logs about the early reboot are less important; we can just check the final result.

coreos / fedora-coreos-tracker

design journal persistence for early provisioning #955