fedora-iot / greenboot

Generic Health Checking Framework for systemd
GNU Lesser General Public License v2.1

Get information that system rolled back #118

Open pmtk opened 11 months ago

pmtk commented 11 months ago

Hello

In MicroShift, we're looking for a reliable way to know that the system rolled back so that we can perform certain actions.

We have some ideas so far:

  1. In a red script, if boot_counter == 0, create a file on disk to persist that information and read it on the next boot. This assumes that:
    • The existence of boot_counter means the new deployment has not yet been determined to be okay, so greenboot might reboot the system. This holds especially when a red script runs, as it will be followed by a reboot.
    • boot_counter == 0 means it is the last attempt: when the system reboots, grub will see that value and select the second boot entry (rollback).
  2. Inspect journalctl --boot 0 -u greenboot-rpm-ostree-grub2-check-fallback for the message "FALLBACK BOOT DETECTED! Default rpm-ostree deployment has been rolled back".
    • We're slightly worried about timing. We would probably want After=greenboot-rpm-ostree-grub2-check-fallback, but we'd have to check whether that has any impact on non-ostree systems, and whether RemainAfterExit=yes would affect it as well.
  3. Extend greenboot so that greenboot-rpm-ostree-grub2-check-fallback creates a file such as /run/rolled-back. This file would be removed by greenboot-grub2-set-counter, or be cleaned up automatically by a reboot (if a new deployment wasn't staged and the machine was simply rebooted).
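To make idea 1 concrete, here is a minimal sketch of what such a red script could contain. It assumes the grub environment is listed with `grub2-editenv - list` and that boot_counter appears there as a `boot_counter=N` line. The marker path and the function names are hypothetical and not part of greenboot.

```shell
# Hypothetical marker path (an assumption, not a greenboot convention).
# It lives under /var so that it survives the reboot into the rollback;
# a file under /run would be lost.
MARKER="${MARKER:-/var/lib/rollback-pending}"

# Read `grub2-editenv - list` style output on stdin and print the value of
# boot_counter. Prints nothing if the variable is absent.
get_boot_counter() {
    sed -n 's/^boot_counter=//p'
}

# Read the grubenv listing on stdin and write the marker only when
# boot_counter == 0, i.e. on what the thread assumes is the last attempt.
mark_if_last_attempt() {
    counter="$(get_boot_counter)"
    if [ "$counter" = "0" ]; then
        # Store a UTC timestamp so the next boot can tell when this happened.
        date -u +%Y-%m-%dT%H:%M:%SZ > "$MARKER"
    fi
}
```

A red script would then call `grub2-editenv - list | mark_if_last_attempt`, and whatever runs on the next boot would check for the marker and delete it after acting on it.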

Do you have any other ideas for how we could make this a robust mechanism? Do you know of any other source of this information, such as grub or (rpm-)ostree?

say-paul commented 11 months ago

Does the rollback need to be known on boot(T), when greenboot attempts the rollback, or on reboot(T+1), after the rollback has happened? That determines which approach needs to be taken. I agree, though, that there should be a cleaner way to determine it.

pmtk commented 11 months ago

The latter: on reboot(T+1) after the rollback happened

say-paul commented 11 months ago

Then a possible workaround for now is to check whether boot_counter is unset/boot_success=1, combined with checking whether journalctl --boot 1 -u greenboot-rpm-ostree-grub2-check-fallback shows the rollback message.
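A rough sketch of how this workaround could be scripted follows. It makes two assumptions: that "unset/boot_success=1" means either condition is sufficient, and that the journal text contains the message exactly as quoted earlier in the thread. The function names are hypothetical. Both inputs are passed in as text, so the check itself does not depend on which boot the journal was read from.

```shell
# Message text as quoted earlier in this thread.
FALLBACK_MSG="FALLBACK BOOT DETECTED! Default rpm-ostree deployment has been rolled back"

# $1: grubenv listing (output of `grub2-editenv - list`).
# Succeeds if boot_counter is unset or boot_success=1 (assumed: either suffices).
grubenv_settled() {
    if ! printf '%s\n' "$1" | grep -q '^boot_counter='; then
        return 0
    fi
    printf '%s\n' "$1" | grep -q '^boot_success=1$'
}

# $1: journal text of the check-fallback unit.
# Succeeds if the fallback message appears anywhere in it.
journal_reports_fallback() {
    printf '%s\n' "$1" | grep -qF "$FALLBACK_MSG"
}

# $1: grubenv listing, $2: journal text.
# Succeeds only when both conditions of the workaround hold.
rolled_back() {
    grubenv_settled "$1" && journal_reports_fallback "$2"
}
```

The caller would capture the output of `grub2-editenv - list` and of the journalctl command above into variables and pass them as `rolled_back "$grubenv" "$journal"`.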