fedora-iot / greenboot

Generic Health Checking Framework for systemd
GNU Lesser General Public License v2.1

Greenboot should add more messages to the log to tell the user what happened #104

Status: Closed (yih-redhat closed this issue 1 year ago)

yih-redhat commented 1 year ago

Currently, the greenboot log looks like this:

    [admin@vm-1 ~]$ journalctl -b -4 -u greenboot.service
    Jun 12 08:02:47 localhost systemd[1]: Starting greenboot Health Checks Runner...
    Jun 12 08:02:47 localhost greenboot[909]: INFO greenboot > GreenbootConfig { max_reboot: 3 }
    Jun 12 08:02:47 localhost greenboot[909]: INFO greenboot > running required check /usr/lib/greenboot/check/required.d/01_repository_d>
    Jun 12 08:02:47 localhost greenboot[909]: INFO greenboot > running required check /usr/lib/greenboot/check/required.d/02_watchdog.sh
    Jun 12 08:02:47 localhost greenboot[909]: INFO greenboot > running required check /etc/greenboot/check/required.d/10_failing_check.sh
    Jun 12 08:02:47 localhost greenboot[909]: ERROR greenboot > required script /etc/greenboot/check/required.d/10_failing_check.sh failed!
    Jun 12 08:02:47 localhost greenboot[909]: ERROR greenboot > reason:
    Jun 12 08:02:47 localhost greenboot[909]: INFO greenboot > running wanted check /usr/lib/greenboot/check/wanted.d/01_update_platforms>
    Jun 12 08:02:47 localhost greenboot[909]: WARN greenboot > wanted script /usr/lib/greenboot/check/wanted.d/01_update_platforms_check.>
    Jun 12 08:02:47 localhost greenboot[909]: WARN greenboot > reason: grep: /etc/ostree/remotes.d/*: No such file or directory
    Jun 12 08:02:47 localhost greenboot[909]: ERROR greenboot > Greenboot health-check failed!
    Jun 12 08:02:47 localhost greenboot[909]: INFO greenboot::handler > boot_counter initialized
    Jun 12 08:02:47 localhost greenboot[909]: INFO greenboot::handler > restarting system
    Jun 12 08:02:47 localhost greenboot[909]: Error: health-check failed!
    Jun 12 08:02:47 localhost systemd[1]: greenboot.service: Main process exited, code=exited, status=1/FAILURE
    Jun 12 08:02:47 localhost systemd[1]: greenboot.service: Failed with result 'exit-code'.
    Jun 12 08:02:47 localhost systemd[1]: Stopped greenboot Health Checks Runner.

I suggest adding more messages to help customers understand the boot status:

  1. For the general boot result, add messages such as:
     a. "health check passed and boot is green"
     b. "health check failed but no previous commit found, will boot with current commit"
     c. "health check failed, a previous commit was found, will try to reboot and rollback"
  2. In case c, there will be multiple reboots, and a rollback happens if the health check fails 3 times. In this case, we can add more information to tell the user where the boot process currently is, for example:
     greenboot_healthcheck fails, boot_counter=3, greenboot restart=1
     greenboot_healthcheck fails, boot_counter=2, greenboot restart=2
     greenboot_healthcheck fails, boot_counter=1, greenboot restart=3
     greenboot_healthcheck fails, boot_counter=0, greenboot restart=4
     greenboot_healthcheck fails, boot_counter=-1, ostree rollback to previous commit

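The two proposals can be sketched in Rust (the `greenboot::handler` module path in the log suggests greenboot is written in Rust). This is a minimal illustration under stated assumptions, not greenboot's code or the fix that closed this issue: the `BootOutcome` enum, `outcome_message`, and `progress_message` are hypothetical names, and the counter semantics (start at 3, decrement once per failed boot, roll back once the value goes negative) are taken only from the example sequence above.

```rust
/// Hypothetical summary of a boot, mirroring cases a/b/c from the proposal.
/// Illustrative only; these are not greenboot's actual types.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum BootOutcome {
    /// Case a: every required check passed.
    Green,
    /// Case b: a required check failed and no previous commit exists.
    FailedNoRollbackTarget,
    /// Case c: a required check failed and a previous commit exists.
    FailedWillRollback,
}

/// Message for the general boot result (item 1 of the proposal).
fn outcome_message(outcome: BootOutcome) -> &'static str {
    match outcome {
        BootOutcome::Green => "health check passed and boot is green",
        BootOutcome::FailedNoRollbackTarget => {
            "health check failed but no previous commit found, will boot with current commit"
        }
        BootOutcome::FailedWillRollback => {
            "health check failed, a previous commit was found, will try to reboot and rollback"
        }
    }
}

/// Progress message for case c (item 2 of the proposal).
/// ASSUMPTION: a negative boot_counter means the rollback is triggered,
/// as in the example sequence; otherwise greenboot restarts again.
fn progress_message(boot_counter: i32, restart: u32) -> String {
    if boot_counter < 0 {
        format!(
            "greenboot_healthcheck fails, boot_counter={boot_counter}, ostree rollback to previous commit"
        )
    } else {
        format!("greenboot_healthcheck fails, boot_counter={boot_counter}, greenboot restart={restart}")
    }
}

fn main() {
    println!("{}", outcome_message(BootOutcome::FailedWillRollback));
    // Replay the example: boot_counter goes 3, 2, 1, 0, -1 while the
    // restart number counts up from 1.
    for (i, counter) in (-1..=3).rev().enumerate() {
        println!("{}", progress_message(counter, i as u32 + 1));
    }
}
```

Running `main` prints the case c message followed by the five progress lines from item 2. Keeping the wording in pure functions separate from the reboot logic makes the messages easy to unit-test without rebooting a machine.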
say-paul commented 1 year ago

@yih-redhat This is fixed in the PR above; you can retest it.

yih-redhat commented 1 year ago

Users can now find useful information in the log; details can be found in https://issues.redhat.com/browse/VIRTCLOUD-636