Open dhensel-rh opened 4 months ago
I think the microshift scripts are part of microshift, the bug would need to be reported there as those are the scripts that are taking the time
I opened a ticket wit hMicroShift and discussed this with Gregory Giguashvili previously. He felt this was a Greenboot feature enhancement.
'Looking at https://github.com/fedora-iot/greenboot/blob/main/usr/libexec/greenboot/greenboot#L69, the greenboot check continues the loop until all scripts are run and only checks for the error condition at https://github.com/fedora-iot/greenboot/blob/main/usr/libexec/greenboot/greenboot#L78.`
Can you provide a link to the microshift ticket, the details provided to date don't provide that context. Also links to where the mentioned scripts can be found.
Oh yes. i was not sure if you could view the ticket. Apologies
Seems like it should be a RFE to have an option to either 1) run all tests 2) fail at the first failure. To quote the full comment:
` Douglas Hensel , this is by design of the current greenboot implementation.
Looking at https://github.com/fedora-iot/greenboot/blob/main/usr/libexec/greenboot/greenboot#L69, the greenboot check continues the loop until all scripts are run and only checks for the error condition at https://github.com/fedora-iot/greenboot/blob/main/usr/libexec/greenboot/greenboot#L78.
May I suggest closing this JIRA issue and opening an upstream feature request at https://github.com/fedora-iot/greenboot/issues? `
It also looks like microshift is randomly copying/forking things, it would be useful if those could be actually sent back upstream rather than forking as it also makes it hard to know if the bug is upstream or in your fork :)
https://github.com/openshift/microshift/commit/556a91a30dabffde1fe248ba01c85940bbdac928
@nullr0ute Do you need me to do anything to start an RFE ?
@nullr0ute , let me comment on the issues you're raising.
Seems like it should be a RFE to have an option to either 1) run all tests 2) fail at the first failure.
The comment you're referring to in the JIRA ticket was our best guess on the current greenboot implementation. From the experience we have, I cannot think of a use-case when I would want to continue running greenboot scripts following a failure. It just delays the inevitable. Nevertheless, if we want to keep the current default behavior, we should add an option to change it. Can we consider this issue as an RFE for the requested functionality?
microshift is randomly copying/forking things: https://github.com/openshift/microshift/commit/556a91a30dabffde1fe248ba01c85940bbdac928
Note that the commit you're referring to is not a copy / fork of any greenboot upstream functionality. It's a bug fix in MicroShift internal scripts. These scripts are implementing MicroShift-specific health-check functionality and they cannot be used upstream in the generic greenboot code.
This issue is observed in MicroShift project. When multiple start scripts are on the system, and an error is encountered in the first script, Greenboot does not immediately fail. Using default settings, Greenboot checks all of them before rebooting (in 4.16, this is 3 scripts 40_microshift_running_check.sh, 41_microshift_running_check_multus.sh, 50_microshift_running_check_olm.sh). This can add delay to the system rolling back to a known good state. This might be intended design, but it can add time before a rollback can occur.
On the first boot, each script can take up to 5 minutes to check. That is ~15 minutes. On the second boot, this increases to 10 minutes per script. That is ~30 minutes. On the third boot, each script takes up to 15 minutes.
Can this be optimized in some way ?