SUSE / suse-migration-services

GNU General Public License v3.0

Grub timeout for activation configurable #109

Closed Martin-Weiss closed 5 years ago

Martin-Weiss commented 5 years ago

When using the suse-migration-services and there is a problem with the boot (e.g. we have seen endless loops due to failures in mounting the installed system), it is very hard to switch back to the default OS boot.

IMO - it would be nice if we could do one or two things (if possible ;-)):

  1. increase the 1 sec timeout to maybe 10 seconds or make it configurable and
  2. maybe use grub2-once instead of changing the default

schaefi commented 5 years ago

In a future release I would like to get away from the bootloader completely and use kexec. See #108

Until then we can talk about a configurable timeout. But actually this is something you could do before the reboot, directly in the grub.cfg, after the activation package has created the entry.

rjschwei commented 5 years ago

> In a future release I would like to get away from the bootloader completely and use kexec. See #108
>
> Until then we can talk about a configurable timeout. But actually this is something you could do before the reboot, directly in the grub.cfg, after the activation package has created the entry.

I agree with @schaefi: if a longer timeout is needed, the user can configure that timeout in grub.cfg. Trying to put that into the migration configuration is going to introduce a lot of complications, and we are going to end up with "do this before that" instructions for the user anyway.

Martin-Weiss commented 5 years ago

Maybe we can just change the default of 1s to 10s? Or why is it on 1s?

rjschwei commented 5 years ago

> Maybe we can just change the default of 1s to 10s? Or why is it on 1s?

No, it is 1 s because things are automated and we have no reason to access/fiddle with the grub menu when the migration system boots. The need to interact with the grub menu is an exceptional case and we should not optimize for the exceptional case.

Martin-Weiss commented 5 years ago

> > Maybe we can just change the default of 1s to 10s? Or why is it on 1s?
>
> No, it is 1 s because things are automated and we have no reason to access/fiddle with the grub menu when the migration system boots. The need to interact with the grub menu is an exceptional case and we should not optimize for the exceptional case.

And what if the boot to the migration system fails, e.g. because mounting the installed system does not work? How do you get out of the endless loop if you cannot switch to another boot loader menu entry within that second? (This was the issue I had, and changing the timeout from 1s to 10s after the fact was a big challenge.)

schaefi commented 5 years ago

> How do you come out of the endless loop

This is also one point that gets solved with #108.

Other than that, and as long as you are in the development phase, I suggest setting

debug: true

in your /etc/sle-migration-service.yml. This will prevent any loop on error because you will stay in the live migration system on error.
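A minimal shell sketch of applying that setting from a script, assuming `debug` is a top-level key in the file. The helper name `enable_debug` is hypothetical and not part of suse-migration-services; it only edits the file it is given.

```shell
# enable_debug FILE
# Hypothetical helper: make sure FILE contains the top-level line "debug: true".
# Assumption: "debug" is a top-level key, written at the start of a line.
enable_debug() {
    config_file="$1"
    if grep -q '^debug:' "$config_file" 2>/dev/null; then
        # The key already exists: rewrite its value, keeping all other lines.
        tmp_file="$(mktemp)" || return 1
        sed 's/^debug:.*/debug: true/' "$config_file" > "$tmp_file" \
            && cat "$tmp_file" > "$config_file"
        status=$?
        rm -f "$tmp_file"
        return $status
    fi
    # The key is missing: append it, adding a newline first if the file
    # exists, is non-empty, and does not already end with one.
    if [ -s "$config_file" ] && [ -n "$(tail -c 1 "$config_file")" ]; then
        printf '\n' >> "$config_file"
    fi
    printf 'debug: true\n' >> "$config_file"
}
```

Run as root before the reboot, this would be called as `enable_debug /etc/sle-migration-service.yml`.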

Once you leave development mode I expect no such endless loop scenarios. Keep in mind that this way of migrating is always done for a specific environment: your storage cluster, our public cloud instances, or something else that has a clear structure. For customers who have whatever system set up in whatever weird boot environment, with other crazy inventions, this concept is probably not preferable. In that situation the non-automated, fully interactive way of sitting in front of a machine and trying to get out of update hell is the right way to go.

Do you understand what I mean?

As Robert said, we are aiming for an automated upgrade process. This requires a somewhat standardized environment. Not everything can be allowed. That is the price of automation.

rjschwei commented 5 years ago

That we failed with mounting is a bug; I think we need to be more selective. I re-opened #110, re-titled it, and posted #115 as a proposal to address the issue. We should really only mount file systems that we care about during migration; auxiliary things should be dropped.

And as @schaefi stated, there is always the debug mode.

Martin-Weiss commented 5 years ago

I understand all of this - but honestly I believe the 1 sec timeout does not bring much value and carries a risk. Adding 9 seconds to the upgrade process is not something I see as problematic, but it would help a lot in case there is an issue. If we do not want to change the default from 1 to 10 and do not want to use grub2-once, it would be nice if we could at least add a note to the documentation on how to adjust this timeout for the first few servers where a customer does this upgrade for the first time - just in case ;-).

So something like this: The default timeout for the grub2 upgrade boot loader entry is 1 sec. For the first few servers where you are going to use this method you can adjust this timeout after installation of the activation rpm by adjusting /etc/gru.../99.... and executing grub2-mkconfig -o /boot/grub2/grub.cfg.
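The adjustment described above could be scripted roughly as follows. This is a sketch under assumptions: the helper name `set_grub_timeout` is hypothetical, and it assumes the file installed by the activation rpm sets the timeout with a line of the form `set timeout=1`, which this thread does not confirm. The file path is passed in by the caller.

```shell
# set_grub_timeout FILE SECONDS
# Hypothetical helper: rewrite every "set timeout=N" line in FILE to use SECONDS.
# Assumption: the timeout is written as "set timeout=<number>" in that file.
set_grub_timeout() {
    snippet_file="$1"
    seconds="$2"
    # Accept only non-negative integers.
    case "$seconds" in
        ''|*[!0-9]*)
            echo "timeout must be a non-negative integer" >&2
            return 1
            ;;
    esac
    tmp_file="$(mktemp)" || return 1
    # Write the result back with cat so the original file keeps its
    # permissions (scripts under /etc/grub.d must stay executable).
    sed "s/^\([[:space:]]*\)set timeout=[0-9][0-9]*/\1set timeout=${seconds}/" \
        "$snippet_file" > "$tmp_file" && cat "$tmp_file" > "$snippet_file"
    status=$?
    rm -f "$tmp_file"
    return $status
}
```

After calling the helper on the relevant file with a value such as `10`, the configuration still has to be regenerated with `grub2-mkconfig -o /boot/grub2/grub.cfg`, as described above.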

schaefi commented 5 years ago

With the changes introduced in #125 we will go without a bootloader, which simplifies the startup and also avoids any loop condition. Therefore I'm closing this one.

Thanks