Closed tserong closed 5 years ago
As far as the reboot after package removal on failure it is possible to handle this in %post of the spec file or can can be done in code.
This means the migration package is removed, but the grub config is not updated correctly
I guess due to the new raise conditions the system to be migrated was umounted prior to the grub_setup service. Could you provide the full log file, I'd like to check if this was the case. Thanks
I've just run it again with a deliberately broken custom config file. The package removal seems to succeed, at least partially, so it must have been mounted when it starts to uninstall the migration RPMs. It looks like it unmounts after the grub_setup service, but only by a second or so. Is there any way the two services could be racing?
Here's the relevant timestamps for my latest run:
[INFO ]: 04:36:45 | Running grub setup service
[INFO ]: 04:36:47 | Uninstalling migration:
(...)
[ERROR ]: 04:36:47 | Update grub failed with chroot: stderr: awk: fatal: cannot open file `/proc/self/mountinfo' for reading (No such file or directory)
/usr/sbin/grub2-probe: error: cannot find a device for / (is /dev mounted?).
, stdout: (no output on stdout)
(...)
[INFO ]: 04:36:48 | Umounting system
[INFO ]: 04:36:48 | Umounting /system-root/: command(output='', error='', returncode=0)
Here's the full log file: distro_migration.log
Scratch that earlier comment about races; given all the After=
options, such things shouldn't be possible.
Ah! It's because the prepare service doesn't run if the mount service raises an exception! (The prepare service is responsible for mounting /dev
, /proc
and other stuff inside /system-root
, which in necessary in order for grub2-mkconfig
to operate)
Ah! It's because the prepare service doesn't run if the mount service raises an exception! (The prepare service is responsible for mounting
/dev
,/proc
and other stuff inside/system-root
, which in necessary in order forgrub2-mkconfig
to operate)
yep that's it, the prepare service file has: Requires=suse-migration-mount-system.service suse-migration-setup-host-network.service network-online.target
if one of those fails it does not run.
Hmm, this is a problem. grub2-mkconfig requires mount-system to succeed because it has to write back the changes on the target machine. Mount and validation should be decoupled. The mount service should only fail if there is a real issue on mounts to keep the process flow intact
systemd/suse-migration-grub-setup.service should mho also have a Requires: suse-migration-mount-system.service. It doesn't make sense to call this service if mounting has failed
I suggest to add your validation code into the units/post_mount_system.py code. This means we let it first mount everything, then we validate the config, if it fails we raise and go down the cleanup route.
I suggest to add your validation code into the units/post_mount_system.py code. This means we let it first mount everything, then we validate the config, if it fails we raise and go down the cleanup route.
I think I have to do it later than that, as /proc
and others still aren't mounted yet inside the system-root chroot, so if I raise in post_mount_system, I'll probably still have the same problem when hitting the grub step. Maybe I should put the validation in migrate.py, immediately before it tries to run zypper?
I think I have to do it later than that, as /proc and others still aren't mounted yet inside the system-root chroot,
yes you are root. After the prepare service it's all in place. If the validation is part of prepare (units/prepare.py). it's mount -> post-mount -> prepare + validate
There is one problem though. the post-mount service reads the config file before validate in this case. Maybe it's worth if we move the mounting of the kernel file systems proc dev sys into the mount service instead of the prepare service. That should keep the process stable
Thoughts ?
Maybe it's worth if we move the mounting of the kernel file systems proc dev sys into the mount service instead of the prepare service. That should keep the process stable
Yeah, I think that makes sense. I'll see what I can do tomorrow.
Alright, this now seems to be behaving correctly.
Note that while doing this work I discovered SLES12-SP4-SLES15-Migration in Devel:PubCloud currently includes etc/migration-config.yml
specifying use_zypper_migration_plugin: true
, which fails schema validation, thus killing the migration. This needs to be updated to use_zypper_migration: true
(or just removed, as that's the default anyway).
Note that while doing this work I discovered SLES12-SP4-SLES15-Migration in Devel:PubCloud currently includes
etc/migration-config.yml
specifyinguse_zypper_migration_plugin: true
, which fails schema validation, thus killing the migration. This needs to be updated touse_zypper_migration: true
(or just removed, as that's the default anyway).
Thanks good catch. I deleted the file. All those values are now either defaults or gets auto detected
If the custom config file is empty or contains only comment lines, it will now be ignored (rather than raising an exception and killing the migration). If the file is corrupt somehow, or violates schema validation, an error will be logged and the migration will abort.
Validation is also applied to the default config file (the one inside the migration image). This makes automated testing easy, because we'll effectively do automatic validation on any of the test config files, whether they're for default config or custom config.
There is one problem with the current implementation: if either the default config file or the custom config file is broken in some way, and an exception is raised, the subsequent grub setup service fails, and I have no idea why. The log file says:
This means the migration package is removed, but the grub config is not updated correctly, so the next reboot tries to load the migration again, and gets stuck in grub complaining about a missing ISO. Any assistance would be much appreciated...