SUSE / suse-migration-services

GNU General Public License v3.0
7 stars 11 forks source link

Add schema validation for migration config file #142

Closed tserong closed 5 years ago

tserong commented 5 years ago

If the custom config file is empty or contains only comment lines, it will now be ignored (rather than raising an exception and killing the migration). If the file is corrupt somehow, or violates schema validation, an error will be logged and the migration will abort.

Validation is also applied to the default config file (the one inside the migration image). This makes automated testing easy, because we'll effectively do automatic validation on any of the test config files, whether they're for default config or custom config.

There is one problem with the current implementation: if either the default config file or the custom config file is broken in some way, and an exception is raised, the subsequent grub setup service fails, and I have no idea why. The log file says:

[INFO    ]: 06:09:31 | Running grub setup service
[INFO    ]: 06:09:32 | Uninstalling migration:
Loading repository data...
Reading installed packages...
Resolving package dependencies...

The following 2 packages are going to be REMOVED:
  SLES15-SES-Migration suse-migration-sle15-activation

2 packages to remove.
After the operation, 245.1 MiB will be freed.
Continue? [y/n/...? shows all options] (y): y
(1/2) Removing suse-migration-sle15-activation-1.2.0-20.11.noarch [.....done]
Additional rpm output:
awk: fatal: cannot open file `/proc/self/mountinfo' for reading (No such file or directory)
/usr/sbin/grub2-probe: error: cannot find a device for / (is /dev mounted?).
warning: %postun(suse-migration-sle15-activation-1.2.0-20.11.noarch) scriptlet failed, exit status 1

(2/2) Removing SLES15-SES-Migration-1.15.2.tserong.8-33.x86_64 [.....done]

[ERROR   ]: 06:09:32 | Update grub failed with chroot: stderr: awk: fatal: cannot open file `/proc/self/mountinfo' for reading (No such file or directory)
/usr/sbin/grub2-probe: error: cannot find a device for / (is /dev mounted?).
, stdout: (no output on stdout)

This means the migration package is removed, but the grub config is not updated correctly, so the next reboot tries to load the migration again, and gets stuck in grub complaining about a missing ISO. Any assistance would be much appreciated...

rjschwei commented 5 years ago

As far as the reboot after package removal on failure it is possible to handle this in %post of the spec file or can can be done in code.

schaefi commented 5 years ago

This means the migration package is removed, but the grub config is not updated correctly

I guess due to the new raise conditions the system to be migrated was umounted prior to the grub_setup service. Could you provide the full log file, I'd like to check if this was the case. Thanks

tserong commented 5 years ago

I've just run it again with a deliberately broken custom config file. The package removal seems to succeed, at least partially, so it must have been mounted when it starts to uninstall the migration RPMs. It looks like it unmounts after the grub_setup service, but only by a second or so. Is there any way the two services could be racing?

Here's the relevant timestamps for my latest run:

[INFO    ]: 04:36:45 | Running grub setup service
[INFO    ]: 04:36:47 | Uninstalling migration:
(...)
[ERROR   ]: 04:36:47 | Update grub failed with chroot: stderr: awk: fatal: cannot open file `/proc/self/mountinfo' for reading (No such file or directory)
/usr/sbin/grub2-probe: error: cannot find a device for / (is /dev mounted?).
, stdout: (no output on stdout)
(...)
[INFO    ]: 04:36:48 | Umounting system
[INFO    ]: 04:36:48 | Umounting /system-root/: command(output='', error='', returncode=0)

Here's the full log file: distro_migration.log

tserong commented 5 years ago

Scratch that earlier comment about races; given all the After= options, such things shouldn't be possible.

tserong commented 5 years ago

Ah! It's because the prepare service doesn't run if the mount service raises an exception! (The prepare service is responsible for mounting /dev, /proc and other stuff inside /system-root, which in necessary in order for grub2-mkconfig to operate)

schaefi commented 5 years ago

Ah! It's because the prepare service doesn't run if the mount service raises an exception! (The prepare service is responsible for mounting /dev, /proc and other stuff inside /system-root, which in necessary in order for grub2-mkconfig to operate)

yep that's it, the prepare service file has: Requires=suse-migration-mount-system.service suse-migration-setup-host-network.service network-online.target

if one of those fails it does not run.

Hmm, this is a problem. grub2-mkconfig requires mount-system to succeed because it has to write back the changes on the target machine. Mount and validation should be decoupled. The mount service should only fail if there is a real issue on mounts to keep the process flow intact

systemd/suse-migration-grub-setup.service should mho also have a Requires: suse-migration-mount-system.service. It doesn't make sense to call this service if mounting has failed

I suggest to add your validation code into the units/post_mount_system.py code. This means we let it first mount everything, then we validate the config, if it fails we raise and go down the cleanup route.

tserong commented 5 years ago

I suggest to add your validation code into the units/post_mount_system.py code. This means we let it first mount everything, then we validate the config, if it fails we raise and go down the cleanup route.

I think I have to do it later than that, as /proc and others still aren't mounted yet inside the system-root chroot, so if I raise in post_mount_system, I'll probably still have the same problem when hitting the grub step. Maybe I should put the validation in migrate.py, immediately before it tries to run zypper?

schaefi commented 5 years ago

I think I have to do it later than that, as /proc and others still aren't mounted yet inside the system-root chroot,

yes you are root. After the prepare service it's all in place. If the validation is part of prepare (units/prepare.py). it's mount -> post-mount -> prepare + validate

There is one problem though. the post-mount service reads the config file before validate in this case. Maybe it's worth if we move the mounting of the kernel file systems proc dev sys into the mount service instead of the prepare service. That should keep the process stable

Thoughts ?

tserong commented 5 years ago

Maybe it's worth if we move the mounting of the kernel file systems proc dev sys into the mount service instead of the prepare service. That should keep the process stable

Yeah, I think that makes sense. I'll see what I can do tomorrow.

tserong commented 5 years ago

Alright, this now seems to be behaving correctly.

tserong commented 5 years ago

Note that while doing this work I discovered SLES12-SP4-SLES15-Migration in Devel:PubCloud currently includes etc/migration-config.yml specifying use_zypper_migration_plugin: true, which fails schema validation, thus killing the migration. This needs to be updated to use_zypper_migration: true (or just removed, as that's the default anyway).

schaefi commented 5 years ago

Note that while doing this work I discovered SLES12-SP4-SLES15-Migration in Devel:PubCloud currently includes etc/migration-config.yml specifying use_zypper_migration_plugin: true, which fails schema validation, thus killing the migration. This needs to be updated to use_zypper_migration: true (or just removed, as that's the default anyway).

Thanks good catch. I deleted the file. All those values are now either defaults or gets auto detected