SUSE / suse-migration-services

GNU General Public License v3.0

Use file based syscall for kexec operation #175

Closed schaefi closed 4 years ago

schaefi commented 4 years ago

To avoid permission problems when loading the kernel, use the new KEXEC_FILE_LOAD syscall instead of KEXEC_LOAD. This fixes #174.
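For illustration, here is a minimal sketch of what such a kexec invocation could look like when assembled in Python. The function name and the example paths are hypothetical and not taken from the patch; `--load`, `--initrd`, `--command-line` and `--kexec-file-syscall` are kexec-tools options.

```python
from typing import List


def build_kexec_load_command(kernel: str, initrd: str, cmdline: str) -> List[str]:
    """Return an argv list asking kexec-tools to load a kernel.

    Hypothetical helper: it only builds the argument list and does not
    run anything, so it is safe to call on any machine.
    """
    return [
        'kexec',
        '--load', kernel,
        '--initrd', initrd,
        # Force the file based syscall (KEXEC_FILE_LOAD) rather than
        # KEXEC_LOAD or the automatic selection.
        '--kexec-file-syscall',
        '--command-line', cmdline,
    ]
```

The resulting list could be handed to `subprocess.run` as root; whether the real service builds its call this way is an assumption.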

schaefi commented 4 years ago

@tserong please double check if the call is correct. I avoided the auto selection since you mentioned it has issues. Thanks

tserong commented 4 years ago

Will test & verify

tserong commented 4 years ago

Tested this patch with SES migration image, on two VMs running SLE 12 SP3, one configured to use BIOS, one configured to use UEFI. run_migration successfully started the migration in both cases.

There's a problem with rebooting from the migration on EFI systems, but it's unrelated to this patch. The kernel-load service fails with:

Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]: --- Logging error ---
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]: Traceback (most recent call last):
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]:   File "/usr/lib/python3.6/site-packages/suse_migration_services/units/kernel_load.py", line 64, in main
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]:     '--command-line', _get_cmdline(os.path.basename(target_kernel))
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]:   File "/usr/lib/python3.6/site-packages/suse_migration_services/units/kernel_load.py", line 96, in _get_cmdline
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]:     cmd_line = re.findall(pattern, grub_content)[0]
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]: IndexError: list index out of range
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]: During handling of the above exception, another exception occurred:
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]: Traceback (most recent call last):
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]:   File "/usr/lib64/python3.6/logging/__init__.py", line 994, in emit
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]:     msg = self.format(record)
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]:   File "/usr/lib64/python3.6/logging/__init__.py", line 840, in format
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]:     return fmt.format(record)
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]:   File "/usr/lib64/python3.6/logging/__init__.py", line 577, in format
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]:     record.message = record.getMessage()
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]:   File "/usr/lib64/python3.6/logging/__init__.py", line 338, in getMessage
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]:     msg = msg % self.args
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]: TypeError: not all arguments converted during string formatting
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]: Call stack:
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]:   File "/usr/bin/suse-migration-kernel-load", line 11, in <module>
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]:     load_entry_point('suse-migration-services==1.1.2', 'console_scripts', 'suse-migration-kernel-load')()
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]:   File "/usr/lib/python3.6/site-packages/suse_migration_services/units/kernel_load.py", line 68, in main
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]:     log.error('Kernel load service raised exception: {0}', format(issue))
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]: Message: 'Kernel load service raised exception: {0}'
Jul 08 10:28:18 localhost suse-migration-kernel-load[2069]: Arguments: ('list index out of range',)
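
A side note on the second traceback in this log: it appears to come from the logging call itself rather than from the kernel load. The message uses a `{0}` placeholder, but `format(issue)` is passed to `log.error` as a separate argument (comma instead of dot), so the logging module tries `msg % args` and raises the TypeError. A minimal sketch of two ways the message could be built, using a hypothetical logger name and helper functions:

```python
import logging

log = logging.getLogger('kernel_load_demo')  # hypothetical logger name


def describe(issue: Exception) -> str:
    """Build the message up front with str.format (note the dot)."""
    return 'Kernel load service raised exception: {0}'.format(issue)


def report(issue: Exception) -> None:
    """Alternative: let logging do lazy %-style interpolation."""
    log.error('Kernel load service raised exception: %s', issue)
```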

Looking at line 96 of kernel_load.py, we have:

    pattern = r'(?<=linux)[ \t]+{0}([{1}|boot/{1}].*)'.format(
        os.sep, kernel_name
    )

But this doesn't match the grub config (/system-root/boot/grub2/grub.cfg), which uses "linuxefi /boot/vmlinuz[...]" on EFI systems instead of "linux /boot/vmlinuz[...]".

The migration system does still ultimately reboot of course, because of the fallback to a hard reboot.
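
To make the mismatch concrete, here is a rough sketch of a lookup that accepts both "linux" and "linuxefi" and returns None instead of raising IndexError when nothing matches. This is not the fix that went into the patch; the function name and the exact capture (kernel path without the leading slash, plus the remaining arguments) are assumptions.

```python
import re
from typing import Optional


def find_kernel_cmdline(grub_content: str, kernel_name: str) -> Optional[str]:
    """Return the kernel path (without leading slash) and boot arguments
    from the first matching linux/linuxefi line, or None if there is none.
    """
    pattern = r'^[ \t]*linux(?:efi)?[ \t]+/((?:boot/)?{0}.*)$'.format(
        re.escape(kernel_name)
    )
    match = re.search(pattern, grub_content, re.MULTILINE)
    return match.group(1) if match else None
```

Returning None lets the caller decide how to handle a missing entry, for example by going straight to the hard reboot fallback.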

tserong commented 4 years ago

Related to the "linux" vs. "linuxefi" thing, should we also try to fix the grub config generated by grub.d/99_migration, so it'll set "linuxefi" and "initrdefi" on EFI systems?

schaefi commented 4 years ago

> There's a problem with rebooting from the migration on EFI systems

Great catch, thanks a ton for testing. I added a patch to fix that part. Thanks

schaefi commented 4 years ago

> Related to the "linux" vs. "linuxefi" thing, should we also try to fix the grub config generated by grub.d/99_migration, so it'll set "linuxefi" and "initrdefi" on EFI systems?

Yes, let's apply the same fix we did in kiwi, just a sec.
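
As a rough illustration of the idea, the choice between the two command pairs could be keyed on whether the running system was booted via EFI. On Linux, /sys/firmware/efi exists in that case. The function name is hypothetical, and whether kiwi or grub.d/99_migration detect EFI this way is an assumption.

```python
import os
from typing import Tuple


def grub_boot_commands(efi_dir: str = '/sys/firmware/efi') -> Tuple[str, str]:
    """Return the (kernel, initrd) grub command names for this system.

    efi_dir is a parameter only so the check can be exercised against a
    temporary directory in tests.
    """
    if os.path.isdir(efi_dir):
        return 'linuxefi', 'initrdefi'
    return 'linux', 'initrd'
```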

schaefi commented 4 years ago

@tserong OK, I applied the suggested fixes/changes. Could you give it another test?

Thanks much

rjschwei commented 4 years ago

Just double checking, we know that --kexec-file-syscall has been supported since original release of SLES 15 GA, right? If not we'd need a Requires: kernel >= somewhere.

tserong commented 4 years ago

> @tserong ok I applied the suggested fixes/changes. Could you give it another test ?

Working on it.

> Just double checking, we know that --kexec-file-syscall has been supported since original release of SLES 15 GA, right?

I haven't tried it on SLES 15 GA, but it is there on SLES 12 SP3, which is even older.

tserong commented 4 years ago

OK, with the latest version of this patch, kexec and grub reboot both still work fine on BIOS.

For EFI, kexec reboot works (and the kernel_load unit is fixed), but I still have a problem with grub. Now it fails with:

error: can't find command `loopback'.
error: no server is specified.
error: you need to load the kernel first.

Some further digging shows that (for some reason) I need to add two lines as mentioned in https://bugzilla.suse.com/show_bug.cgi?id=1173532#c3:

    btrfs-mount-subvol / /boot/grub2/x86_64-efi /@/boot/grub2/x86_64-efi
    insmod loopback

With these lines added, EFI boot works, but only if secure boot is disabled. If secure boot is enabled, nothing I can think of ever seems to enable the loopback command.

I have no idea why the above isn't necessary in BIOS mode :-/

tserong commented 4 years ago

Re-tested BIOS with btrfs root (previous test was with ext4 root), still works fine. Still no idea why the btrfs-mount-subvol and insmod loopback lines only seem necessary for EFI, not BIOS.

Re-tested EFI with ext4 root. With secure boot disabled, it works fine with this patch (no need for insmod loopback). With secure boot enabled, the loopback command seems to be a no-op, so again we're back to "error: no server is specified. error: you need to load the kernel first." I think for secure boot systems, grub loopback just isn't an option that works, and people will have to use the run_migration script in this case.

schaefi commented 4 years ago

> If secure boot is enabled, nothing I can think of ever seems to enable the loopback command.

This is a problem, yes. If you boot with secure boot enabled, loading modules is only possible if a signed grub loader (shim in SUSE) ran before. Is the system you are testing on correctly set up for EFI secure boot? Meaning, can the system normally boot in secure boot mode? I assume not. For the distribution migration we just add another menuentry to the existing grub setup. If that setup and the system itself are able to boot in secure boot mode, it should also be possible to load other modules. But I might be wrong here, and the way loopback works may really not be possible in secure boot mode. If you think about what "secure" means in this regard, that behavior would even make sense. But yes, this means the loopback boot concept would not work for secure boot, and customers could only start the migration via kexec (run-migration). If we come to that point we should at least document it.

> btrfs-mount-subvol / /boot/grub2/x86_64-efi /@/boot/grub2/x86_64-efi

And we want to add that only if the system is btrfs based and boot/grub2/x86_64-efi is a subvolume. I'll add a commit to address this one.
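
The condition described here could be sketched roughly as follows. The function and constant names are hypothetical, and how the real commit detects the filesystem type and the subvolume is not shown in this thread, so both are simply passed in as arguments.

```python
from typing import List

EFI_MODULE_DIR = '/boot/grub2/x86_64-efi'


def extra_grub_lines(root_fs_type: str, efi_dir_is_subvolume: bool) -> List[str]:
    """Return the additional menuentry lines for btrfs systems whose
    grub EFI module directory is a subvolume, otherwise an empty list.
    """
    if root_fs_type == 'btrfs' and efi_dir_is_subvolume:
        return [
            'btrfs-mount-subvol / {0} /@{0}'.format(EFI_MODULE_DIR),
            'insmod loopback',
        ]
    return []
```

Keeping the detection outside this helper means the decision logic can be tested without a btrfs system at hand.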

tserong commented 4 years ago

The system can normally boot in secure boot mode (mokutil --sb-state prints "SecureBoot enabled" when the system is up and running), so it must be something about loopback not being possible in this case, as you suggest.

> And we want to add that only if the system is btrfs based and boot/grub2/x86_64-efi is a subvolume. I'll add a commit to address this one

Re-tested on (non-secure-boot) EFI btrfs and ext4 systems, that commit works fine (the extra lines are added in the btrfs case, but not in the ext4 case, and it reboots into the migration system correctly in both cases).
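
Since the grub loopback entry does not seem to work with secure boot, a tool could check the secure boot state up front and point users to run_migration instead. A minimal sketch based on the `mokutil --sb-state` output quoted above; the function names are hypothetical, and treating a missing or failing mokutil as "not enabled" is an assumption.

```python
import subprocess


def parse_sb_state(output: str) -> bool:
    """Return True if mokutil --sb-state output reports secure boot enabled."""
    return 'secureboot enabled' in output.lower()


def secure_boot_enabled() -> bool:
    """Ask mokutil for the secure boot state of the running system."""
    try:
        result = subprocess.run(
            ['mokutil', '--sb-state'],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
        )
    except FileNotFoundError:
        # mokutil is not installed; assume secure boot is not enabled.
        return False
    return result.returncode == 0 and parse_sb_state(result.stdout)
```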

schaefi commented 4 years ago

OK, thanks Tim for all your testing. I think the problem with loopback in secure boot should be added as an issue. I'll do that.