Closed stephanebill closed 4 years ago
Hmm, if it's a kernel issue there is little we can do
Can you try to boot with the additional options:
rd.debug rd.kiwi.debug
After some time this should drop you into an emergency shell inside the initrd, where you can check whether the devices are really not present. You may also need rd.break=initqueue
I see you are also using the iso-scan feature. Maybe it's related to the grub loopback support. I've tested the integration test from here: https://build.opensuse.org/package/show/Virtualization:Appliances:Images:Testing_x86:suse/test-image-live and this works with kernel 5.8.14-1. But it's not using iso-scan as in your case
So can you test if directly booting isofile="/iso/tumbleweed.x86_64-1.99.1.iso" via kvm works ?
qemu-kvm -cdrom /iso/tumbleweed.x86_64-1.99.1.iso
Thanks
I ran several tests with and without iso-scan/isofile, using both my test ISO and the integration test ISO you linked. Neither boots through grub: /dev/disk/by-label is not there, so the root is not found. I also could not get them to work with qemu-system-x86_64 on my Leap 15.2 desktop. Both images work fine when dd'ed to a flash drive. I was hoping something was missing in the initrd to explain the disappearance of by-label. I will keep experimenting.
Hmm, sounds more like a virtualization issue. Can you try:
zypper in qemu-kvm
qemu-kvm -m 4096 -drive file=/iso/tumbleweed.x86_64-1.99.1.iso,if=virtio,format=raw
This should map the ISO image to a virtio device in the guest, usually /dev/vda1
I can confirm this issue is the same one reported in boo#1177900. According to my tests, it arises when booting the ISO via GRUB2 using the iso-scan dracut scripts. For some reason, in that context the devices under /dev/disk/by-label are not populated. I verified that iso-scan is properly executed and hence the loop device for the ISO is properly prepared. However, for some reason, udev (or anything related) does not detect and raise the new device. I could not find obvious differences in the persistent storage udev rules between Leap 15.2 and Tumbleweed. I am still debugging udev.
Thanks for looking into this. I hope this is a plain kernel/grub problem and not something we need to address in kiwi
JFYI and from boo#1177900
The problem is that after running udev services at boot, udev does not react to the registration of the ISO loop device according to the persistent storage rules, so it does not populate /dev/disk/by-label. I tested the exact same image [1] with different kernel versions: using any pre-5.8.0 kernel solves the issue, and since 5.8.0 it fails. At boot, the dracut initqueue falls into a never-ending loop waiting for a device that has not been detected. After the timeout it just drops into a shell. From that shell I can verify that the losetup over the ISO was executed as expected, and I also verified that calling
udevadm trigger --type=devices --action=add
populates the expected /dev/disk/by-label/CDROM device. This udevadm call is part of systemd-udev-trigger.service, which was also properly executed during the boot. I am mostly clueless about what is actually causing this behaviour change; I also could not see any relevant difference in the udev package, the related services, or the udev rules. Given that what raises the issue is a kernel update, I am adding a request for information to the kernel maintainers.
Reproducer:
- Download a TW live image
- 'Install' it on a FAT32 USB stick with the live-grub-stick utility:
$ sudo live-grub-stick /dev/sdXY
- Boot the image on KVM:
$ sudo -E qemu-kvm -m 2048 -hda /dev/sdX
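The check described above (is the by-label entry there, and if not, retrigger udev) could be scripted roughly as follows from the emergency shell. This is only a sketch: the function name `label_link_status` and its optional directory argument are illustrative and not part of dracut or kiwi. The label CDROM and the udevadm call are the ones quoted in this thread.

```shell
# label_link_status LABEL [DIR]
# Report whether the by-label entry for LABEL exists.
# DIR defaults to /dev/disk/by-label; the override is a hypothetical
# convenience for trying the function outside of the initrd.
label_link_status() {
    lls_label=$1
    lls_dir=${2:-/dev/disk/by-label}
    if [ -e "${lls_dir}/${lls_label}" ]; then
        echo "present: ${lls_dir}/${lls_label}"
        return 0
    fi
    echo "missing: ${lls_dir}/${lls_label}"
    return 1
}
```

From the dracut shell one might then run `label_link_status CDROM || udevadm trigger --type=devices --action=add`, wait with `udevadm settle`, and call `label_link_status CDROM` a second time to see whether the link appeared.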
Currently we don't know why the device in question is not being created. But maybe there is an opportunity to recall udev or we find the potential race condition.
We'll do some tests in that area and come back with feedback
I just validated that the issue still persists on kernel v5.9.1 (current default kernel on TW).
@schaefi I'll do some more investigation, but this is tough to work around at dracut module level without touching the iso-scan scripts. In fact, our kiwi-live-root.sh script is never called in this case because it is triggered by this udev rule:
The boot process gets stuck waiting for the device at line 15, wait_for_dev -n "${root#live:}". The iso-scan scripts create the device that should trigger this udev rule as a loopback device. Hence there is no chance for us to call udevadm trigger in dracut/modules.d/90kiwi-live/kiwi-live-root.sh, as it never gets called.
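To make the wait condition concrete: with root=live:CDLABEL=CDROM on the kernel command line, the device the initqueue ends up waiting for appears to be /dev/disk/by-label/CDROM. The sketch below only illustrates that mapping and the `${root#live:}` expansion. `live_root_device` is a made-up helper, not code taken from kiwi or dracut, and it handles just the two forms that show up in this thread.

```shell
# live_root_device SPEC
# Map a "live:" root specification to the device node being waited for.
# Hypothetical helper for illustration only.
live_root_device() {
    lrd_spec=${1#live:}            # same prefix strip as "${root#live:}"
    case "$lrd_spec" in
        CDLABEL=*) echo "/dev/disk/by-label/${lrd_spec#CDLABEL=}" ;;
        /dev/*)    echo "$lrd_spec" ;;
        *)         return 1 ;;     # any other form is out of scope here
    esac
}
```

Calling `live_root_device live:CDLABEL=CDROM` prints the by-label path; as long as udev never creates that link, a wait on it cannot succeed.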
Thanks much for hunting this down. I see, and the way the kiwi live-root waits for the device is correct and also wanted. I guess a fix should then rather go to /usr/lib/dracut/modules.d/90dmsquash-live/iso-scan.sh
I'd say as a consequence of the report here we can open a bugzilla ticket against dracut-maintainers@suse.com
Thoughts ?
Today I had planned to try to reproduce this issue with dmsquash on Fedora Rawhide and verify how udev brings up loop devices on a booted system. But yes, if my investigations are correct, this most likely needs to be solved at dracut level, so a ticket there would make sense.
I did a few more investigations on this issue and now I am quite convinced the problem lies in some behavior change in the kernel. I noticed that during the initqueue stage at boot, losetup -f <isofile> does not trigger any uevent (nothing gets detected by udevadm monitor), so udev has no event to react to while the loop device is being set up. However, if the uevent is manually forced by echo "change" >> /sys/block/loop0/uevent or udevadm trigger --type=devices --action=add, then the device is properly managed by udev and the /dev/disk/* links are created as expected.
If I boot the same ISO with qemu-kvm -m 2049 -cdrom <isofile>, I can see that losetup -f <isofile> triggers the expected uevent on the booted system, so it looks like the issue only happens at early boot stages. I could not figure out why.
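Forcing the uevent by hand, as described above, can be wrapped in a small function. This is a sketch under assumptions: `force_change_uevent` and its optional sysfs root argument are hypothetical names introduced here; only the write of "change" to the device's uevent file comes from the thread.

```shell
# force_change_uevent DEVNAME [SYSFS_ROOT]
# Ask the kernel to emit a synthetic "change" uevent for a block device.
# SYSFS_ROOT defaults to /sys; the override exists only so the function
# can be exercised against a scratch directory.
force_change_uevent() {
    fcu_file="${2:-/sys}/block/$1/uevent"
    if [ ! -e "$fcu_file" ]; then
        echo "no such block device entry: $fcu_file" >&2
        return 1
    fi
    echo "change" > "$fcu_file"
}
```

With `udevadm monitor` running in a second shell, `force_change_uevent loop0` should make the event visible that losetup did not produce during the initqueue stage.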
Given that kernel 5.7 works fine and 5.8 does not, I checked the kernel release notes for 5.8 and saw that some changes were applied to loop devices in this commit; however, I don't know the kernel well enough to tell whether it is actually related.
The reported issue can easily be worked around by retriggering device events after the loop device setup in /usr/lib/dracut/modules.d/90dmsquash-live/iso-scan.sh. I just did a quick test and it worked smoothly.
This hacky workaround can be applied at image description level by including the following in the config.sh script:
#======================================
# Patch isoscan script
# See OSInside/kiwi#1586 at github.com
#--------------------------------------
isoscan_script="/usr/lib/dracut/modules.d/90dmsquash-live/iso-scan.sh"
if [ -f "${isoscan_script}" ]; then
    # Append a udev retrigger right after every "losetup -f" line
    new_line="udevadm trigger --type=devices --action=add"
    sed -i "/losetup -f/a ${new_line}" "${isoscan_script}"
fi
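If config.sh could run more than once against the same root tree, the sed call above would append the trigger line each time. A guarded variant might look like the following. It is a sketch, not the workaround as posted: `patch_isoscan` is a hypothetical function name, and the one-line `a` form of sed assumes GNU sed.

```shell
# patch_isoscan FILE
# Append the udev retrigger after each "losetup -f" line in FILE,
# unless the trigger line is already present.
patch_isoscan() {
    pi_file=$1
    pi_line="udevadm trigger --type=devices --action=add"
    # Nothing to do when the script is not installed
    [ -f "$pi_file" ] || return 0
    # Skip files that were patched before
    if grep -qF -e "$pi_line" "$pi_file"; then
        return 0
    fi
    sed -i "/losetup -f/a ${pi_line}" "$pi_file"
}
```

In config.sh this would be called as `patch_isoscan /usr/lib/dracut/modules.d/90dmsquash-live/iso-scan.sh`.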
Fantastic observation and feedback :+1: With this information I think a bug assigned directly to the kernel team can be filed. A workaround from our side is provided as well, but it will only stay here for people who immediately need a "solution". Let's keep this open for reference. I will remove the P2 label from our side as there is nothing more we can do.
Thanks much David
Thank you for looking into this. However, in my preliminary tests the workaround still does not fully resolve the problem. I get:
dracut: FATAL: Failed to mount live ISO device
dracut: Refusing to continue
dracut-initqueue[557]: mount: /run/overlay/squashfs_container: special device /LiveOS/squashfs.img does not exist.
dracut: FATAL: Failed to mount live ISO squashfs container
dracut: Refusing to continue
dracut-initqueue[560]: Failed to execute operation: Connection reset by peer
dracut-initqueue[565]: mount: /run/overlay/rootfsbase: special device /LiveOS/rootfs.img does not exist.
dracut: FATAL: Failed to mount live ISO root filesystem
dracut: Refusing to continue
dracut-initqueue[568]: Failed to connect to bus: No such file or directory
dracut-initqueue[568]: It is possible to perform action directly, see discussions of --force --force in man:systemctl(1)
reboot: System halted
I am not able to get a shell to look further, even with rd.debug and rd.kiwi.debug.
Will continue to test.
@stephanebill I can't tell right now why this is not working. To get an emergency shell at boot I usually use rd.debug and rd.shell. However, in some rare cases I also could not manage to fall back into an emergency shell.
Anyway, the good news is that this issue seems to be solved starting from kernel v5.10-rc4. Kudos to @vogtinator, who pointed me to the right kernel update and the uevents-related commit here. Also, according to the kernel maintainers in the bugzilla ticket, this has been pushed to the current stable kernel as it is a simple patch.
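For anyone deciding whether the config.sh workaround is still needed, the versions reported in this thread (5.7 works, 5.8.14 and 5.9.1 fail, 5.10-rc4 works) suggest a rough check on the kernel series. The helper below is a hypothetical sketch named `kernel_maybe_affected`. It looks only at the major.minor series and cannot know whether a particular 5.8.x or 5.9.x stable build already carries the backported fix.

```shell
# kernel_maybe_affected RELEASE
# Return 0 when RELEASE (e.g. "5.8.14-1-default") belongs to a kernel
# series reported as failing in this thread, 1 otherwise.
# Assumption: only the 5.8 and 5.9 series are treated as suspect.
kernel_maybe_affected() {
    kma_major=${1%%.*}                 # text before the first dot
    kma_rest=${1#*.}                   # text after the first dot
    kma_minor=${kma_rest%%[!0-9]*}     # leading digits of the remainder
    case "${kma_major}.${kma_minor}" in
        5.8|5.9) return 0 ;;
        *)       return 1 ;;
    esac
}
```

On a booted system, `kernel_maybe_affected "$(uname -r)" && echo "loop uevent workaround may be needed"` gives a quick hint; at image build time the version of the kernel installed in the image would have to be passed instead.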
I successfully booted with kernel 5.10-rc4. Thanks again for all the work @davidcassany
Thanks much for the feedback. Happy it works for you again
great job @davidcassany
Problem description
I boot a live ISO stored on the hard drive. Since the update to kernel 5.8, the ISO no longer boots because the root device is not found. With kernel 5.7.11-1.2 the image works.
Steps to reproduce the behaviour
With an up-to-date tumbleweed build machine:
git clone https://github.com/OSInside/kiwi-descriptions
kiwi --type iso system build --description kiwi-descriptions/suse/x86_64/suse-tumbleweed --target-dir ./iso
The grub menu is:
menuentry "test tumbleweed" --unrestricted {
    set isofile="/iso/tumbleweed.x86_64-1.99.1.iso"
    loopback loop (hd0,3)$isofile
    linux (loop)/boot/x86_64/loader/linux iso-scan/filename=$isofile isofrom_device=/dev/disk/by-uuid/a0b77825-d1f7-4a6a-ac82-b2e494339ba8 root=live:CDLABEL=CDROM rd.live.image net.ifnames=0 loglevel=3
    initrd (loop)/boot/x86_64/loader/initrd
}
The image does not boot and /dev/disk/by-label does not exist; see the attached rdsosreport.txt
OS and Software information
Latest tumbleweed with kernel 5.8.14 & kiwi-tools-9.21.14-1.1.x86_64