jeremycline closed this issue 5 years ago.
@bgoncalv, can you take a look at this? I can step in if needed.
@johnbieren I might need your help. I tried to reproduce this the same way the pipeline does, but it worked for me.
I ran these steps in a privileged fedora:latest container after installing the required packages:
mkdir /tmp/30423008
cd /tmp/30423008
koji download-task --arch=x86_64 --arch=noarch 30423008
createrepo .
cd
curl -LO http://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20181021.n.0/compose/Cloud/x86_64/images/Fedora-Cloud-Base-Rawhide-20181021.n.0.x86_64.qcow2
LIBGUESTFS_BACKEND=direct virt-copy-in -a Fedora-Cloud-Base-Rawhide-20181021.n.0.x86_64.qcow2 /tmp/30423008 /tmp
LIBGUESTFS_BACKEND=direct virt-customize -v --selinux-relabel --memsize 4096 -a Fedora-Cloud-Base-Rawhide-20181021.n.0.x86_64.qcow2 --run-command "yum install -y --best --allowerasing --nogpgcheck --enablerepo=30423008 --repofrompath=30423008,/tmp/30423008 kernel kernel-core kernel-debug kernel-debug-core kernel-debug-devel kernel-debug-modules kernel-debug-modules-extra kernel-debuginfo-common-x86_64 kernel-devel kernel-modules kernel-modules-extra"
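The steps above could be wrapped in a small function so the same reproduction can be rerun for any Koji task and compose image. This is a sketch, not the pipeline's actual code: the function name reproduce_kernel_install is hypothetical, and it assumes koji, createrepo, curl and the libguestfs tools are installed and that the local repo should live under /tmp/TASK_ID, as in the commands above.

```shell
# Hypothetical helper: reproduce the pipeline's package install for one Koji task.
# Usage: reproduce_kernel_install TASK_ID IMAGE_URL PACKAGE...
# The image is downloaded into the current directory; the repo goes to /tmp/TASK_ID.
reproduce_kernel_install() {
    local task_id="$1" image_url="$2"
    shift 2
    local repo_dir="/tmp/${task_id}"
    local image
    image="$(basename "$image_url")"

    # Download the task's x86_64 and noarch RPMs and turn the directory into a repo.
    mkdir -p "$repo_dir" || return 1
    (cd "$repo_dir" && koji download-task --arch=x86_64 --arch=noarch "$task_id" && createrepo .) || return 1

    # Fetch the compose image.
    curl -LO "$image_url" || return 1

    # Copy the repo into the guest, then install the requested packages from it.
    LIBGUESTFS_BACKEND=direct virt-copy-in -a "$image" "$repo_dir" /tmp || return 1
    LIBGUESTFS_BACKEND=direct virt-customize -v --selinux-relabel --memsize 4096 -a "$image" \
        --run-command "yum install -y --best --allowerasing --nogpgcheck --enablerepo=${task_id} --repofrompath=${task_id},${repo_dir} $*"
}
```

For the case above it would be called as reproduce_kernel_install 30423008 followed by the image URL and the full package list from the yum command.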
It all worked, and the output shows that dracut ran:
yum install -y --best --allowerasing --nogpgcheck --enablerepo=30423008 --repofrompath=30423008,/tmp/30423008 kernel kernel-core kernel-debug kernel-debug-core kernel-debug-devel kernel-debug-modules kernel-debug-modules-extra kernel-debuginfo-common-x86_64 kernel-devel kernel-modules kernel-modules-extra
"
[ 242.304267] dracut[31241] No '/dev/log' or 'logger' included for syslog logging
[ 242.396189] dracut[31241] Executing: /usr/bin/dracut -f /boot/initramfs-4.19.0-1.fc30.x86_64.img 4.19.0-1.fc30.x86_64
[ 242.525169] dracut[31241] dracut module 'modsign' will not be installed, because command 'keyctl' could not be found!
[ 242.560393] dracut[31241] dracut module 'busybox' will not be installed, because command 'busybox' could not be found!
[ 242.618348] dracut[31241] dracut module 'lvmmerge' will not be installed, because command 'lvm' could not be found!
...
When I boot this image, it correctly boots into 4.19.0-1.fc30.x86_64.
I retried the failed build in the pipeline and it failed again with the reported issue. As the console log shows, dracut does not run in the pipeline: https://jenkins-continuous-infra.apps.ci.centos.org/job/fedora-rawhide-build-pipeline/1087/artifact/cloud-image-compose/logs/console.log
I just realized the pipeline uses a centos:7 container, so I ran the same steps in one, and again everything worked: the server booted the correct kernel.
I'll hopefully check it out tomorrow.
@bgoncalv So, I followed your steps inside the exact OpenShift container that runs this. I didn't catch whether the dracut lines were there because the output scrolled by so quickly, but when I booted the VM:
[root@localhost ~]# rpm -qa kernel
kernel-4.19.0-1.fc30.x86_64
[root@localhost ~]# uname -r
4.19.0-0.rc8.git4.1.fc30.x86_64
[root@localhost ~]#
I don't know much about dracut, but I assume this means it did not run? Do you have any ideas how we can fix this? Since the kernel is a special package, maybe we need an additional virt-customize command after installing it, to reboot or update grub or something similar? I don't think it would be too bad to add an if statement just for the kernel, given that it is the kernel. What do you think? Unfortunately, I don't know enough about how this works to suggest the best remedy, but it definitely reproduces in the OpenShift container.
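The result above (the newest installed kernel package is not the running kernel) is what a post-boot check needs to catch. A minimal sketch, assuming the check compares uname -r with the newest installed kernel package; the function names are hypothetical and this is not the CI test's actual code. Note that sort -V only approximates rpm's own version comparison: it orders these two release strings correctly, but may not for every release string.

```shell
# Hypothetical post-boot check: is the newest installed kernel the one running?

# Print the newest installed kernel as VERSION-RELEASE.ARCH (the uname -r format on Fedora).
newest_installed_kernel() {
    rpm -q --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel | sort -V | tail -n 1
}

# Usage: check_booted_kernel [EXPECTED [RUNNING]]
# Both values default to what the booted system reports; passing them is handy for testing.
check_booted_kernel() {
    local expected="${1:-$(newest_installed_kernel)}"
    local running="${2:-$(uname -r)}"
    if [ "$expected" = "$running" ]; then
        echo "OK: running ${running}"
        return 0
    fi
    echo "MISMATCH: installed ${expected}, running ${running}" >&2
    return 1
}
```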
I don't know much about dracut, but I assume this means it did not run?
Correct. dracut is run as part of the kernel-install add script, which is called from the post-transaction scriptlet in the kernel spec file.
I wonder whether this problem is actually specific to the kernel, or whether no post-transaction scriptlets are being run at all. That would seem a little weird, though.
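To see what the kernel's post-transaction scriptlet actually runs, and whether it took effect, helpers like the following could be run inside the guest (for example through virt-customize --run-command). This is a sketch with hypothetical function names: rpm -q --scripts and kernel-install add VERSION IMAGE are standard interfaces, but the /lib/modules/VERSION/vmlinuz image path is an assumption based on current Fedora packaging and should be checked against the kernel spec file.

```shell
# Hypothetical helpers for checking whether kernel-install/dracut ran for a kernel.

# Print only the posttrans section of an installed package's scriptlets.
# rpm -q --scripts labels each section with a header such as
# "posttrans scriptlet (using /bin/sh):"; every header resets the "show" flag.
show_posttrans() {
    rpm -q --scripts "$1" | awk '/^[a-z]+ scriptlet/ { show = /^posttrans scriptlet/ } show'
}

# Succeed if a non-empty initramfs exists for kernel version $1 (boot dir defaults to /boot).
initramfs_present() {
    local kver="$1" boot_dir="${2:-/boot}"
    [ -s "${boot_dir}/initramfs-${kver}.img" ]
}

# Manually do what the posttrans scriptlet is expected to do.
# Assumes the kernel image is shipped as /lib/modules/VERSION/vmlinuz.
run_kernel_install() {
    local kver="$1"
    kernel-install add "$kver" "/lib/modules/${kver}/vmlinuz"
}
```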
I was figuring kernel was the exception since you have to boot a kernel and don't just install a new version and run it immediately (again, I am far from an expert on it).
@bgoncalv Could we do something like: if $package == kernel, run the kernel-install add script after the install? Does that make sense functionally?
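For illustration, the proposed workaround could look roughly like this: when a kernel package is in the install list, emit an extra virt-customize --run-command that calls kernel-install add for the newest installed kernel-core. This is a hypothetical sketch (the function name and the /lib/modules/VERSION/vmlinuz path are assumptions), and it papers over the missing scriptlet rather than explaining why it did not run.

```shell
# Hypothetical workaround: print extra virt-customize arguments (one per line)
# that run kernel-install by hand when the package list contains a kernel package.
# Usage: kernel_workaround_args PACKAGE...   (returns 1 and prints nothing otherwise)
kernel_workaround_args() {
    local pkg
    for pkg in "$@"; do
        case "$pkg" in
            kernel|kernel-core)
                # Single quotes: $kver and the rpm query are expanded inside the guest.
                printf '%s\n' '--run-command' \
                    'kver=$(rpm -q --qf "%{VERSION}-%{RELEASE}.%{ARCH}\n" kernel-core | sort -V | tail -n 1); kernel-install add "$kver" "/lib/modules/$kver/vmlinuz"'
                return 0
                ;;
        esac
    done
    return 1
}
```

In bash, the output could be read into an array with mapfile -t and appended to the existing virt-customize call. Hard-coding package names this way is exactly the fragility raised in the reply that follows.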
I was figuring kernel was the exception since you have to boot a kernel and don't just install a new version and run it immediately (again, I am far from an expert on it).
It works as it should for me locally (not in a container), and also (apparently) in the vanilla centos:7 container. That seems to indicate there's something weird going on due to either OpenShift being involved or something added to the environment here.
I think an if $package == kernel check is going to be fragile. What happens when something else is added to the post-transaction scriptlet? Beyond that, I'm concerned that it would just cover up the problem. Why aren't those rpm scripts being run?
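One way to tell whether this is kernel-specific or affects every post-transaction scriptlet is to list which of the downloaded RPMs actually carry a posttrans scriptlet, then check in the guest whether each one's effect is present. A sketch, assuming the RPMs sit in one directory such as /tmp/30423008 above; rpm -qp --scripts queries a package file without installing it. The function name is hypothetical.

```shell
# Hypothetical helper: list RPM files in a directory that ship a posttrans scriptlet.
# Usage: list_posttrans_rpms DIR
list_posttrans_rpms() {
    local dir="$1" rpm_file
    for rpm_file in "$dir"/*.rpm; do
        [ -e "$rpm_file" ] || continue   # no RPMs: the glob stayed unexpanded
        # rpm prints "posttrans scriptlet (using ...):" for script bodies and
        # "posttrans program: ..." when only an interpreter is given.
        if rpm -qp --scripts "$rpm_file" 2>/dev/null | grep -Eq '^posttrans (scriptlet|program)'; then
            basename "$rpm_file"
        fi
    done
}
```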
As an aside, it'd be nice if there were a way for someone who hits a problem in a pipeline step to easily get a reproducer script that creates an identical environment.
I agree, this seems to be some problem with OpenShift. A $package == kernel check could be used for now as a workaround to get the kernel tests running, but I'd like to understand why this doesn't work as it should on OpenShift.
@jeremycline @johnbieren I'm not sure what has changed, but it seems the kernel now gets installed properly:
The kernel CI test fails early on, when it checks the kernel version. I then downloaded the qcow2 image, booted it, and found that it is not booting the newly installed kernel. It looks like the kernel's post-transaction script (which runs kernel-install add <kernel version>, generating the initramfs and grub entry) isn't being run. I used virt-customize locally on the F28 cloud image to install the same kernel and it was installed properly, so it's not immediately obvious to me what's preventing this in the CI environment.