Open ianb-mp opened 9 months ago
Can you reproduce this with official FCOS images? If so can you provide the full filename of the qcow2 you are using where you are seeing this problem?
Oh, I see you are using PXE. In that case, maybe the filenames and sha256sums of the kernel, initrd and rootfs files.
> Can you reproduce this with official FCOS images? If so can you provide the full filename of the qcow2 you are using where you are seeing this problem?
AFAIK the images I'm using are official FCOS images, downloaded from here. The qcow2 image is on the VM host, located at: /data/var/libvirt/images/ianb-okd-bs.qcow2
sha256sums:
e602806b53e09a898079eaf5e216cc610dde19e8a3f8fb6598aae02286203cd7 fedora-coreos-38.20231002.3.1-live-initramfs.x86_64.img
af72f61a3571dca6515182320b37afedf881d5c51e1c2b07be39f227849d8bd8 fedora-coreos-38.20231002.3.1-live-kernel-x86_64
b80c63e56bdf19a42d0a6aa8062577ac5dd6cad7cff7e47aa2cc5d1d589589ee fedora-coreos-38.20231002.3.1-live-rootfs.x86_64.img
I've also tried using the latest v39 images, but saw the same issue:
e50a45a3da2face2a808cfccc3318f7d0a9ad8d04c3845f798dc19646bcf4138 fedora-coreos-39.20231119.3.0-live-initramfs.x86_64.img
3efd416a625571345cbb330b434272b99813a1146cc9b4eaf7bfc8e29317c986 fedora-coreos-39.20231119.3.0-live-kernel-x86_64
a7bbd175100d567cd7e27130834990a837b49bf3181d41106b9dd967d1c1058d fedora-coreos-39.20231119.3.0-live-rootfs.x86_64.img
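For anyone wanting to check local copies of these PXE artifacts against the checksums listed above, here is a minimal sketch. It assumes the files sit in one directory under the same names; the `verify` helper and the directory argument are illustrative, not part of any FCOS tooling.

```python
import hashlib
from pathlib import Path

# Checksums for the v38 artifacts, copied from the list above.
EXPECTED = {
    "fedora-coreos-38.20231002.3.1-live-initramfs.x86_64.img":
        "e602806b53e09a898079eaf5e216cc610dde19e8a3f8fb6598aae02286203cd7",
    "fedora-coreos-38.20231002.3.1-live-kernel-x86_64":
        "af72f61a3571dca6515182320b37afedf881d5c51e1c2b07be39f227849d8bd8",
    "fedora-coreos-38.20231002.3.1-live-rootfs.x86_64.img":
        "b80c63e56bdf19a42d0a6aa8062577ac5dd6cad7cff7e47aa2cc5d1d589589ee",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(expected, directory="."):
    """Map each filename to True if present and matching, else False."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and sha256_of(path) == want.lower()
    return results


if __name__ == "__main__":
    # Hypothetical location: the current directory.
    for name, ok in verify(EXPECTED).items():
        print(("OK      " if ok else "MISMATCH/MISSING ") + name)
```

A file that is missing is reported the same way as one whose digest differs, so the output only says whether each artifact can be trusted as-is.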
I've been looking at the differences between the `fedora-coreos-stable` and `rhl9` os variants, and noticed that `rhl9` uses the `e1000` driver by default. I've tested using the `fedora-coreos-stable` variant but setting `e1000` for the NIC, and it works, i.e.
--network bridge=br0,model=e1000
Furthermore, after using `e1000` for the installation, I can shut down the VM, swap back to `virtio`, and CoreOS boots with the network working. So the question is: why can't CoreOS make use of a `virtio` model NIC during the installation phase?
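To confirm which NIC model each os variant actually ends up with, the domain XML (for example the output of `virsh dumpxml`) can be inspected. The following is a rough sketch only; the sample XML is made up for illustration and is not taken from the attached xmldumps.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down libvirt domain XML for illustration only.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>example-vm</name>
  <devices>
    <interface type='bridge'>
      <mac address='52:54:00:00:00:01'/>
      <source bridge='br0'/>
      <model type='virtio'/>
    </interface>
  </devices>
</domain>
"""


def nic_models(domain_xml):
    """Return a list of (mac_address, model_type) for each <interface>."""
    root = ET.fromstring(domain_xml.strip())
    result = []
    for iface in root.findall("./devices/interface"):
        mac = iface.find("mac")
        model = iface.find("model")
        result.append((
            mac.get("address") if mac is not None else None,
            model.get("type") if model is not None else None,
        ))
    return result


if __name__ == "__main__":
    for mac, model in nic_models(SAMPLE_DOMAIN_XML):
        print(f"{mac}: {model}")
```

Running this over both xmldumps would show whether the NIC model is the only network-related difference between the two variants, or whether other parts of the XML deserve a closer look.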
As I said earlier, the `virtio` NIC does work when I manually configure it from the emergency shell CLI, e.g.
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp1s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 52:54:00:88:df:2f brd ff:ff:ff:ff:ff:ff
$ ip link set enp1s0 up
$ ip addr add 10.8.55.101/24 dev enp1s0
$ ip route add default via 10.8.55.1
$ curl -I http://10.2.3.2
HTTP/1.1 200 OK
It's possible that there is some device initialization that is taking longer for one versus the other. Getting more logs of early boot would help. I would add `rd.debug` to the kernel command line, which will put NetworkManager into debug mode in the initramfs, and then compare the output between the `e1000` and the `virtio` cases to see if there's anything to glean. Unfortunately `rd.debug` will produce a ton of output, so getting down to the NetworkManager logs might be tricky. Good luck!
Here are the two boot logs with `rd.debug` enabled:
journalctl_e1000.txt - successful boot
rdsosreport_virtio.txt - failed boot
I've cropped the timestamp prefix from each file to make them easier to compare with diff. I had a look but couldn't see anything conclusive. Hopefully someone with more experience in these things will be able to spot the problem!
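The comparison described here (drop the timestamp prefix, then diff) can also be narrowed to NetworkManager lines only. This is a minimal sketch; the two timestamp formats it strips are assumptions about how the journal lines look, and the helper names are made up for illustration.

```python
import difflib
import re

# Assumed prefixes: "[    1.234567] " (monotonic) or "Nov 30 01:02:03 " (syslog style).
_TIMESTAMP = re.compile(
    r"^(?:\[\s*\d+\.\d+\]\s*|[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s+)"
)


def strip_timestamp(line):
    """Remove a leading timestamp (if any) and the trailing newline."""
    return _TIMESTAMP.sub("", line.rstrip("\n"), count=1)


def diff_logs(good_lines, bad_lines, keyword=None):
    """Unified diff of two logs after stripping timestamps.

    If keyword is given, only lines containing it are compared.
    """
    def prepare(lines):
        cleaned = [strip_timestamp(line) for line in lines]
        if keyword is not None:
            cleaned = [line for line in cleaned if keyword in line]
        return cleaned

    return list(difflib.unified_diff(
        prepare(good_lines), prepare(bad_lines),
        fromfile="e1000", tofile="virtio", lineterm="",
    ))


if __name__ == "__main__":
    # File names as attached in this thread; adjust paths as needed.
    with open("journalctl_e1000.txt") as good, open("rdsosreport_virtio.txt") as bad:
        for line in diff_logs(good.readlines(), bad.readlines(), "NetworkManager"):
            print(line)
```

Filtering on a keyword loses ordering context relative to kernel and udev messages, so it is best treated as a first pass before reading the full logs around any line that differs.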
Describe the bug
CoreOS fails to boot under some conditions, depending on libvirt configuration. When it fails to boot, it drops to an emergency shell. I see this error in `rdsosreport.txt` (full report here: rdsosreport.txt).
The network interface is down. I am able to configure the network interface manually from the emergency shell CLI and send/receive traffic. I can mount the disk and read/write to it. So the VM hardware seems fine.
Reproduction steps
Launch a new VM with these parameters:
Expected behavior
VM should boot normally without error.
Actual behavior
VM fails to boot and drops to emergency shell as described earlier.
I am able to boot successfully if I replace `--os-variant fedora-coreos-stable` with `--os-variant rhl9`. I've attached xmldumps for both versions below in 'additional information'.
I've tested on two separate VM hosts with different OS, hardware etc. and the same issue occurred on both.
System details
Bare metal host, kvm/qemu guest VM
(VM is booted via PXE boot)
VM Guest kernel cmdline:
VM Host environment:
OS: Rocky Linux 9.3
Butane or Ignition config
No response
Additional information
xmldump - fedora-coreos-stable
xmldump - rhl9