coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

coreos.boot-mirror.luks kola test failing for ppc64le #1659

Open ravanelli opened 9 months ago

ravanelli commented 9 months ago

The test hits a kernel panic when the machine reboots:

21:48:26  === RUN   coreos.boot-mirror.luks
21:48:52  qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM
21:48:52  Falling back to kernel-irqchip=off

22:09:30  === RUN   coreos.boot-mirror.luks/sanity-check
22:09:30  qemu-system-ppc64: OS terminated: 
22:19:37  --- FAIL: coreos.boot-mirror.luks (1024.03s)
22:19:37  --- PASS: coreos.boot-mirror.luks/sanity-check (1.89s)
22:19:37  luks.go:77: Failed to reboot the machine: machine "33fec3b6-b69c-42fa-bfe7-6999858291d8" failed to start: ssh journalctl failed: time limit exceeded
22:19:37   harness.go:1736: Found kernel panic (Attempted to kill init! exitcode=0x0000000b) on machine 33fec3b6-b69c-42fa-bfe7-6999858291d8 console
22:19:37   harness.go:1736: Found kernel oops on machine 33fec3b6-b69c-42fa-bfe7-6999858291d8 console
22:19:43  qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM
22:19:43  Falling back to kernel-irqchip=off
22:26:05  qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM
22:26:05  Falling back to kernel-irqchip=off

-----
Error
Test failed
Stacktrace
--- PASS: coreos.boot-mirror.luks/sanity-check (1.89s)
        luks.go:77: Failed to reboot the machine: machine "33fec3b6-b69c-42fa-bfe7-6999858291d8" failed to start: ssh journalctl failed: time limit exceeded
        harness.go:1736: Found kernel panic (Attempted to kill init! exitcode=0x0000000b) on machine 33fec3b6-b69c-42fa-bfe7-6999858291d8 console
        harness.go:1736: Found kernel oops on machine 33fec3b6-b69c-42fa-bfe7-6999858291d8 console
gnition: wrote ssh authorized keys file for user: core
qemu0 login: [  392.788309] watchdog: watchdog0: watchdog did not stop!
[  393.178396] BUG: Unable to handle kernel data access at 0x5deadbeef0000100
[  393.178538] Faulting instruction address: 0xc000000000f587d4
[  393.178642] Oops: Kernel access of bad area, sig: 11 [#1]
[  393.178741] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[  393.178847] Modules linked in: rfkill crct10dif_vpmsum sunrpc binfmt_misc dm_crypt raid1 xfs zram virtio_net net_failover vmx_crypto crc32c_vpmsum pseries_wdt failover virtio_blk dm_multipath be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls cxgb3i cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse virtio_console
[  393.179433] CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.8.0-0.rc0.20240112git70d201a40823.5.fc40.ppc64le #1
[  393.179593] Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1203 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries
[  393.179751] NIP:  c000000000f587d4 LR: c000000000f587c8 CTR: c0000000001ccf68
[  393.179872] REGS: c0000000086eb6d0 TRAP: 0380   Not tainted  (6.8.0-0.rc0.20240112git70d201a40823.5.fc40.ppc64le)
[  393.180031] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 44002204  XER: 00000092
[  393.180188] CFAR: c000000000f5871c IRQMASK: 0 
[  393.180188] GPR00: c000000000f587c8 c0000000086eb970 c000000001f9be00 c000000002ca9250 
[  393.180188] GPR04: c000000002ca9250 0000000000000001 0000000000000001 fffffffffffe0000 
[  393.180188] GPR08: 0000000000000001 0000000000000001 5deadbeef0000100 0000000000002000 
[  393.180188] GPR12: 0000000000000000 c000000002cd0000 0000000000000000 0000000000000000 
[  393.180188] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[  393.180188] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[  393.180188] GPR24: 0000000000000002 0000000000000000 c000000002ca9078 c0000000029a1260 
[  393.180188] GPR28: c000000002ca9250 5deadbeeeffffce8 c0000001b4b30238 c0000001b4b3d000 
[  393.181229] NIP [c000000000f587d4] md_notify_reboot+0x154/0x250
[  393.181339] LR [c000000000f587c8] md_notify_reboot+0x148/0x250
[  393.181448] Call Trace:
[  393.181492] [c0000000086eb970] [c000000000f587a0] md_notify_reboot+0x120/0x250 (unreliable)
[  393.181619] [c0000000086eb9d0] [c00000000019eec0] notifier_call_chain+0xb8/0x19c
[  393.181747] [c0000000086eba30] [c00000000019f188] blocking_notifier_call_chain+0x64/0x94
[  393.181874] [c0000000086eba70] [c0000000001a2cd8] kernel_restart+0x38/0x10c
[  393.181980] [c0000000086ebae0] [c0000000001a3150] __do_sys_reboot+0x12c/0x2dc
[  393.182105] [c0000000086ebc40] [c00000000002f1b4] system_call_exception+0x174/0x320
[  393.182231] [c0000000086ebe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
[  393.182358] --- interrupt: 3000 at 0x7fff8bb5d2f8
[  393.182442] NIP:  00007fff8bb5d2f8 LR: 0000000000000000 CTR: 0000000000000000
[  393.182563] REGS: c0000000086ebe80 TRAP: 3000   Not tainted  (6.8.0-0.rc0.20240112git70d201a40823.5.fc40.ppc64le)
[  393.182721] MSR:  800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 48002403  XER: 00000000
[  393.182878] IRQMASK: 0 
[  393.182878] GPR00: 0000000000000058 00007ffff31df800 00007fff8bc76d00 fffffffffee1dead 
[  393.182878] GPR04: 0000000028121969 0000000001234567 0000000000003a5d 00007fff8c1f6038 
[  393.182878] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[  393.182878] GPR12: 0000000000000000 00007fff8c4f3980 0000000000000001 0000000000000000 
[  393.182878] GPR16: 00007ffff31df910 0000000000000000 0000000135fc14c8 0000000000000000 
[  393.182878] GPR20: 0000000135fc1d98 0000000135fc15b8 00007ffff31df8f8 0000000000000001 
[  393.182878] GPR24: 0000000000000003 00007ffff31df930 0000000000000000 0000000000000000 
[  393.182878] GPR28: 00007ffff31dff98 00007ffff31df918 00007ffff31df908 0000000000000000 
[  393.183854] NIP [00007fff8bb5d2f8] 0x7fff8bb5d2f8
[  393.183937] LR [0000000000000000] 0x0
[  393.184002] --- interrupt: 3000
[  393.184065] Code: 7f84e378 48430ee1 60000000 2c030000 4182000c 7fe3fb78 4bff0a19 7f83e378 48473d5d 60000000 39000001 395d0418 <e93d0418> 7fbfeb78 7c2ad800 3929fbe8 
[  393.184315] ---[ end trace 0000000000000000 ]---
[  393.200752] pstore: backend (nvram) writing error (-1)
[  393.200861] 
[  394.200907] note: systemd-shutdow[1] exited with irqs disabled
[  394.201108] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
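
A note on reading the oops above: the faulting address 0x5deadbeef0000100 looks like the kernel's list poison value, which would suggest md_notify_reboot followed a pointer from a list entry that had already been deleted. The sketch below is only an illustration of that arithmetic, not a confirmed diagnosis. It assumes the ppc64 default for CONFIG_ILLEGAL_POINTER_VALUE is 0x5deadbeef0000000 and that LIST_POISON1 and LIST_POISON2 sit at offsets 0x100 and 0x122 from it (as in include/linux/poison.h). `classifyFault` is a hypothetical helper name.

```go
package main

import "fmt"

// Assumed values, taken from my reading of the kernel sources.
const (
	ppc64PoisonDelta = 0x5deadbeef0000000 // CONFIG_ILLEGAL_POINTER_VALUE default on ppc64 (assumption)
	listPoison1Off   = 0x100              // LIST_POISON1 offset (assumption)
	listPoison2Off   = 0x122              // LIST_POISON2 offset (assumption)
)

// classifyFault is a hypothetical helper that names the list-poison
// value a faulting address matches, if any.
func classifyFault(addr uint64) string {
	switch addr {
	case ppc64PoisonDelta + listPoison1Off:
		return "LIST_POISON1"
	case ppc64PoisonDelta + listPoison2Off:
		return "LIST_POISON2"
	default:
		return "unknown"
	}
}

func main() {
	fault := uint64(0x5deadbeef0000100) // "Unable to handle kernel data access at ..."
	r29 := uint64(0x5deadbeeeffffce8)   // GPR29 from the register dump

	fmt.Println(classifyFault(fault))

	// The faulting instruction word e93d0418 decodes to a load from
	// 0x418(r29), so r29+0x418 should equal the faulting address.
	fmt.Printf("%#x\n", r29+0x418)

	// exitcode=0x0000000b is signal 11, matching "sig: 11" in the oops.
	fmt.Println(0x0000000b)
}
```

If those assumptions hold, this points at a list entry being removed while the reboot notifier was still walking the list, which would also fit the intermittent nature of the failure.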

dustymabe commented 9 months ago

I think this is a flake we've seen a few times before, and it isn't specific to OSBuild. A run today got past the kola stage without this failure.

jbtrystram commented 2 months ago

These tests failed today on branched and rawhide for ppc64le, with the same logs as above.
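
For anyone comparing a new console log against the one in the original report, here is a rough sketch of a signature check. It is not how kola detects panics; `signature` and `matchesSignature` are hypothetical names, and the substrings are simply copied from the console output earlier in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// signature holds substrings copied from the console log in this issue.
var signature = []string{
	"md_notify_reboot",
	"Kernel panic - not syncing: Attempted to kill init!",
}

// matchesSignature is a hypothetical helper that reports whether every
// signature substring appears somewhere in the console output.
func matchesSignature(console string) bool {
	for _, s := range signature {
		if !strings.Contains(console, s) {
			return false
		}
	}
	return true
}

func main() {
	// Two lines taken from the console log above.
	sample := "[  393.181229] NIP [c000000000f587d4] md_notify_reboot+0x154/0x250\n" +
		"[  394.201108] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b\n"
	fmt.Println(matchesSignature(sample))
}
```

Requiring both substrings keeps an unrelated init panic from being counted as the same failure.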

dustymabe commented 2 months ago

Also observed in yesterday's bump-lockfile job.