coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

[rawhide][ppc64le] ext.config.kdump.crash failure #1698

Closed jbtrystram closed 2 months ago

jbtrystram commented 3 months ago

Relevant console output:

[    4.583671] systemd[1]: Starting kdump-capture.service - Kdump Vmcore Save Service...
[    4.614645] kdump[462]: Kdump is using the default log level(3).
[    4.668561] kdump[497]: saving to /sysroot/ostree/deploy/fedora-coreos/var/crash/127.0.0.1-2024-03-24-13:37:43/
[    4.679853] kdump[502]: saving vmcore-dmesg.txt to /sysroot/ostree/deploy/fedora-coreos/var/crash/127.0.0.1-2024-03-24-13:37:43/
[    4.694130] kdump[508]: saving vmcore-dmesg.txt complete
[    4.695729] kdump[510]: saving vmcore
[    4.710297] kdump.sh[511]:
Checking for memory holes                         : [  0.0 %] /
Checking for memory holes                         : [100.0 %] |
readpage_elf: Attempt to read non-existent page at 0xc000000000000.
[    4.710664] kdump.sh[511]: readmem: type_addr: 0, addr:c00c000000000000, size:16384
[    4.710795] kdump.sh[511]: __exclude_unnecessary_pages: Can't read the buffer of struct page.
[    4.710933] kdump.sh[511]: create_2nd_bitmap: Can't exclude unnecessary pages.
[    4.713611] kdump.sh[511]: The kernel version is not supported.
[    4.713745] kdump.sh[511]: The makedumpfile operation may be incomplete.
[    4.713863] kdump.sh[511]: makedumpfile Failed.
[    4.715342] kdump[513]: saving vmcore failed, exitcode:1
[    4.716640] kdump[515]: saving vmcore failed
[    4.731037] kdump[520]: saving the /run/initramfs/kexec-dmesg.log to /sysroot/ostree/deploy/fedora-coreos/var/crash/127.0.0.1-2024-03-24-13:37:43///
[    4.733652] systemd[1]: kdump-capture.service: Main process exited, code=exited, status=1/FAILURE
[    4.733887] systemd[1]: kdump-capture.service: Failed with result 'exit-code'.

Test log:

Mar 24 13:36:48 qemu0 kola-runext-test.sh[5465]: ++ kdumpctl estimate
Mar 24 13:36:49 qemu0 kola-runext-test.sh[5468]: kdump: Detected change in File System
Mar 24 13:36:49 qemu0 kola-runext-test.sh[5468]: kdump: Rebuilding /var/lib/kdump/initramfs-6.9.0-0.rc0.20240322git8e938e398669.14.fc41.ppc64lekdump.img
Mar 24 13:36:52 qemu0 kola-runext-test.sh[7563]: grep: /var/tmp/dracut.vQUkwN/initramfs/etc/systemd/system.conf*: No such file or directory
Mar 24 13:37:31 qemu0 kola-runext-test.sh[9101]: tail: error writing 'standard output': Broken pipe
Mar 24 13:37:32 qemu0 kola-runext-test.sh[9119]: tail: error writing 'standard output': Broken pipe
Mar 24 13:37:32 qemu0 kola-runext-test.sh[9124]: tail: error writing 'standard output': Broken pipe
Mar 24 13:37:32 qemu0 kola-runext-test.sh[9129]: tail: error writing 'standard output': Broken pipe
Mar 24 13:37:32 qemu0 kola-runext-test.sh[9134]: tail: error writing 'standard output': Broken pipe
Mar 24 13:37:32 qemu0 kola-runext-test.sh[5322]: + output='Reserved crashkernel:    512M
Mar 24 13:37:32 qemu0 kola-runext-test.sh[5322]: Recommended crashkernel: 512M
Mar 24 13:37:32 qemu0 kola-runext-test.sh[5322]: Kernel image size:   0M
Mar 24 13:37:32 qemu0 kola-runext-test.sh[5322]: Kernel modules size: 13M
Mar 24 13:37:32 qemu0 kola-runext-test.sh[5322]: Initramfs size:      68M
Mar 24 13:37:32 qemu0 kola-runext-test.sh[5322]: Runtime reservation: 64M
Mar 24 13:37:32 qemu0 kola-runext-test.sh[5322]: Large modules:
Mar 24 13:37:32 qemu0 kola-runext-test.sh[5322]:     xfs: 2752512
Mar 24 13:37:32 qemu0 kola-runext-test.sh[5322]:     sunrpc: 1048576'
Mar 24 13:37:32 qemu0 kola-runext-test.sh[5322]: + grep -q 'WARNING: Current crashkernel size is lower than recommended size'
Mar 24 13:37:32 qemu0 kola-runext-test.sh[5322]: + /tmp/autopkgtest-reboot-prepare aftercrash
Mar 24 13:37:32 qemu0 kola-runext-test.sh[5322]: + sleep 5
Mar 24 13:37:37 qemu0 kola-runext-test.sh[5322]: + echo 'Triggering sysrq'
Mar 24 13:37:37 qemu0 kola-runext-test.sh[5322]: Triggering sysrq
Mar 24 13:37:37 qemu0 kola-runext-test.sh[5322]: + sync
-- Boot ee11846cf9a544cb9d1d8eede6a27136 --
Mar 24 13:38:26 qemu0 kola-runext-test.sh[2626]: + . /var/opt/kola/extdata/commonlib.sh
Mar 24 13:38:26 qemu0 kola-runext-test.sh[2626]: ++ IFS=' '
Mar 24 13:38:26 qemu0 kola-runext-test.sh[2626]: ++ read -r -a cmdline
Mar 24 13:38:26 qemu0 kola-runext-test.sh[2626]: + case "${AUTOPKGTEST_REBOOT_MARK:-}" in
Mar 24 13:38:26 qemu0 kola-runext-test.sh[2636]: ++ find /var/crash -type f -name vmcore
Mar 24 13:38:26 qemu0 kola-runext-test.sh[2626]: + kcore=
Mar 24 13:38:26 qemu0 kola-runext-test.sh[2626]: + test -z ''
Mar 24 13:38:26 qemu0 kola-runext-test.sh[2626]: + fatal 'No kcore found in /var/crash'
Mar 24 13:38:26 qemu0 kola-runext-test.sh[2626]: + echo 'No kcore found in /var/crash'
Mar 24 13:38:26 qemu0 kola-runext-test.sh[2626]: No kcore found in /var/crash
Mar 24 13:38:26 qemu0 kola-runext-test.sh[2626]: + exit 1

How hard would it be to save kexec-dmesg.log from the test run?
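A rough sketch of one way to do this: the console output above shows kdump copying /run/initramfs/kexec-dmesg.log into the crash directory, so the post-reboot half of the test could print whatever it finds there before failing. The function name `dump_kexec_logs` is hypothetical and not part of the existing kola test.

```sh
# Hypothetical helper (not in the current test): print every kexec-dmesg.log
# that kdump left under the crash directory, so it ends up in the test log.
dump_kexec_logs() {
    crashdir="${1:-/var/crash}"
    logs=$(find "${crashdir}" -type f -name 'kexec-dmesg.log' 2>/dev/null || true)
    if [ -z "${logs}" ]; then
        echo "no kexec-dmesg.log found under ${crashdir}" >&2
        return 1
    fi
    # Assumption: kdump directory names (IP plus timestamp) contain no
    # whitespace, so reading one path per line is good enough here.
    echo "${logs}" | while IFS= read -r log; do
        echo "==> ${log} <=="
        cat "${log}"
    done
}
```

Calling `dump_kexec_logs /var/crash` just before the `fatal 'No kcore found in /var/crash'` step would put the capture kernel's log next to the failure message.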

jbtrystram commented 3 months ago

Looks like this is a known upstream issue (https://bugzilla.redhat.com/show_bug.cgi?id=2269991), so I'll denylist this test until it's fixed.

jbtrystram commented 3 months ago

Denylist PR: https://github.com/coreos/fedora-coreos-config/pull/2922

dustymabe commented 3 months ago

Supposedly fixed by crash-8.0.4-5.fc41. @jbtrystram can you confirm, then remove the denylist entry and close this out?

jbtrystram commented 3 months ago

@dustymabe Testing this on a ppc64le machine today still fails. The crash package is not included in FCOS by default, and the console log shows an issue with makedumpfile. If I understand correctly, crash is a tool you would use to analyse the dump files.

I did some digging and I think the issue is that this makedumpfile patch has not been backported into kexec-tools (which packages makedumpfile):

makedumpfile -v
makedumpfile: version 1.7.4 (released on 6 Nov 2023)
lzo     enabled
snappy  enabled
zstd    enabled

See: https://bugzilla.redhat.com/show_bug.cgi?id=2269991#c7

@coiby could you backport the makedumpfile fix to kexec-tools?

Another fix would be to pin the kernel to a version earlier than v6.9-rc2.

coiby commented 2 months ago

Hi @jbtrystram, kexec-tools-2.0.28-7.fc41 now includes the makedumpfile fix, thanks for the reminder!
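For anyone wanting to check whether a given build already carries that kexec-tools release, a minimal sketch follows. The function names are made up for illustration, and GNU `sort -V` only approximates RPM's own version comparison, so treat the result as a hint rather than a definitive answer.

```sh
# Hypothetical helper: succeeds if version $1 is greater than or equal to
# version $2. Uses GNU sort's version ordering, which is an approximation
# of how RPM compares version-release strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Hypothetical wrapper: compare the installed kexec-tools against the
# build mentioned above (2.0.28-7.fc41). Requires rpm on the host.
kexec_tools_has_fix() {
    installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' kexec-tools) || return 2
    version_ge "${installed}" "2.0.28-7.fc41"
}
```

Running `kexec_tools_has_fix && echo "fix present"` on the test machine would give a quick indication before re-enabling the test.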

jbtrystram commented 2 months ago

https://github.com/coreos/fedora-coreos-config/pull/2966

dustymabe commented 2 months ago

The snooze for this was dropped in https://github.com/coreos/fedora-coreos-config/commit/7857871f47624e5fe510bc3c37258bb1f667e7e5

All should be good now.