coreos / coreos-assembler

Tooling container to assemble CoreOS-like systems
https://coreos.github.io/coreos-assembler/
Apache License 2.0

tests: add a remote kdump test #3829

Closed jbtrystram closed 2 days ago

jbtrystram commented 1 week ago

This test sets up two machines to check whether kdump successfully exports the vmcore to an SSH destination.

Fixes https://github.com/coreos/fedora-coreos-tracker/issues/1753


This is not fully functional yet, but I think it is ready for review. The test setup works (I can see the logs created on the remote machine), but kola still fails the test for some reason:

[coreos-assembler]$ /mnt/cosa/bin/kola run kdump.crash.ssh --ssh-on-test-failure 
⏭  Skipping kola test pattern "fcos.internet":
  👉 https://github.com/coreos/coreos-assembler/pull/1478
⏭  Skipping kola test pattern "podman.workflow":
  👉 https://github.com/coreos/coreos-assembler/pull/1478
=== RUN   kdump.crash.ssh
/home/core/crash/10.0.2.15-2024-07-04-15:55:25/vmcore-dmesg.txt
/home/core/crash/10.0.2.15-2024-07-04-15:55:25/vmcore.flat
--- FAIL: kdump.crash.ssh (105.02s)
) on machine cac011c1-b98c-499a-9116-ba14fed5e45f consolered crash
FAIL, output in tmp/kola/qemu-2024-07-04-1554-32683
Error: harness: test suite failed
2024-07-04T15:56:09Z cli: harness: test suite failed

Is there a flag to say the machine is expected to reboot?

openshift-ci[bot] commented 1 week ago

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

travier commented 1 week ago

Is there a flag to say the machine is expected to reboot?

There is one to say that the VM is expected to crash: https://github.com/coreos/fedora-coreos-config/blob/testing-devel/tests/kola/kdump/crash/test.sh#L7

tags: skip-base-checks

You can likely find the corresponding option in the code to set that for this test.

jbtrystram commented 3 days ago

tags: skip-base-checks

Yep, that was it. Thanks @travier!

jbtrystram commented 3 days ago

I think this is now ready for review. I added a couple of retry loops to give kdump time to generate the initramfs and write the logs.
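
For illustration only, a retry loop of the kind described above could look like the following Python sketch. It is not the code from this PR (the real test lives in kola's Go sources). The helper names `retry` and `vmcore_exported`, the attempt count and the delay are all hypothetical; the expected file names `vmcore.flat` and `vmcore-dmesg.txt` are taken from the failed run's output earlier in this thread.

```python
import time
from pathlib import Path


def retry(check, attempts=10, delay=1.0, sleep=time.sleep):
    """Call check() until it returns True or the attempts run out.

    Returns True on success, False if every attempt failed.
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    for attempt in range(attempts):
        if check():
            return True
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False


def vmcore_exported(crash_root):
    """True once some dump directory under crash_root holds both files.

    Assumes (hypothetically) that kdump writes one sub-directory per crash,
    as in /home/core/crash/<ip>-<timestamp>/ shown in the logs above.
    """
    root = Path(crash_root)
    if not root.is_dir():
        return False
    wanted = {"vmcore.flat", "vmcore-dmesg.txt"}
    for dump_dir in root.iterdir():
        if dump_dir.is_dir() and wanted <= {p.name for p in dump_dir.iterdir()}:
            return True
    return False
```

A caller would then poll with something like `retry(lambda: vmcore_exported("/home/core/crash"), attempts=30, delay=2.0)` instead of sleeping for a fixed time, so the test finishes as soon as the dump has landed.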

The test now passes:

[coreos-assembler]$ /mnt/bin/kola run kdump.crash.ssh
=== RUN   kdump.crash.ssh
--- PASS: kdump.crash.ssh (68.66s)
PASS, output in tmp/kola/qemu-2024-07-08-2208-10301

jbtrystram commented 3 days ago

The RHCOS test fails on the SSH command that crashes the kernel. I suspect the revision that was tested is not the latest one. In any case, kdump itself works:

[    6.910879] kdump[559]: Kdump is using the default log level(3).
[    7.136098] kdump[605]: saving to core@10.0.2.2:/home/core/crash/10.0.2.15-2024-07-08-23:59:44
[    8.007655] kdump[605]: saving to core@10.0.2.2:/home/core/crash/10.0.2.15-2024-07-08-23:59:44
[    7.317830] kdump[609]: saving vmcore-dmesg.txt to core@10.0.2.2:/home/core/crash/10.0.2.15-2024-07-08-23:59:44
[    8.189387] kdump[609]: saving vmcore-dmesg.txt to core@10.0.2.2:/home/core/crash/10.0.2.15-2024-07-08-23:59:44
[    7.486731] kdump.sh[611]: 159+1 records in
[    7.490292] kdump.sh[611]: 159+1 records out
[    7.494151] kdump.sh[611]: 81809 bytes (82 kB, 80 KiB) copied, 0.000764738 s, 107 MB/s
[    8.358288] kdump.sh[611]: 159+1 records in
[    8.361851] kdump.sh[611]: 159+1 records out
[    8.365709] kdump.sh[611]: 81809 bytes (82 kB, 80 KiB) copied, 0.000764738 s, 107 MB/s
[    7.647105] kdump[614]: saving vmcore-dmesg.txt complete
[    8.518662] kdump[614]: saving vmcore-dmesg.txt complete
[    7.657912] kdump[616]: saving vmcore
[    8.529470] kdump[616]: saving vmcore
[    9.665279] kdump.sh[617]: 
Checking for memory holes                         : [  0.0 %] /                  
Checking for memory holes                         : [100.0 %] |                  
Excluding unnecessary pages                       : [100.0 %] \                  
Checking for memory holes                         : [100.0 %] -                  
Checking for memory holes                         : [100.0 %] /                  
Excluding unnecessary pages                       : [100.0 %] |                  
Copying data                                      : [ 36.7 %] \           eta: 2s
Copying data                                      : [ 96.5 %] -           eta: 0s
Copying data                                      : [100.0 %] /           eta: 0s
Copying data                                      : [100.0 %] |           eta: 0s
[   10.536834] kdump.sh[617]: 
Checking for memory holes                         : [  0.0 %] /                  
Checking for memory holes                         : [100.0 %] |                  
Excluding unnecessary pages                       : [100.0 %] \                  
Checking for memory holes                         : [100.0 %] -                  
Checking for memory holes                         : [100.0 %] /                  
Excluding unnecessary pages                       : [100.0 %] |                  
Copying data                                      : [ 36.7 %] \           eta: 2s
Copying data                                      : [ 96.5 %] -           eta: 0s
Copying data                                      : [100.0 %] /           eta: 0s
Copying data                                      : [100.0 %] |           eta: 0s
[    9.734010] kdump.sh[618]: 114846+1727 records in
[    9.737953] kdump.sh[618]: 114846+1727 records out
[    9.741836] kdump.sh[618]: 59162607 bytes (59 MB, 56 MiB) copied, 1.9093 s, 31.0 MB/s
[   10.605567] kdump.sh[618]: 114846+1727 records in
[   10.609512] kdump.sh[618]: 114846+1727 records out
[   10.613395] kdump.sh[618]: 59162607 bytes (59 MB, 56 MiB) copied, 1.9093 s, 31.0 MB/s
[    9.900371] kdump[621]: saving vmcore complete
[   10.771928] kdump[621]: saving vmcore complete
[    9.913585] kdump[623]: saving the /run/initramfs/kexec-dmesg.log to core@10.0.2.2:/home/core/crash/10.0.2.15-2024-07-08-23:59:44/
[   10.785086] kdump[623]: saving the /run/initramfs/kexec-dmesg.log to core@10.0.2.2:/home/core/crash/10.0.2.15-2024-07-08-23:59:44/
[   10.109477] kdump[630]: Executing final action systemctl reboot -f
[   10.981034] kdump[630]: Executing final action systemctl reboot -f
[   10.126948] systemd[1]: Shutting down.
[   10.998506] systemd[1]: Shutting down.
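
In the pasted console log, each kdump message appears twice with different timestamps, which makes it harder to read. As a rough aid, the Python sketch below drops the repeats and checks for the two completion messages seen above. The function names and the line pattern are assumptions made for this example, not part of kola or kdump.

```python
import re

# Matches lines such as "[    7.647105] kdump[614]: saving vmcore-dmesg.txt complete".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(kdump(?:\.sh)?)\[(\d+)\]:\s*(.*)$")


def kdump_messages(console_log):
    """Return kdump messages in order of first appearance.

    A repeat of the same (pid, message) pair is dropped. This is a
    simplification: a process that genuinely logs the same text twice
    would also be collapsed into one entry.
    """
    seen = set()
    messages = []
    for line in console_log.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        _timestamp, _unit, pid, message = match.groups()
        key = (pid, message)
        if key in seen:
            continue
        seen.add(key)
        messages.append(message)
    return messages


def vmcore_saved(console_log):
    """True if both completion messages from the log above are present."""
    messages = kdump_messages(console_log)
    return (
        "saving vmcore-dmesg.txt complete" in messages
        and "saving vmcore complete" in messages
    )
```

Run against the log above, `vmcore_saved` would be expected to report success, since both "saving vmcore-dmesg.txt complete" and "saving vmcore complete" are present.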