canonical / checkbox

Checkbox
https://checkbox.readthedocs.io
GNU General Public License v3.0

iwlwifi_microcode_crash should only check journal log for boots in which the test session is run #1461

Open kevinyehk opened 1 month ago

kevinyehk commented 1 month ago

Enhancement Proposal

The iwlwifi_microcode_crash test checks all the journal logs on the DUT. When the DUT is noprovision, this can cause a false alarm, because the microcode crash event could have occurred under a previous kernel.
Therefore, I think we should modify the test to check only the boots of the target kernel/snap that is under test.

syncronize-issues-to-jira[bot] commented 1 month ago

Thank you for reporting your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/CHECKBOX-1558.

This message was autogenerated

anthonywong commented 3 weeks ago

Can we stop reporting test failures like https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2077537 until this bug is fixed?

KaiChuan-Hsieh commented 3 weeks ago

@kevinyehk Do you think clearing the old journal logs before running the SRU test plan would avoid this situation?

kevinyehk commented 2 weeks ago

@KaiChuan-Hsieh Yes, clearing the old logs at the beginning of the entire test session should be OK.

boukeas commented 2 weeks ago

I vacuumed the journal on hp-elitebook-650-156-inch-g10-c31123, which is a noprovision device, removing anything older than two days, and reran the SRU test in which only the iwlwifi_microcode_crash test had previously been failing. I can confirm there were no failures.
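
For anyone who wants to reproduce the cleanup, here is a minimal sketch. The exact command used is not stated in this thread, so the helper name `vacuum_old_journal` and its `2d` default are assumptions for illustration. It relies on `journalctl --vacuum-time`, which removes only archived journal files, so the sketch rotates the journal first.

```shell
# Hypothetical helper (not part of Checkbox): remove journal data older
# than the given age. The default of 2d mirrors the "two days" above.
vacuum_old_journal() {
    max_age=${1:-2d}
    # Archive the active journal files so that vacuuming can consider them.
    sudo journalctl --rotate
    # Delete archived journal files whose entries are all older than max_age.
    sudo journalctl --vacuum-time="$max_age"
}
```

Calling `vacuum_old_journal` with no argument uses the two-day default; `vacuum_old_journal 1w` would keep one week of logs instead.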

kevinyehk commented 2 weeks ago

Instead of cleaning all the journal logs, what if we check whether the kernel version on the first line of each boot's log matches the one running now? Maybe that's a better way.

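    # $line is assumed to be one row of `journalctl --list-boots`;
    # its first field is the boot index.
    bootidx=$(echo "$line" | cut -d " " -f1)
    # Kernel release from that boot's "Linux version <release> ..." banner.
    kernel=$(journalctl -k -b "$bootidx" | grep -oP 'Linux version \d+\.\d+\.\d+-\d+-.*' | cut -d " " -f 3)
    if [ "$kernel" = "$(uname -r)" ]; then
        echo "checking microcode error...."
        if journalctl -k -b "$bootidx" | grep -q "Microcode SW error detected"; then
            echo "Boot $line, Microcode SW error detected"
            exit 1
        fi
    else
        # This boot ran a different kernel than the one under test.
        echo "skip...."
    fi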
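
The fragment above relies on a surrounding loop that sets `$line`. As a rough, self-contained sketch of how the whole check might fit together, the version below iterates over `journalctl --list-boots` itself. The function names `kernel_release_from_banner` and `check_microcode_crash` are made up for illustration and are not part of the existing Checkbox test. Unlike the fragment, it records a failure and keeps scanning instead of exiting at the first hit.

```shell
# Read kernel log text on stdin and print the release found in the first
# "Linux version <release> ..." banner (prints nothing if there is none).
kernel_release_from_banner() {
    sed -n 's/.*Linux version \([^ ]*\).*/\1/p' | head -n 1
}

# Hypothetical wrapper: return 1 if any boot of the running kernel logged
# a microcode crash. Boots of other kernels are skipped.
check_microcode_crash() {
    running=$(uname -r)
    status=0
    # Keep only numeric boot indexes (0, -1, -2, ...); this also drops the
    # header row that newer systemd versions print.
    for bootidx in $(journalctl --list-boots | awk '$1 ~ /^-?[0-9]+$/ {print $1}'); do
        kernel=$(journalctl -k -b "$bootidx" | kernel_release_from_banner)
        if [ "$kernel" != "$running" ]; then
            echo "Boot $bootidx: kernel '$kernel' is not under test, skipping"
            continue
        fi
        if journalctl -k -b "$bootidx" | grep -q "Microcode SW error detected"; then
            echo "Boot $bootidx: Microcode SW error detected"
            status=1
        fi
    done
    return "$status"
}
```

Running `check_microcode_crash` at the end of the script would make the test fail only for crashes seen under the running kernel. One consequence of this design is that a boot whose banner line is no longer in the journal yields an empty release and is therefore skipped; whether that is acceptable is a judgment call for the test owners.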