canonical / checkbox

Checkbox is a testing framework used to validate device compatibility with Ubuntu Linux. It’s the testing tool developed for the purposes of the Ubuntu Certification program.
https://checkbox.readthedocs.io
GNU General Public License v3.0

iwlwifi_microcode_crash should only check journal log for boots in which the test session is run #1461

Open kevinyehk opened 2 months ago

kevinyehk commented 2 months ago

Enhancement Proposal

The iwlwifi_microcode_crash test checks all the journal logs on the DUT. When the DUT is noprovision, this can cause false alarms, because the microcode crash event could have occurred under a previous kernel.
Therefore, I think we should modify the test to only check the target kernel/snap that is under test.
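
As a minimal sketch of the narrowest form of this proposal, the check could be limited to the kernel log of the current boot. The function name `crash_in_current_boot` is hypothetical and not part of Checkbox; the sketch also assumes the whole test session runs within a single boot, which does not hold for sessions that include reboots.

```shell
#!/bin/sh
# Hypothetical helper, not Checkbox code.
# Returns 0 if the kernel log of the current boot (-b 0) contains the
# iwlwifi microcode crash message, non-zero otherwise.
crash_in_current_boot() {
    journalctl -k -b 0 | grep -q "Microcode SW error detected"
}
```

A caller could then fail the test only when `crash_in_current_boot` succeeds, ignoring crash messages left over from earlier boots.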

syncronize-issues-to-jira[bot] commented 2 months ago

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/CHECKBOX-1558.

This message was autogenerated

anthonywong commented 2 months ago

Can we stop reporting test failures like https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2077537 until this bug is fixed?

KaiChuan-Hsieh commented 2 months ago

@kevinyehk Do you think clearing the old journal logs before running the SRU test plan would avoid this situation?

kevinyehk commented 1 month ago

@KaiChuan-Hsieh Yes, clearing the old logs at the beginning of the entire test session should be OK.

boukeas commented 1 month ago

I vacuumed the journal on hp-elitebook-650-156-inch-g10-c31123, which is a noprovision device, removing anything older than two days, and reran the SRU test in which only the iwlwifi_microcode_crash test was previously failing. I can confirm there were no failures.
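
The exact commands used for the vacuum are not given above. One way to do it, as a sketch, uses the `--rotate` and `--vacuum-time=` options of `journalctl`. The function name `vacuum_old_journal` and the default of `2d` are assumptions for illustration, and on a real DUT this would need root privileges.

```shell
#!/bin/sh
# Hypothetical helper, not Checkbox code.
# Rotate the journal so that active files become archived, then delete
# archived journal files older than the given age (default: 2 days).
vacuum_old_journal() {
    age="${1:-2d}"
    journalctl --rotate && journalctl --vacuum-time="$age"
}
```

Running `vacuum_old_journal 2d` before the test plan starts would correspond to the two-day cutoff described above.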

kevinyehk commented 1 month ago

Instead of cleaning the whole journal, maybe a better way is to check whether the kernel version on the first line of each boot's log matches the one running now?

    # "$line" is assumed to be one row of `journalctl --list-boots` output
    bootidx=$(echo "$line" | awk '{print $1}')
    # kernel release from the first "Linux version" line of that boot
    kernel=$(journalctl -k -b "$bootidx" | grep -oP 'Linux version \K\S+' | head -n 1)
    if [ "$kernel" = "$(uname -r)" ]; then
            echo "checking microcode error...."
            if journalctl -k -b "$bootidx" | grep -q "Microcode SW error detected"; then
                    echo "Boot $line, Microcode SW error detected"
                    exit 1
            fi
    else
            echo "skip...."
    fi
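
The fragment above does not show the loop that produces `$line`. A self-contained sketch of the same idea, iterating over every row of `journalctl --list-boots`, could look like the following. The function names `kernel_of_log` and `crash_on_kernel` are hypothetical, and the sketch uses `sed` rather than `grep -P` only to avoid depending on PCRE support; it is not the Checkbox implementation.

```shell
#!/bin/sh
# Hypothetical helpers, not Checkbox code.

# Print the kernel release from the first "Linux version" line on stdin.
kernel_of_log() {
    sed -n 's/.*Linux version \([^ ]*\).*/\1/p' | head -n 1
}

# Return 0 if any boot that ran kernel "$1" logged a microcode SW error,
# 1 otherwise. Boots that ran a different kernel are skipped.
crash_on_kernel() {
    target="$1"
    boots=$(journalctl --list-boots)
    # Read the boot list on fd 3 so commands in the loop keep their stdin.
    while read -r idx rest <&3; do
        # Skip empty rows and any header row (the index must be numeric).
        case "$idx" in
            ''|*[!0-9-]*) continue ;;
        esac
        kernel=$(journalctl -k -b "$idx" | kernel_of_log)
        [ "$kernel" = "$target" ] || continue
        if journalctl -k -b "$idx" | grep -q "Microcode SW error detected"; then
            echo "Boot $idx: Microcode SW error detected"
            return 0
        fi
    done 3<<EOF
$boots
EOF
    return 1
}
```

A test script could call `crash_on_kernel "$(uname -r)"` and fail only when it returns 0, so that crashes recorded under a previously installed kernel are ignored without deleting any journal data.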