Dasharo / open-source-firmware-validation

OSFV infrastructure with automated tests and scripts for managing test results
Apache License 2.0

Power On on MSI z690 ddr5 not powering on reliably #603

Open philipandag opened 1 day ago

philipandag commented 1 day ago

Device

MSI z690 ddr5

RTE version

-

OSFV version

540-reset-to-defaults-restore-serial

Affected component(s) or functionality

dasharo-compatibility/cpu-core-count.robot: CCC test cases. Most probably any other test case that reboots the platform multiple times is affected too.

Brief summary

The platform sometimes simply does not power on after the Power On keyword.

How reproducible

~50% for a single test. A run of the whole CCC suite will almost certainly hit at least one failure.

How to reproduce

Run the CCC test suite

Expected behavior

The platform should always power on after the Power On keyword, as multiple tests rely on that.

Actual behavior

It sometimes simply does not power on. In the failed test cases the platform is still off after the Power On keyword, as determined by the lack of video output on PiKVM. When the keyword fails to power the platform on, doing it manually with osfv_cli rte pwr on makes the platform boot normally and the tests continue.

Link to screenshots or logs

Logs from two runs of the CCC test: cpu-cores-count.robot_log.zip

Additional context

Looking at the implementation of the keyword in msi-z690-common, I have no clue why it wouldn't work. Maybe the sleep times are just too short?

Power On
    [Documentation]    Keyword clears telnet buffer and sets Device Under Test
    ...    into Power On state using RTE OC buffers. Implementation
    ...    must be compatible with the theory of operation of a
    ...    specific platform.
    Restore Initial DUT Connection Method
    IF    '${DUT_CONNECTION_METHOD}' == 'SSH'    RETURN
    Sleep    2s
    Rte Power Off    ${6}
    Sleep    5s
    # read the old output
    Telnet.Read
    Rte Power On

Solutions you've tried

No response

philipandag commented 1 day ago

I have re-run the suite with significantly longer sleeps in the Power On keyword (10s and 15s) and the whole suite passed. Solving the issue is just a matter of choosing less excessive sleep durations. cpu-cores-count.robot_log.zip
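For reference, a minimal sketch of what the keyword could look like with the longer sleeps pulled out into variables, so they can be tuned down step by step. The variable names are hypothetical, and mapping 10s to the first sleep and 15s to the second is an assumption. The rest is copied from the keyword quoted above.

```robotframework
*** Variables ***
# Hypothetical variable names. The values are the ones the suite passed with;
# which value belongs to which sleep is an assumption.
${POWER_ON_PRE_OFF_DELAY}     10s
${POWER_ON_POST_OFF_DELAY}    15s


*** Keywords ***
Power On
    [Documentation]    Keyword clears telnet buffer and sets Device Under Test
    ...    into Power On state using RTE OC buffers. Implementation
    ...    must be compatible with the theory of operation of a
    ...    specific platform.
    Restore Initial DUT Connection Method
    IF    '${DUT_CONNECTION_METHOD}' == 'SSH'    RETURN
    # Was: Sleep 2s
    Sleep    ${POWER_ON_PRE_OFF_DELAY}
    Rte Power Off    ${6}
    # Was: Sleep 5s
    Sleep    ${POWER_ON_POST_OFF_DELAY}
    # read the old output
    Telnet.Read
    Rte Power On
```

With the delays in variables, they can be lowered from the command line (for example with Robot Framework's `--variable` option) until the failures reappear, which should show how short the sleeps can safely be.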

miczyg1 commented 3 hours ago

Looks related/similar: https://github.com/Dasharo/open-source-firmware-validation/issues/578