HIL tests sometimes experience transient failures

jessebraham commented 2 months ago

There have been a number of instances where the HIL testing has failed, only to pass in a subsequent run when re-added to the merge queue. One example of such a failure can be seen here.

This is not the end of the world, but it is also far from ideal; it would be good to invest some time into seeing how/if we can make these tests more reliable.

SergioGasquez commented 2 months ago

I agree, we should try to log all the transient failures of the HIL workflow here.

For reference, the workflow that you linked failed because of the uart_async test on S3:

ERROR panicked at 'assertion failed: `(left == right)`'
diff < left / right >
<[72, 101, 108, 108, 0, 0, 0, 0, 0, 0, 0]
>[72, 101, 108, 108, 111, 32, 69, 83, 80, 51, 50]
└─ uart_async::tests::test_send_receive::{async_fn#0} @ /home/runner/work/esp-hal/esp-hal/hil-test/tests/uart_async.rs:66

No idea what failed in there, but it's a rare issue: I ran the test locally around 20 times and every run succeeded, and the workflow also succeeds on most runs...
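
As a reading aid for the diff above, here is a minimal sketch that compares the two buffers from the log and reports how many leading bytes match. It assumes `left` is the received buffer and `right` is the expected one (the log itself does not say which is which), and `matching_prefix_len` is a hypothetical helper, not something from esp-hal or the hil-test crate.

```rust
// Sketch only: interprets the two byte arrays printed in the assertion diff.
// `matching_prefix_len` is a hypothetical helper, not part of esp-hal or hil-test.

/// Number of leading bytes that are identical in both slices.
fn matching_prefix_len(received: &[u8], expected: &[u8]) -> usize {
    received
        .iter()
        .zip(expected.iter())
        .take_while(|(a, b)| a == b)
        .count()
}

fn main() {
    // Values copied from the failure log.
    // Assumption: left = received buffer, right = expected data.
    let received: [u8; 11] = [72, 101, 108, 108, 0, 0, 0, 0, 0, 0, 0];
    let expected: [u8; 11] = [72, 101, 108, 108, 111, 32, 69, 83, 80, 51, 50];

    let n = matching_prefix_len(&received, &expected);

    println!("expected text:   {:?}", String::from_utf8_lossy(&expected));
    println!(
        "received prefix: {:?} ({} of {} bytes matched)",
        String::from_utf8_lossy(&received[..n]),
        n,
        expected.len()
    );
}
```

Under that assumption, only the first four bytes ("Hell") of "Hello ESP32" match, and the remaining seven bytes of the buffer are still zero. This describes the symptom only; it says nothing about the cause.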

bugadani commented 2 weeks ago

A lot of these have been caused by probe-rs halting the CPU when it shouldn't have. Since we no longer poll RTT by default (https://github.com/esp-rs/esp-hal/pull/1960), can we close this?

esp-rs / esp-hal

HIL tests sometimes experience transient failures #1712