Linaro / lite-lava-docker-compose

LITE Team LAVA docker dispatcher
MIT License
5 stars 10 forks source link

Pattern used for start message detection in Zephyr test harness is no more valid #116

Closed erwango closed 3 years ago

erwango commented 3 years ago

It appears that zephyr test harness has evolved 'recently'.

It used to be something like:

*** Booting Zephyr OS build zephyr-v2.3.0-499-g72ebf70543fc  ***
Running test suite test_log_list
===================================================================
starting test - test_log_list
PASS - test_log_list
===================================================================
starting test - test_log_list_multiple_items
PASS - test_log_list_multiple_items
===================================================================
Test suite test_log_list succeeded
===================================================================
PROJECT EXECUTION SUCCESSFUL  

Now, same test throws:

*** Booting Zephyr OS build v2.4.0-rc1-22-g3bcb6817e6fd  ***                    
Running test suite test_log_list                                                
===================================================================             
START - test_log_list                                                           
 PASS - test_log_list                                                           
===================================================================             
START - test_log_list_multiple_items                                            
 PASS - test_log_list_multiple_items                                            
===================================================================             
Test suite test_log_list succeeded                                              
===================================================================             
PROJECT EXECUTION SUCCESSFUL 

The string starting test - test_log_list disappeared. Issue is that it was used as pattern for test start detection: Waiting for start message: (tc_start\(\)|starting .*test|BOOTING ZEPHYR OS)

As a consequence, tests run with a recent Zephyr are all failing.

erwango commented 3 years ago

^^@pfalcon

pfalcon commented 3 years ago

Some more details to reproduce:

With this repo, run: lavacli -i lava-docker jobs submit example/lava.job

***** Booting Zephyr OS v1.14.0-rc3-56-g193fb971c24f *****
Running test suite test_log_list
===================================================================
starting test - test_log_list
PASS - test_log_list
...

Now, in Zephyr tree (master 1f18b9c7e25c7277), build tests/subsys/logging/log_list for BOARD=frdm_k64f. Replace test-images/frdm_k64f/logging.log_list/zephyr.bin with the new binary from the built test. And run lavacli -i lava-docker jobs submit example/lava.job again:

Waiting for start message: (tc_start\(\)|starting .*test|BOOTING ZEPHYR OS)
Setting prompt string to '(tc_start\\(\\)|starting .*test|BOOTING ZEPHYR OS)'
Trying 172.19.0.4...
Connected to ser2net.
Escape character is '^]'.
*** Booting Zephyr OS build zephyr-v2.3.0-2744-g1f18b9c7e25c  ***
Running test suite test_log_list
===================================================================
START - test_log_list
 PASS - test_log_list
===================================================================
START - test_log_list_multiple_items
 PASS - test_log_list_multiple_items
===================================================================
Test suite test_log_list succeeded
===================================================================
PROJECT EXECUTION SUCCESSFUL
wait for prompt timed out
end: 3.1 lava-test-monitor (duration 00:00:30) [common]
case: lava-test-monitor
definition: lava
duration: 30.10
extra: ...
level: 3.1
namespace: common
result: fail
lava-test-monitor-retry failed: 1 of 1 attempts. 'wait for prompt timed out'
pfalcon commented 3 years ago

So, we'll need to update test patterns, but bigger question is what's up with zephyr-upstream test in CI - based on SQUAD notifications, I didn't see any catastrophic test regressions, so would need to look there too.

pfalcon commented 3 years ago

Ok, so here's this specific test in production CI: https://validation.linaro.org/scheduler/job/2127422 . Here it looks ok, so apparently we use different patterns there (Vincent fixed some mis-matching a couple of months ago).

But before I arrived at that, I saw this: https://validation.linaro.org/scheduler/job/2127812 . Here it's clearly not right, as testcase matching ends prematurely. But on closer look, there's some problem with output, looks like older test's output is still in the input buffer, or 2 outputs are interspersed, or something. And that's on frdm_kw41z which is known to be not very stable. And well, I have suspicion that LAVA lab USB/UART routing infra is susceptible to glitches like that, but didn't have a chance to concentrate on it and dig to the bottom.

pfalcon commented 3 years ago

All in all, I'll work on both updating our sample and making it consistent with production patterns.

pfalcon commented 3 years ago

Fixed by 2 PRs above.