OpenAMP / openamp-demo

Demos and Docker images
Fix the qemu-zcu102 1% boot failure rate #7

Open wmamills opened 3 months ago

wmamills commented 3 months ago

Extensive testing shows that the qemu-zcu102 configuration still has a boot failure rate of about 1%.

This is similar to the issue in v2022.12 that was worked around by using PREEMPT_NONE and many other hacks. The time for such hacks is over, so I did not repeat them in v2024.05.

The failure appears to be an RCU timeout while doing some sort of power domain operation. See test-logs/2024-07-23-ec2 in this repo for multiple logs and test run documentation.
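
As a rough aid for triaging a directory of boot logs, here is a minimal sketch that counts how many logs mention an RCU stall. The regular expression, the `*.log` glob, and the function names are assumptions for illustration only; the actual failure signature should be taken from the logs in test-logs/2024-07-23-ec2.

```python
import re
from pathlib import Path

# Assumption: a failed boot contains a kernel line mentioning both "rcu" and
# "stall". Adjust the pattern to the real signature seen in the logs.
RCU_STALL = re.compile(r"rcu.*stall", re.IGNORECASE)


def has_rcu_stall(log_text: str) -> bool:
    """Return True if any line of the log matches the assumed stall pattern."""
    return any(RCU_STALL.search(line) for line in log_text.splitlines())


def count_failures(log_dir: str, pattern: str = "*.log") -> tuple[int, int]:
    """Return (failed, total) over all files in log_dir matching pattern."""
    logs = sorted(Path(log_dir).glob(pattern))
    failed = sum(has_rcu_stall(p.read_text(errors="replace")) for p in logs)
    return failed, len(logs)
```

Calling `count_failures("test-logs/2024-07-23-ec2")` would give the failed and total counts for that directory, provided the logs there use a `.log` suffix (otherwise pass a different `pattern`).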

The ZCU102 HW platform should be tested for 500 cycles to ensure this is not a common issue with the v6.6 kernel and the v2024.1 boot firmware. If the HW does not have the problem, then the qemu-zcu102 firmware upgrade (Issue #4) may solve the issue. Otherwise AMD will need to investigate.
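
For a sense of what 500 cycles buys, here is a small sketch of the arithmetic, assuming each boot fails independently with a fixed probability (an assumption; real failures may be correlated). The function names are hypothetical.

```python
import math


def prob_zero_failures(p: float, n: int) -> float:
    """Chance of seeing no failures in n boots if each fails with probability p."""
    return (1.0 - p) ** n


def cycles_needed(p: float, confidence: float) -> int:
    """Smallest n such that at least one failure is seen with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


# With a true 1% failure rate, 500 clean boots would be very unlikely.
print(f"P(no failures in 500 boots at 1%): {prob_zero_failures(0.01, 500):.4f}")
print(f"Boots needed for 95% confidence:   {cycles_needed(0.01, 0.95)}")
```

Under that independence assumption, a 500-cycle run with no failures on the HW would be strong evidence that the HW failure rate is well below 1%.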

wmamills commented 3 months ago

Assigned to me for now to do the first-level investigation.