Open wiktormowinski opened 3 months ago
Hard to say what may be going wrong. It needs further investigation, especially since it can be hard to reproduce, e.g. it may only show up after 1h.
After the initial fixes to the fw, which focused on other issues, it seems like this one got fixed too. You can boot to Ubuntu and it doesn't soft lock after a minute (which used to be a common occurrence, if not sooner).
It used to be pretty much unbearable, allowing the platform to be used for about 30 seconds at most. After building from the byt_fsp_parity branch, however, it hasn't occurred during a while of normal use. I will run lengthy performance/stability tests today to confirm whether this has been completely resolved.
EDIT: that's only true for the SB binary. It still persists after building the non-SB config.
thanks for confirming
The issue does not happen in a deterministic way. Sometimes the CPU soft-locks while the system is booting, sometimes after it has been running for a few seconds or minutes. Printing the cbmem console or dmesg on the serial console helps trigger the issue a little faster if it doesn't happen right off the bat. Some platforms were not affected by the issue (mainly quad-core platforms).
I have analyzed the Bay Trail FSP source and compared it against the Bay Trail native silicon init in coreboot, and haven't found any major problems. A couple of things regarding CPU P/C states caught my eye, and I fixed them per BWG, but it didn't help. The work is in a WIP PR: https://github.com/Dasharo/coreboot/pull/575
Now that I am thinking about it, maybe it is some issue with C6 state and C6 DRAM which ought to be reserved for it. That would imply some difference in MRC binary and FSP memory init.
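To check the C6 hypothesis from the OS side, it may help to see which idle states the kernel exposes and how often each one is entered before a lockup. Below is a minimal sketch that reads the standard Linux cpuidle sysfs attributes (`name`, `usage`, `disable`). The helper names and the `"C6"` substring filter are assumptions for illustration; the actual state names depend on the idle driver in use.

```python
from pathlib import Path


def read_cpuidle_states(cpu_dir):
    """Return (state_dir, name, usage, disabled) tuples for one CPU.

    cpu_dir is e.g. /sys/devices/system/cpu/cpu0. States with missing or
    unreadable attributes are skipped. Order is lexicographic by directory.
    """
    states = []
    cpuidle = Path(cpu_dir) / "cpuidle"
    if not cpuidle.is_dir():
        return states
    for state in sorted(cpuidle.glob("state*")):
        try:
            name = (state / "name").read_text().strip()
            usage = int((state / "usage").read_text().strip())
            disabled = (state / "disable").read_text().strip() == "1"
        except (OSError, ValueError):
            continue
        states.append((state.name, name, usage, disabled))
    return states


def deep_states(states, marker="C6"):
    """Keep states whose name contains marker (assumed naming, e.g. 'C6')."""
    return [s for s in states if marker in s[1]]


if __name__ == "__main__":
    for cpu in sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*")):
        for entry in read_cpuidle_states(cpu):
            print(cpu.name, *entry)
```

If the lockups stop once the C6-like states are disabled (by writing `1` to the corresponding `disable` file as root), that would be a hint in favor of the C6 theory, though not proof.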
coreboot 4.11 (the latest version which still had FSP Bay Trail support) did not have the problem, so it may be related to the MRC binary not doing something that should be done.
Component
Dasharo firmware
Device
other
Dasharo version
v0.9.0-rc1
Dasharo Tools Suite version
No response
Test case ID
No response
Brief summary
Sometimes the Minnowboard gets really slow or even freezes, which is indicated by the watchdog reporting that a CPU is stuck for X seconds.
How reproducible
Rare: about 30% of tests should hit it during regression.
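As a rough guide for deciding how many clean runs are needed before calling the issue fixed, here is a small sketch. It assumes each test run fails independently with the ~30% probability quoted above, which is a simplification since the lockup is timing-dependent. The function names are made up for this example.

```python
import math


def chance_of_seeing_failure(p_fail, runs):
    """Probability of at least one failure in `runs` independent runs."""
    return 1.0 - (1.0 - p_fail) ** runs


def runs_needed(p_fail, confidence):
    """Smallest n such that 1 - (1 - p_fail)**n >= confidence."""
    if not 0.0 < p_fail < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("p_fail and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


if __name__ == "__main__":
    # With a 30% per-run failure rate (assumed), how many clean runs give
    # 95% confidence that the bug would have shown up if still present?
    print(runs_needed(0.3, 0.95))
```

Under these assumptions, 9 consecutive clean runs would be needed for 95% confidence, and 13 for 99%.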
How to reproduce
For auto: run absolutely any automated test suite, but for a near 100% fail chance I suggest CPF002.001 from dasharo-performance. This is because CPF002.001 lasts 1h and during that time the soft lockup can occur at any time (most often around the ~15min mark).
For manual:
Expected behavior
the tests should continue uninterrupted
Actual behavior
instead the tests get either
watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [systemd-udevd:120]
though I am almost certain both of these stem from the very same problem
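For triaging captured serial or dmesg logs, a small parser for the watchdog line above can be handy. This is a sketch only: the regular expression is derived from the single message quoted in this report, and the function and field names are hypothetical.

```python
import re

# Matches e.g.:
# watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [systemd-udevd:120]
SOFT_LOCKUP_RE = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(log_text):
    """Return one dict per soft lockup report found in log_text."""
    hits = []
    for line in log_text.splitlines():
        m = SOFT_LOCKUP_RE.search(line)
        if m:
            hits.append({
                "cpu": int(m.group("cpu")),
                "seconds": int(m.group("secs")),
                "task": m.group("task"),
                "pid": int(m.group("pid")),
            })
    return hits


if __name__ == "__main__":
    sample = "watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [systemd-udevd:120]"
    for hit in find_soft_lockups(sample):
        print(hit)
```

Counting hits per CPU across several runs could show whether the lockups favor particular cores.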
Screenshots
No response
Additional context
my cpf002.001 attempts documented: faile.zip
Solutions you've tried
No response