Open miczyg1 opened 2 years ago
This fixed it in my tests:
https://github.com/Dasharo/coreboot/pull/116/commits/65ed5e1d015cbc0f7729f6103fc87b4d03a63b64
@mrothfuss unfortunately your fix doesn't work for us: the platform still reboots on the first access to any of the PCI devices created by the second CPU after scrubbing is enabled. We also haven't seen the "negative DQS recovery delay detected!" message on our platform, so it is probably a different issue.
With scrubbing disabled, it reboots soon after starting Linux, before anything is printed by it.
Darn. I was hoping to contribute something.
In case it helps, I was testing with dual 6386s using the latest ucode. Dasharo would loop on RAM training, similar to what miczyg reported, but could eventually succeed. Dasharo+patch has worked fine under this setup (no boot issues or runtime instability).
I was able to boot a D16 ROM provided by 3mdeb with dual CPUs without issue (hardware). I'm betting it has something to do with memory training. I did a partial audit of the memory initialization code comparing to the BKDG ... there are deviations. Looking at DQS timing results across many boots, some lanes were consistent while others had a wide distribution of values.
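For anyone who wants to repeat the DQS comparison across boots, here is a minimal sketch of how the per-lane spread could be computed from logged values. The lane count, the function names, and the idea of flagging a lane by a fixed threshold are my assumptions for illustration; none of this is taken from the raminit code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 8 data byte lanes plus 1 ECC lane per channel. */
#define NUM_LANES 9

/* Smallest and largest trained delay seen for one lane. */
struct lane_stats {
	uint16_t min;
	uint16_t max;
};

/*
 * samples[b][l] is the delay value logged for lane l on boot b.
 * Fills out[l] with the min and max across all boots.
 */
void lane_spread(const uint16_t samples[][NUM_LANES], size_t boots,
		 struct lane_stats out[NUM_LANES])
{
	assert(boots > 0);

	for (size_t l = 0; l < NUM_LANES; l++) {
		out[l].min = samples[0][l];
		out[l].max = samples[0][l];
		for (size_t b = 1; b < boots; b++) {
			if (samples[b][l] < out[l].min)
				out[l].min = samples[b][l];
			if (samples[b][l] > out[l].max)
				out[l].max = samples[b][l];
		}
	}
}

/* Nonzero if the lane's spread (max - min) exceeds the threshold. */
int lane_is_unstable(const struct lane_stats *s, uint16_t threshold)
{
	return (uint16_t)(s->max - s->min) > threshold;
}
```

Feeding it the values parsed from a few boot logs and picking a threshold by eye is enough to separate the consistent lanes from the ones with a wide distribution.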
This is probably related to #47, with faulty raminit as the underlying problem.
AGESA Fam15 code suggests that seeds for DQS Receiver Enable Training should be extensively determined for each motherboard. Seeds can be configured uniquely for every possible socket, channel, DIMM, and byte lane combination. The raptor raminit code is only using the recommended seeds from Table 99 of the BKDG.
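To illustrate what per-board seeds could look like, here is a rough sketch of an override table keyed by socket, channel, DIMM, and byte lane, falling back to a default seed when no entry matches. The struct, the function name, and the lane count are hypothetical; this is neither the AGESA data layout nor the raptor raminit code, and any seed values put into such a table would have to come from characterizing the actual board.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 8 data byte lanes plus 1 ECC lane per channel. */
#define BYTE_LANES 9

/* One board-specific seed, keyed by its physical location (hypothetical). */
struct rx_en_seed_override {
	uint8_t socket;
	uint8_t channel;
	uint8_t dimm;
	uint8_t lane;
	uint16_t seed;
};

/*
 * Return the board-specific seed for the given location if the table
 * has one, otherwise the caller-supplied default (for example, the
 * recommended value the code uses today).
 */
uint16_t rx_en_seed(const struct rx_en_seed_override *tbl, size_t count,
		    uint8_t socket, uint8_t channel, uint8_t dimm,
		    uint8_t lane, uint16_t default_seed)
{
	assert(lane < BYTE_LANES);

	for (size_t i = 0; i < count; i++) {
		if (tbl[i].socket == socket && tbl[i].channel == channel &&
		    tbl[i].dimm == dimm && tbl[i].lane == lane)
			return tbl[i].seed;
	}
	return default_seed;
}
```

A sparse list keeps the table small when only a few lanes need a different seed; a full four-dimensional array would work just as well if every combination ends up being characterized.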
I am using an alternate AGESA algorithm: a "Seedless" training method that does not require configuration. So far it has performed flawlessly on two boards (1xC32 + 4x32GB, 2xG34 + 16x32GB). It looks like this algorithm is designed to determine the seeds to allow proper configuration of the "normal" training method -- which I assume is faster at runtime.
See MemTRdPosWithRxEnDlySeeds3() in vendorcode/amd/agesa/f15/Proc/Mem/Tech/mtthrcSeedTrain.c for details.
Another detail to be aware of: the raptor raminit code deviates from the BKDG and does not perform multiple passes of memory training according to the specification.
Dasharo version Dasharo for KGPE-D16
Dasharo variant KGPE-D16
Affected component(s) or functionality coreboot boot process
Brief summary coreboot resets during ECC memory initialization when two CPU sockets are populated
How reproducible Always
How to reproduce
Steps to reproduce the behavior:
Expected behavior The platform can boot to OS
Actual behavior The platform does not boot
Screenshots None
Additional context Maybe check the CMOS options and their default values? Some scrubber settings may be off, etc.
Solutions you've tried None