Closed tlaurion closed 2 years ago
The bug is here. if (chips & (1 << chip))
is missing at least here.
@tlaurion As said on the call, we do not have single CPU setup right now, and it would be troublesome to switch back and forth.
Thanks for the report, it is very valuable. Maybe we can try with one or two iterations of fixes which you can test on your side. If that does not help, we might need to rearrange the lab or think of a more efficient way (maybe testing remotely on your hardware?)
it would be troublesome to switch back and forth.
I made function that returns mask of chips to return 0x01
to find the errors. The other issue is with OCC startup. I think commit with fix for OCC just got lost.
Ignoring second CPU isn't the same as installing only one, but it's a good approximation. We might want to make a build option for that or do something else that allows limiting number of CPUs.
@SergiiDmytruk Thanks.
@tlaurion Binaries from this MR are here: https://github.com/Dasharo/coreboot/suites/6282695824/artifacts/224773170
For reference, my hostboot logs on same machine
--== Welcome to Hostboot hostboot-a2ddbf3/hbicore.bin ==--
3.07002|secure|SecureROM valid - enabling functionality
5.51144|Booting from SBE side 0 on master proc=00050000
5.54800|ISTEP 6. 5 - host_init_fsi
5.72083|ISTEP 6. 6 - host_set_ipl_parms
6.02733|ISTEP 6. 7 - host_discover_targets
8.98001|HWAS|PRESENT> DIMM[03]=2000000000000000
8.98002|HWAS|PRESENT> Proc[05]=8000000000000000
8.98003|HWAS|PRESENT> Core[07]=00003C0000000000
9.06120|ISTEP 6. 8 - host_update_master_tpm
9.55213|SECURE|Security Access Bit> 0x0000000000000000
9.55214|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
9.55228|ISTEP 6. 9 - host_gard
9.87348|HWAS|Deconfig HUID 0x0003000A, Physical:/Sys0/Node0/DIMM6
9.87352|HWAS|Deconfig HUID 0x0003000B, Physical:/Sys0/Node0/DIMM7
9.87467|HWAS|FUNCTIONAL> DIMM[03]=2000000000000000
9.87468|HWAS|FUNCTIONAL> Proc[05]=8000000000000000
9.87469|HWAS|FUNCTIONAL> Core[07]=00003C0000000000
9.87780|ISTEP 6.11 - host_start_occ_xstop_handler
10.84372|ISTEP 6.12 - host_voltage_config
10.95588|ISTEP 7. 1 - mss_attr_cleanup
11.60999|ISTEP 7. 2 - mss_volt
11.82222|ISTEP 7. 3 - mss_freq
12.14141|ISTEP 7. 4 - mss_eff_config
12.82049|ISTEP 7. 5 - mss_attr_update
12.83176|ISTEP 8. 1 - host_slave_sbe_config
13.01644|ISTEP 8. 2 - host_setup_sbe
13.02951|ISTEP 8. 3 - host_cbs_start
13.03087|ISTEP 8. 4 - proc_check_slave_sbe_seeprom_complete
13.05204|ISTEP 8. 5 - host_attnlisten_proc
13.08736|ISTEP 8. 6 - host_p9_fbc_eff_config
13.09937|ISTEP 8. 7 - host_p9_eff_config_links
13.14822|ISTEP 8. 8 - proc_attr_update
13.14947|ISTEP 8. 9 - proc_chiplet_fabric_scominit
13.16541|ISTEP 8.10 - proc_xbus_scominit
13.17879|ISTEP 8.11 - proc_xbus_enable_ridi
13.19928|ISTEP 8.12 - host_set_voltages
13.21404|ISTEP 9. 1 - fabric_erepair
13.29682|ISTEP 9. 2 - fabric_io_dccal
13.29882|ISTEP 9. 3 - fabric_pre_trainadv
13.30459|ISTEP 9. 4 - fabric_io_run_training
13.30670|ISTEP 9. 5 - fabric_post_trainadv
13.30870|ISTEP 9. 6 - proc_smp_link_layer
13.31264|ISTEP 9. 7 - proc_fab_iovalid
13.46288|ISTEP 9. 8 - host_fbc_eff_config_aggregate
13.47825|ISTEP 10. 1 - proc_build_smp
13.58943|ISTEP 10. 2 - host_slave_sbe_update
14.62127|ISTEP 10. 4 - proc_cen_ref_clk_enable
14.64082|ISTEP 10. 5 - proc_enable_osclite
14.64181|ISTEP 10. 6 - proc_chiplet_scominit
14.68409|ISTEP 10. 7 - proc_abus_scominit
14.68546|ISTEP 10. 8 - proc_obus_scominit
14.68682|ISTEP 10. 9 - proc_npu_scominit
14.71088|ISTEP 10.10 - proc_pcie_scominit
14.75217|ISTEP 10.11 - proc_scomoverride_chiplets
14.75489|ISTEP 10.12 - proc_chiplet_enable_ridi
14.75749|ISTEP 10.13 - host_rng_bist
14.76043|ISTEP 10.14 - host_update_redundant_tpm
14.76317|ISTEP 11. 1 - host_prd_hwreconfig
14.79285|ISTEP 11. 2 - cen_tp_chiplet_init1
14.79725|ISTEP 11. 3 - cen_pll_initf
14.80072|ISTEP 11. 4 - cen_pll_setup
14.80374|ISTEP 11. 5 - cen_tp_chiplet_init2
14.80772|ISTEP 11. 6 - cen_tp_arrayinit
14.81076|ISTEP 11. 7 - cen_tp_chiplet_init3
14.82940|ISTEP 11. 8 - cen_chiplet_init
14.83243|ISTEP 11. 9 - cen_arrayinit
14.83544|ISTEP 11.10 - cen_initf
14.83924|ISTEP 11.11 - cen_do_manual_inits
14.84190|ISTEP 11.12 - cen_startclocks
14.84632|ISTEP 11.13 - cen_scominits
14.84950|ISTEP 12. 1 - mss_getecid
15.10883|ISTEP 12. 2 - dmi_attr_update
15.13412|ISTEP 12. 3 - proc_dmi_scominit
15.18279|ISTEP 12. 4 - cen_dmi_scominit
15.18575|ISTEP 12. 5 - dmi_erepair
15.24568|ISTEP 12. 6 - dmi_io_dccal
15.24777|ISTEP 12. 7 - dmi_pre_trainadv
15.25649|ISTEP 12. 8 - dmi_io_run_training
15.26825|ISTEP 12. 9 - dmi_post_trainadv
15.27092|ISTEP 12.10 - proc_cen_framelock
15.27372|ISTEP 12.11 - host_startprd_dmi
15.27641|ISTEP 12.12 - host_attnlisten_memb
15.27940|ISTEP 12.13 - cen_set_inband_addr
15.28241|ISTEP 13. 1 - host_disable_memvolt
15.44098|ISTEP 13. 2 - mem_pll_reset
15.48146|ISTEP 13. 3 - mem_pll_initf
15.49836|ISTEP 13. 4 - mem_pll_setup
15.52314|ISTEP 13. 6 - mem_startclocks
15.53972|ISTEP 13. 7 - host_enable_memvolt
15.55267|ISTEP 13. 8 - mss_scominit
15.87821|ISTEP 13. 9 - mss_ddr_phy_reset
15.92635|ISTEP 13.10 - mss_draminit
16.01169|ISTEP 13.11 - mss_draminit_training
16.38683|ISTEP 13.12 - mss_draminit_trainadv
16.40070|ISTEP 13.13 - mss_draminit_mc
16.43340|ISTEP 14. 1 - mss_memdiag
18.44189|ISTEP 14. 2 - mss_thermal_init
18.47948|ISTEP 14. 3 - proc_pcie_config
18.52297|ISTEP 14. 4 - mss_power_cleanup
18.52686|ISTEP 14. 5 - proc_setup_bars
18.55430|ISTEP 14. 6 - proc_htm_setup
18.56616|ISTEP 14. 7 - proc_exit_cache_contained
18.64554|ISTEP 15. 1 - host_build_stop_image
20.48206|================================================
20.49654|Error reported by fapi2 (0x3300) EID 0x90000227
20.49786| No WOF table match found
20.49787| ModuleId 0x10 fapi2::MOD_FAPI2_PLAT_PARSE_WOF_TABLES
20.49787| ReasonCode 0x332d fapi2::RC_WOF_TABLE_NOT_FOUND
20.49788| UserData1 Number of cores : 0x0004000200000064
20.49789| UserData2 WOF Power Mode (1=Nominal, 2=Turbo) : 0x00000bb800000012
20.49790|------------------------------------------------
20.49790| Callout type : Procedure Callout
20.49791| Procedure : EPUB_PRC_HB_CODE
20.49792| Priority : SRCI_PRIORITY_HIGH
20.49793|------------------------------------------------
20.49793| Callout type : Hardware Callout
20.49795| Target : Physical:/Sys0/Node0/Proc0
20.49795| Deconfig State : NO_DECONFIG
20.49796| GARD Error Type : GARD_NULL
20.49797| Priority : SRCI_PRIORITY_MED
20.49797|------------------------------------------------
20.49798| Hostboot Build ID: hostboot-a2ddbf3/hbicore.bin
20.49799|================================================
20.71743|ISTEP 15. 2 - proc_set_pba_homer_bar
20.79928|ISTEP 15. 3 - host_establish_ex_chiplet
20.82571|ISTEP 15. 4 - host_start_stop_engine
20.87112|ISTEP 16. 1 - host_activate_master
22.05514|ISTEP 16. 2 - host_activate_slave_cores
22.16728|ISTEP 16. 3 - host_secure_rng
22.17455|ISTEP 16. 4 - mss_scrub
22.20929|ISTEP 16. 5 - host_load_io_ppe
22.23870|ISTEP 16. 6 - host_ipl_complete
22.61928|ISTEP 18.11 - proc_tod_setup
22.73876|ISTEP 18.12 - proc_tod_init
22.74294|ISTEP 20. 1 - host_load_payload
23.38540|ISTEP 20. 2 - host_load_hdat
24.73240|ISTEP 21. 1 - host_runtime_setup
26.63992|================================================
26.63993|Error reported by fapi2 (0x3300) EID 0x90000256
26.63994| No WOF table match found
26.63995| ModuleId 0x10 fapi2::MOD_FAPI2_PLAT_PARSE_WOF_TABLES
26.63995| ReasonCode 0x332d fapi2::RC_WOF_TABLE_NOT_FOUND
26.63996| UserData1 Number of cores : 0x0004000200000064
26.63997| UserData2 WOF Power Mode (1=Nominal, 2=Turbo) : 0x00000bb800000012
26.63998|------------------------------------------------
26.63999| Callout type : Procedure Callout
26.63999| Procedure : EPUB_PRC_HB_CODE
26.64000| Priority : SRCI_PRIORITY_HIGH
26.64001|------------------------------------------------
26.64002| Callout type : Hardware Callout
26.64003| Target : Physical:/Sys0/Node0/Proc0
26.64004| Deconfig State : NO_DECONFIG
26.64004| GARD Error Type : GARD_NULL
26.64005| Priority : SRCI_PRIORITY_MED
26.64006|------------------------------------------------
26.64006| Hostboot Build ID: hostboot-a2ddbf3/hbicore.bin
26.64007|================================================
30.11356|htmgt|OCCs are now running in ACTIVE state
34.34112|ISTEP 21. 2 - host_verify_hdat
34.36810|ISTEP 21. 3 - host_start_payload
[ 35.181479495,5] OPAL skiboot-9858186 starting...
[ 35.181483122,7] initial console log level: memory 7, driver 5
[ 35.181485237,6] CPU: P9 generation processor (max 4 threads/core)
[ 35.181487084,7] CPU: Boot CPU PIR is 0x0054 PVR is 0x004e1201
[ 35.181490035,7] OPAL table: 0x30103830 .. 0x30103e10, branch table: 0x30002000
[ 35.181493147,7] Assigning physical memory map table for nimbus
[ 35.181496208,7] Parsing HDAT...
[ 35.181497493,7] SPIRA-S found.
[ 35.181500165,6] BMC #0: HW version 3, SW version 2, chip DD1.0
[ 35.181573498,6] SP Family is ibm,ast2500,openbmc
[ 35.181579994,7] LPC: IOPATH chip id = 0
[ 35.181581326,7] LPC: FW BAR = f0000000
[ 35.181582857,7] LPC: MEM BAR = e0000000
[ 35.181584303,7] LPC: IO BAR = d0010000
[ 35.181585742,7] LPC: Internal BAR = c0012000
[ 35.181598542,7] LPC UART: base addr = 3f8 (3f8) size = 1 clk = 1843200, baud = 115200
[ 35.181601241,7] LPC: BT [0, 0] sms_int: 0, bmc_int: 0
[ 35.182842118,5] HDAT I2C: found e3p1 - unknown@1c dp:ff (ff:)
[ 35.182938290,5] HDAT I2C: found e3p1 - unknown@1d dp:ff (ff:)
[ 35.182981372,5] HDAT I2C: found e3p0 - unknown@19 dp:ff (ff:)
[ 35.183024148,5] HDAT I2C: found e3p1 - unknown@1e dp:ff (ff:)
[ 35.183065251,5] HDAT I2C: found e3p0 - unknown@1b dp:ff (ff:)
[ 35.183106029,5] HDAT I2C: found e3p1 - unknown@1f dp:ff (ff:)
[ 35.183162356,5] HDAT I2C: found e3p0 - unknown@1a dp:ff (ff:)
[ 35.183211752,5] HDAT I2C: found e3p0 - unknown@18 dp:ff (ff:)
[ 35.183934952,5] CHIP: Chip ID 0000 type: P9N DD2.10
[ 35.184291461,5] PLAT: Detected Talos platform
[ 35.184352236,5] PLAT: Detected BMC platform ast2500:openbmc
[ 35.201105130,5] CPU: All 16 processors called in...
[ 35.266716473,7] LPC: Routing irq 10, policy: 0 (r=1)
[ 35.266717473,7] LPC: SerIRQ 10 using route 0 targetted at OPAL
[ 36.374445970,5] HIOMAP: Negotiated hiomap protocol v2
[ 36.374512392,5] HIOMAP: Block size is 64KiB
[ 36.374559447,5] HIOMAP: BMC suggested flash timeout of 8s
[ 36.374589305,5] HIOMAP: Flash size is 64MiB
[ 36.374627095,5] HIOMAP: Erase granule size is 64KiB
[ 38.486837922,5] FLASH: Found system flash: (unnamed) id:0
[ 39.277537988,7] LPC: Routing irq 4, policy: 0 (r=1)
[ 39.277539066,7] LPC: SerIRQ 4 using route 1 targetted at OPAL
[ 39.277694733,5] OCC: All Chip Rdy after 0 ms
[ 40.074902821,3] STB: VERSION verification FAILED. log=0xffffffffffff8160
[ 41.192679074,3] STB: IMA_CATALOG verification FAILED. log=0xffffffffffff8160
[ 41.384604683,3] CAPP: Error loading ucode lid. index=201d1
[ 41.398817729,5] PCI: Resetting PHBs and training links...
[ 42.419398089,5] PCI: Probing slots...
[ 42.476358942,3] PCI: PHB (80000) not found
[ 42.476847088,3] PCI: PHB (80001) not found
[ 42.477738924,3] PCI: PHB (80003) not found
[ 42.478659084,5] PCI Summary:
[ 42.479475473,5] PHB#0000:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..ff SLOT=CPU1 Slot2 (16x)
[ 42.481349662,5] PHB#0001:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..ff SLOT=CPU1 Slot1 (8x)
[ 42.483357056,5] PHB#0002:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..ff SLOT=Builtin SAS
[ 42.484697264,5] PHB#0003:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=Builtin USB
[ 42.486595728,5] PHB#0003:01:00.0 [EP ] 104c 8241 R:02 C:0c0330 ( usb-xhci) LOC_CODE=Builtin USB
[ 42.488494066,5] PHB#0004:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=Builtin Ethernet
[ 42.490306090,5] PHB#0004:01:00.0 [EP ] 14e4 1657 R:00 C:020000 ( ethernet) LOC_CODE=Builtin Ethernet
[ 42.492133493,5] PHB#0004:01:00.1 [EP ] 14e4 1657 R:00 C:020000 ( ethernet) LOC_CODE=Builtin Ethernet
[ 42.494059776,5] PHB#0005:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..02 SLOT=BMC
[ 42.495807605,5] PHB#0005:01:00.0 [ETOX] 1a03 1150 R:04 C:060400 B:02..02 LOC_CODE=BMC
[ 42.498043037,5] PHB#0005:02:00.0 [PCID] 1a03 2000 R:41 C:030000 ( vga) LOC_CODE=BMC
[ 42.504305115,5] IPMI: Resetting boot count on successful boot
[ 42.506128825,5] INIT: Waiting for kernel...
[ 54.323916712,3] STB: BOOTKERNEL verification FAILED. log=0xffffffffffff8160
[ 54.326205744,5] INIT: 64-bit LE kernel discovered
[ 54.483191117,5] INIT: Starting kernel at 0x20011000, fdt at 0x306f1ba8 204105 bytes
zImage starting: loaded at 0x0000000020011000 (sp: 0x000000002145bed8)
Allocating 0x23f4098 bytes for kernel...
Decompressing (0x0000000000000000 <- 0x0000000020020000:0x0000000021459984)...
Done! Decompressed 0x220c5d8 bytes
@SergiiDmytruk Thanks.
@tlaurion Binaries from this MR are here: https://github.com/Dasharo/coreboot/suites/6282695824/artifacts/224773170
@macpijan :
user@captive-portal:~$ scp /home/user/QubesIncoming/Insurgo/coreboot-binary/*.ecc root@talos:/tmp/
root@talos's password:
bootblock.signed.ecc 100% 28KB 1.3MB/s 00:00
coreboot.rom.signed.ecc 100% 2309KB 3.2MB/s 00:00
user@captive-portal:~$ ssh -l root talos
root@talos's password:
root@talos:~# obmcutil poweroff
root@talos:~# mboxctl --lpc-state
LPC Bus Maps: BMC Memory
root@talos:~# systemctl stop mboxd
root@talos:~# systemctl start mboxd
root@talos:~# mboxctl --lpc-state
LPC Bus Maps: Flash Device
root@talos:~# pflash -r /tmp/talos.pnor
Reading to "/tmp/talos.pnor" from 0x00000000..0x04000000 !
[==================================================] 100% ETA:0s
root@talos:~# pflash -P HBI -p /tmp/coreboot.rom.signed.ecc -F /tmp/talos.pnor
About to program "/tmp/coreboot.rom.signed.ecc" at 0x00425000..0x00666200 !
WARNING ! This will modify your HOST flash chip content !
Enter "yes" to confirm:yes
Programming & Verifying...
[==================================================] 100%
Updating actual size in partition header...
root@talos:~# pflash -P HBB -p /tmp/bootblock.signed.ecc -F /tmp/talos.pnor
About to program "/tmp/bootblock.signed.ecc" at 0x00205000..0x0020c002 !
WARNING ! This will modify your HOST flash chip content !
Enter "yes" to confirm:yes
Programming & Verifying...
[==================================================] 100%
Updating actual size in partition header...
root@talos:~# mboxctl --backend file:/tmp/talos.pnor
SetBackend: Success
root@talos:~# obmcutil poweron
root@talos:~# cat /var/log/obmc-console.log
SIGTERM received, booting...
Requesting system poweroff
[ 799.110662] reboot: Power down
[ 855.188539622,5] OPAL: Shutdown request type 0x0...
st type
coreboot-4.14-525-g9488c148cd Wed Apr 27 13:38:10 UTC 2022 bootblock starting (log level: 7)...
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
HBI partition has ECC
HBI is in 0x00426200 through 0x0175f037
FMAP: Found "FLASH" version 1.1 at 0x20000.
FMAP: base = 0x0 size = 0x200000 #areas = 4
FMAP: area COREBOOT found @ 20200 (1965568 bytes)
CBFS: mcache @0xf8231000 built for 10 files, used 0x1f0 of 0x2000 bytes
CBFS: Found 'fallback/romstage' @0x80 size 0x12297 in mcache @0xf823102c
BS: bootblock times (exec / console): total (unknown) / 2 ms
coreboot-4.14-525-g9488c148cd Wed Apr 27 13:38:10 UTC 2022 romstage starting (log level: 7)...
IPMI: romstage PNP BT 0xe4
Get BMC self test result...Function Not Implemented
Initializing IPMI BMC watchdog timer
IPMI BMC watchdog initialized and started.
Initializing FSI...
Initialized FSI (chips mask: 0x01)
Building MVPDs...
starting istep 8.1
ending istep 8.1
starting istep 8.2
ending istep 8.2
starting istep 8.3
ending istep 8.3
starting istep 8.4
ending istep 8.4
starting istep 8.9
ending istep 8.9
starting istep 8.10
ending istep 8.10
starting istep 8.11
ending istep 8.11
starting istep 9.2
ending istep 9.2
starting istep 9.4
ending istep 9.4
starting istep 9.6
ending istep 9.6
starting istep 9.7
ending istep 9.7
starting istep 10.1
ending istep 10.1
starting istep 10.6
Base epsilon values read from table:
R_T[0] = 10
R_T[1] = 10
R_T[2] = 79
W_T[0] = 0
W_T[1] = 21
Scaled epsilon values based on +20 percent guardband:
R_T[0] = 12
R_T[1] = 12
R_T[2] = 95
W_T[0] = 0
W_T[1] = 26
ending istep 10.6
starting istep 10.10
ending istep 10.10
starting istep 10.12
ending istep 10.12
starting istep 10.13
ending istep 10.13
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
MEMD partition has ECC
MEMD is in 0x03cef200 through 0x03cfb917
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 50
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 51
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 53
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D4
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D5
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D6
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D7
SPD @ 0x52
SPD: module type is DDR4
SPD: module part number is M393A1K43BB0-CRC
SPD: banks 16, ranks 1, rows 16, columns 10, density 8192 Mb
SPD: device width 8 bits, bus width 64 bits
SPD: module size is 8192 MB (per channel)
starting istep 13.2
ending istep 13.2
starting istep 13.3
putRing took 3 ms
ending istep 13.3
starting istep 13.4
ending istep 13.4
starting istep 13.6
ending istep 13.6
starting istep 13.8
Please FIXME: ATTR_MSS_RUNTIME_MEM_THROTTLED_N_COMMANDS_PER_SLOT
ending istep 13.8
starting istep 13.9
ending istep 13.9
starting istep 13.10
CCS took 2 us (3 us timeout), 1 instruction(s)
CCS took 2 us (2 us timeout), 14 instruction(s)
RCD dump for I2C address 0x5a:
0xf820fa30: 80 b3 40 42 30 00 00 00 02 01 00 03 cb e3 c0 0d ..@B0...........
0xf820fa40: 00 00 39 00 00 00 00 00 00 00 07 00 00 00 00 00 ..9.............
ending istep 13.10
starting istep 13.11
CCS took 2 us (7 us timeout), 2 instruction(s)
Write Leveling starting
CCS took 13 us (92 us timeout), 5 instruction(s)
Write Leveling done
Initial Pattern Write starting
CCS took 5 us (38 us timeout), 5 instruction(s)
Initial Pattern Write done
DQS alignment starting
CCS took 11 us (44 us timeout), 1 instruction(s)
DQS alignment done
Read Clock Alignment starting
CCS took 8 us (82 us timeout), 1 instruction(s)
Read Clock Alignment done
Read Centering starting
CCS took 36 us (120 us timeout), 1 instruction(s)
Read Centering done
Write Centering starting
CCS took 8848 us (11314 us timeout), 7 instruction(s)
Write Centering done
Coarse write/read starting
CCS took 5 us (24 us timeout), 1 instruction(s)
Coarse write/read done
ending istep 13.11
starting istep 13.13
ending istep 13.13
starting istep 14.1
ending istep 14.1
starting istep 14.2
ending istep 14.2
starting istep 14.3
Initializing PEC0...
Initializing PEC1...
Initializing PEC2...
Initializing PHB0...
Initializing PHB1...
Initializing PHB2...
Initializing PHB3...
Initializing PHB4...
Initializing PHB5...
ending istep 14.3
starting istep 14.5
ending istep 14.5
0xF000F = 221d104900008040
CBMEM:
IMD: root @ 0xffeff000 254 entries.
IMD: root @ 0xffefec00 62 entries.
FMAP: area COREBOOT found @ 20200 (1965568 bytes)
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
HBI partition has ECC
HBI is in 0x00426200 through 0x0175f037
CBFS: Found 'fallback/ramstage' @0x12380 size 0xcbe5 in mcache @0xf82
We also have 8 GiB stick at SPD @ 0x52
on first CPU, but limiting memory to that doesn't trigger the issue.
There is also passing memory data from romstage to ramstage, but it looks quite reliable.
@krystian-hebel any ideas what can go wrong on moving to ramstage and how to debug it? Setting log level to "spew" doesn't seem to add anything to the output.
0xF000F = 221d104900008040
DD21. Hope we didn't mess up with attribute values for different DDs...
@tlaurion Does it work with two CPUs or you haven't tried it?
@SergiiDmytruk didn't try 2 cpus, no. For the moment, I am testing one CPU one ram module, basically Talos lite setup.
Shipped board was rejected for sale since some (unknown) ram slots were non-functional. I could buy more RAM to test second CPU later on if there is a way to test one CPU setup even if 2 CPUs are inserted.
I would not see myself adding and removing those often and would prefer testing one CPU setup here until you guys can software disable the second CPU and move from there.
Seems it hangs during loading of ramstage, because final romstage line (BS: romstage times ...
) isn't printed. Maybe something wrong with memory address translation set in 14.5?
First step of debugging would be checking in GUI if it checkstoped or "just" watchdog timed out. Then, for trying to pinpoint what exactly checkstops one has to read appropriate SCOMs in search of failure. Maybe it is worth trying to prepare a script for dumping all of those, although this kind of depends on installed hardware (e.g. which cores on CPU are enabled) and I don't know if reading all registers is safe.
@tlaurion can you try installing DIMM in other slots? This looks like memory issue, but it's hard to tell whether it is hardware or configuration issue at this point. On the other hand memory training seems to be successful...
@krystian-hebel Hostboot boots, as per https://github.com/Dasharo/dasharo-issues/issues/80#issuecomment-1111090849 provided logs.
The test report that came with the board stated that hostboot test run failed on some other ram slots, but the one stated per user manual for cpu0 one ram module is working.
@krystian-hebel I have not changed jumpers either, if that is a needed thing (secure boot is on by default).
I thought it was not, since booting from mounted ROM. But if anything else needs to be done, please advise.
9.55214|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
suggests it is disabled.
@tlaurion could you log in the GUI after it hangs and check if https://<BMC IP>/#/server-health/event-log
shows anything interesting? Especially if there is a fresh "checkstop" reported.
@tlaurion could you log in the GUI after it hangs and check if
https://<BMC IP>/#/server-health/event-log
shows anything interesting? Especially if there is a fresh "checkstop" reported.
@krystian-hebel:
Four errors are reported from hostboot boot, yes. Export in json from that page:
{"/xyz/openbmc_project/logging/entry/72":{"AdditionalData":["_PID=1224","ESEL=00 00 df 00 00 00 00 20 00 04 12 90 6f aa 00 00 50 48 00 30 01 00 33 00 00 00 00 03 2c d5 64 ff 00 00 00 03 2c fc 0c bc 42 00 00 0f 00 00 00 00 00 00 00 00 00 00 00 00 90 00 02 56 90 00 02 56 55 48 00 18 01 00 09 00 8a 03 40 00 00 00 00 00 ff ff 00 00 00 00 00 00 50 53 00 50 01 01 00 00 02 00 00 09 33 2d 00 48 00 00 00 e0 00 00 10 00 00 00 00 00 00 20 00 00 00 04 00 02 00 00 00 64 00 00 0b b8 00 00 00 12 42 43 38 41 33 33 32 44 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 55 44 00 1c 01 06 01 00 02 54 41 4b 00 00 00 06 00 00 00 55 00 01 f9 20 00 00 00 00 55 44 00 24 01 06 01 00 01 54 41 4b 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00 4c 23 01 00 02 00 05 00 00 55 44 00 0c 01 0b 01 00 15 01 00 00 55 44 00 10 01 04 01 00 0f 9f de 6a 00 01 00 00 55 44 00 7c 00 0c 01 00 00 13 04 02 00 64 0b b8 10 00 00 be 0b 54 10 00 00 be 08 ca 08 00 00 a0 0d ac 08 00 00 a0 0b b8 04 00 00 5a 0c 80 04 00 00 5a 0a 8c 16 00 00 be 0a be 16 00 00 be 08 56 16 00 00 a0 08 ca 16 00 00 a0 08 98 14 00 00 a0 09 60 14 00 00 a0 08 ca 12 00 00 be 0a f0 12 00 00 be 08 ba 12 00 00 82 08 ca 12 00 00 82 08 98 10 00 00 82 08 fc 10 00 00 82 08 98 55 44 00 70 01 03 01 00 00 00 00 00 00 03 f9 5c 00 00 00 00 00 03 e0 f4 00 00 00 00 40 40 90 64 00 00 00 00 40 3f c4 74 00 00 00 00 40 3b 53 e4 00 00 00 00 40 3b e7 90 00 00 00 00 40 39 78 0c 00 00 00 00 40 39 91 90 00 00 00 00 40 36 ae 38 00 00 00 00 40 36 b5 cc 00 00 00 00 40 36 c2 88 00 00 00 00 40 35 bb f0 00 00 00 00 00 00 27 14 55 44 00 38 01 01 01 00 48 6f 73 74 62 6f 6f 74 20 42 75 69 6c 64 20 49 44 3a 20 68 6f 73 74 62 6f 6f 74 2d 61 32 64 64 62 66 33 2f 68 62 69 63 6f 72 65 2e 62 69 6e 00 55 44 00 70 01 04 01 00 0f 9f de 6a 00 05 00 00 07 5f 1d f4 30 32 41 41 38 38 33 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0b ac 54 02 59 41 31 39 33 34 31 32 30 31 32 34 00 00 00 00 00 00 05 22 a1 58 01 85 18 00 53 24 e7 0e 5b 2c 00 00 00 43 88 f7 00 00 55 44 00 14 01 08 01 00 00 00 00 01 00 00 00 08 00 00 00 05 55 44 03 fc 01 15 31 00 01 28 00 42 46 41 50 49 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 f4 00 00 00 00 00 00 03 f4 00 00 00 0b 00 00 00 00 00 00 00 1a 25 d7 c9 c6 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 09 00 00 00 00 00 11 bd 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 1a 25 d7 d8 47 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0a 00 00 00 00 00 13 b5 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 56 00 00 00 64 00 00 00 1a 25 d7 e6 c2 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0b 00 00 00 00 00 15 ae 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a be 00 00 00 64 00 00 00 1a 25 d7 f1 a5 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0c 00 00 00 00 00 17 a6 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a 8c 00 00 00 64 00 00 00 1a 25 d7 fc 13 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0d 00 00 00 00 00 19 9f 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0c 80 00 00 00 64 00 00 00 1a 25 d8 0a d1 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0e 00 00 00 00 00 1b 97 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0b b8 00 00 00 64 00 00 00 1a 25 d8 15 6a 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0f 00 00 00 00 00 1d 90 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0d ac 00 00 00 64 00 00 00 1a 25 d8 20 04 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 10 00 00 00 00 00 1f 88 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 1a 25 d8 2e ac 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 11 00 00 00 00 00 21 81 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 0b 54 00 00 00 64 00 00 00 1a 26 21 19 37 00 00 02 55 00 00 43 4f 52 d7 9c 36 00 00 04 73 00 00 00 1c 00 00 00 1a 26 21 23 51 00 00 02 55 00 10 43 4f 3f f2 02 3d 00 00 05 67 00 00 00 00 02 00 00 01 00 00 00 00 00 00 00 40 00 00 00 2c 55 44 00 30 01 15 31 00 01 28 00 42 46 41 50 49 5f 44 42 47 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 55 44 00 30 01 15 31 00 01 28 00 42 53 43 41 4e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 03 05 dc 03 01 00 00 00 00 00 02 ac ff 01 90 00 00 00 06 00 00 00 00 00 00 61 74 6d 65 6c 2c 32 38 63 31 32 38 2c 73 70 64 2c 64 69 6d 6d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 03 05 dc 03 01 00 00 00 00 00 02 ae ff 01 90 00 00 00 06 00 00 00 00 00 00 61 74 6d 65 6c 2c 32 38 63 31 32 38 2c 73 70 64 2c 64 69 6d 6d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 03 36 18 03 00 00 00 00 00 00 ff","CALLOUT_INVENTORY_PATH=/xyz/openbmc_project/inventory/system"],"Id":72,"Message":"org.open_power.Host.Error.Event","Purpose":"xyz.openbmc_project.Software.Version.VersionPurpose.BMC","Resolved":false,"Severity":"xyz.openbmc_project.Logging.Entry.Level.Error","Timestamp":1651082386994,"Version":"2.7.0-dev-571-g67efd9872","associations":[["callout","fault","/xyz/openbmc_project/inventory/system"]]},"/xyz/openbmc_project/logging/entry/73":{"AdditionalData":["ESEL=00 00 df 00 00 00 00 20 00 04 12 90 6f aa 00 00 50 48 00 30 01 00 33 00 00 00 00 02 64 ff 97 3c 00 00 00 02 65 26 24 c3 42 00 00 0f 00 00 00 00 00 00 00 00 00 00 00 00 90 00 02 61 90 00 02 61 55 48 00 18 01 00 09 00 8a 03 40 00 00 00 00 00 ff ff 00 00 00 00 00 00 50 53 00 50 01 01 00 00 02 00 00 09 33 2d 00 48 00 00 00 e0 00 00 10 00 00 00 00 00 00 20 00 00 00 04 00 02 00 00 00 64 00 00 0b b8 00 00 00 12 42 43 38 41 33 33 32 44 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 55 44 00 1c 01 06 01 00 02 54 41 4b 00 00 00 06 00 00 00 55 00 01 f9 20 00 00 00 00 55 44 00 24 01 06 01 00 01 54 41 4b 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00 4b 23 01 00 02 00 05 00 00 55 44 00 0c 01 0b 01 00 0f 01 00 00 55 44 00 10 01 04 01 00 0f 9f de 6a 00 01 00 00 55 44 00 7c 00 0c 01 00 00 13 04 02 00 64 0b b8 10 00 00 be 0b 54 10 00 00 be 08 ca 08 00 00 a0 0d ac 08 00 00 a0 0b b8 04 00 00 5a 0c 80 04 00 00 5a 0a 8c 16 00 00 be 0a be 16 00 00 be 08 56 16 00 00 a0 08 ca 16 00 00 a0 08 98 14 00 00 a0 09 60 14 00 00 a0 08 ca 12 00 00 be 0a f0 12 00 00 be 08 ba 12 00 00 82 08 ca 12 00 00 82 08 98 10 00 00 82 08 fc 10 00 00 82 08 98 55 44 00 58 01 03 01 00 00 00 00 00 00 03 f9 5c 00 00 00 00 00 03 e0 f4 00 00 00 00 40 40 90 64 00 00 00 00 40 3f c4 74 00 00 00 00 40 3b 53 e4 00 00 00 00 40 3b e7 90 00 00 00 00 40 39 78 0c 00 00 00 00 40 39 91 90 00 00 00 00 40 32 77 40 00 00 00 00 00 00 27 14 55 44 00 38 01 01 01 00 48 6f 73 74 62 6f 6f 74 20 42 75 69 6c 64 20 49 44 3a 20 68 6f 73 74 62 6f 6f 74 2d 61 32 64 64 62 66 33 2f 68 62 69 63 6f 72 65 2e 62 69 6e 00 55 44 00 70 01 04 01 00 0f 9f de 6a 00 05 00 00 07 5f 1d f4 30 32 41 41 38 38 33 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0b ac 54 02 59 41 31 39 33 34 31 32 30 31 32 34 00 00 00 00 00 00 05 22 a1 58 01 85 18 00 53 24 e7 0e 5b 2c 00 00 00 43 88 f7 00 00 55 44 00 14 01 08 01 00 00 00 00 01 00 00 00 08 00 00 00 05 55 44 03 fc 01 15 31 00 01 28 00 42 46 41 50 49 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 f4 00 00 00 00 00 00 03 f4 00 00 00 0b 00 00 00 00 00 00 00 14 05 2a a7 ce 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 09 00 00 00 00 00 11 bd 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 14 05 2a b6 fb 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0a 00 00 00 00 00 13 b5 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 56 00 00 00 64 00 00 00 14 05 2a c6 a1 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0b 00 00 00 00 00 15 ae 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a be 00 00 00 64 00 00 00 14 05 2a d2 8a 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0c 00 00 00 00 00 17 a6 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a 8c 00 00 00 64 00 00 00 14 05 2a dd cf 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0d 00 00 00 00 00 19 9f 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0c 80 00 00 00 64 00 00 00 14 05 2a ed b1 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0e 00 00 00 00 00 1b 97 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0b b8 00 00 00 64 00 00 00 14 05 2a f9 c3 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0f 00 00 00 00 00 1d 90 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0d ac 00 00 00 64 00 00 00 14 05 2b 05 13 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 10 00 00 00 00 00 1f 88 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 14 05 2b 14 c3 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 11 00 00 00 00 00 21 81 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 0b 54 00 00 00 64 00 00 00 14 05 74 02 51 00 00 01 de 00 00 43 4f 52 d7 9c 36 00 00 04 73 00 00 00 1c 00 00 00 14 05 74 0d 15 00 00 01 de 00 10 43 4f 3f f2 02 3d 00 00 05 67 00 00 00 00 02 00 00 01 00 00 00 00 00 00 00 40 00 00 00 2c 55 44 00 30 01 15 31 00 01 28 00 42 46 41 50 49 5f 44 42 47 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 55 44 00 30 01 15 31 00 01 28 00 42 53 43 41 4e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 01 89 00 00 00 00 00 00 00 00 00 00 00 00 00 00 45 00 00 00 00 00 00 00 80 00 00 00 f8 51 aa b8 00 00 00 00 00 51 b8 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 57 e8 00 00 00 00 00 00 00 00 02 08 10 08 02 01 11 00 00 00 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 58 18 00 00 00 00 00 00 00 00 02 08 10 04 02 01 12 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 58 00 00 00 00 00 00 00 00 00 02 10 04 08 02 01 0f 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 59 56 00 00 00 00 00 00 00 06 00 00 00 00 00 66 8b 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20","PROCEDURE=85","_PID=1224"],"Id":73,"Message":"org.open_power.Host.Error.MaintenanceProcedure","Purpose":"xyz.openbmc_project.Software.Version.VersionPurpose.BMC","Resolved":false,"Severity":"xyz.openbmc_project.Logging.Entry.Level.Error","Timestamp":1651082387182,"Version":"2.7.0-dev-571-g67efd9872","associations":[]},"/xyz/openbmc_project/logging/entry/74":{"AdditionalData":["ESEL=00 00 df 00 00 00 00 20 00 04 12 90 6f aa 00 00 50 48 00 30 01 00 33 00 00 00 00 02 64 ff 97 3c 00 00 00 02 65 26 24 c3 42 00 00 0f 00 00 00 00 00 00 00 00 00 00 00 00 90 00 02 61 90 00 02 61 55 48 00 18 01 00 09 00 8a 03 40 00 00 00 00 00 ff ff 00 00 00 00 00 00 50 53 00 50 01 01 00 00 02 00 00 09 33 2d 00 48 00 00 00 e0 00 00 10 00 00 00 00 00 00 20 00 00 00 04 00 02 00 00 00 64 00 00 0b b8 00 00 00 12 42 43 38 41 33 33 32 44 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 55 44 00 1c 01 06 01 00 02 54 41 4b 00 00 00 06 00 00 00 55 00 01 f9 20 00 00 00 00 55 44 00 24 01 06 01 00 01 54 41 4b 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00 4b 23 01 00 02 00 05 00 00 55 44 00 0c 01 0b 01 00 0f 01 00 00 55 44 00 10 01 04 01 00 0f 9f de 6a 00 01 00 00 55 44 00 7c 00 0c 01 00 00 13 04 02 00 64 0b b8 10 00 00 be 0b 54 10 00 00 be 08 ca 08 00 00 a0 0d ac 08 00 00 a0 0b b8 04 00 00 5a 0c 80 04 00 00 5a 0a 8c 16 00 00 be 0a be 16 00 00 be 08 56 16 00 00 a0 08 ca 16 00 00 a0 08 98 14 00 00 a0 09 60 14 00 00 a0 08 ca 12 00 00 be 0a f0 12 00 00 be 08 ba 12 00 00 82 08 ca 12 00 00 82 08 98 10 00 00 82 08 fc 10 00 00 82 08 98 55 44 00 58 01 03 01 00 00 00 00 00 00 03 f9 5c 00 00 00 00 00 03 e0 f4 00 00 00 00 40 40 90 64 00 00 00 00 40 3f c4 74 00 00 00 00 40 3b 53 e4 00 00 00 00 40 3b e7 90 00 00 00 00 40 39 78 0c 00 00 00 00 40 39 91 90 00 00 00 00 40 32 77 40 00 00 00 00 00 00 27 14 55 44 00 38 01 01 01 00 48 6f 73 74 62 6f 6f 74 20 42 75 69 6c 64 20 49 44 3a 20 68 6f 73 74 62 6f 6f 74 2d 61 32 64 64 62 66 33 2f 68 62 69 63 6f 72 65 2e 62 69 6e 00 55 44 00 70 01 04 01 00 0f 9f de 6a 00 05 00 00 07 5f 1d f4 30 32 41 41 38 38 33 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0b ac 54 02 59 41 31 39 33 34 31 32 30 31 32 34 00 00 00 00 00 00 05 22 a1 58 01 85 18 00 53 24 e7 0e 5b 2c 00 00 00 43 88 f7 00 00 55 44 00 14 01 08 01 00 00 00 00 01 00 00 00 08 00 00 00 05 55 44 03 fc 01 15 31 00 01 28 00 42 46 41 50 49 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 f4 00 00 00 00 00 00 03 f4 00 00 00 0b 00 00 00 00 00 00 00 14 05 2a a7 ce 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 09 00 00 00 00 00 11 bd 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 14 05 2a b6 fb 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0a 00 00 00 00 00 13 b5 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 56 00 00 00 64 00 00 00 14 05 2a c6 a1 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0b 00 00 00 00 00 15 ae 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a be 00 00 00 64 00 00 00 14 05 2a d2 8a 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0c 00 00 00 00 00 17 a6 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a 8c 00 00 00 64 00 00 00 14 05 2a dd cf 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0d 00 00 00 00 00 19 9f 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0c 80 00 00 00 64 00 00 00 14 05 2a ed b1 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0e 00 00 00 00 00 1b 97 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0b b8 00 00 00 64 00 00 00 14 05 2a f9 c3 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0f 00 00 00 00 00 1d 90 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0d ac 00 00 00 64 00 00 00 14 05 2b 05 13 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 10 00 00 00 00 00 1f 88 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 14 05 2b 14 c3 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 11 00 00 00 00 00 21 81 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 0b 54 00 00 00 64 00 00 00 14 05 74 02 51 00 00 01 de 00 00 43 4f 52 d7 9c 36 00 00 04 73 00 00 00 1c 00 00 00 14 05 74 0d 15 00 00 01 de 00 10 43 4f 3f f2 02 3d 00 00 05 67 00 00 00 00 02 00 00 01 00 00 00 00 00 00 00 40 00 00 00 2c 55 44 00 30 01 15 31 00 01 28 00 42 46 41 50 49 5f 44 42 47 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 55 44 00 30 01 15 31 00 01 28 00 42 53 43 41 4e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 01 89 00 00 00 00 00 00 00 00 00 00 00 00 00 00 45 00 00 00 00 00 00 00 80 00 00 00 f8 51 aa b8 00 00 00 00 00 51 b8 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 57 e8 00 00 00 00 00 00 00 00 02 08 10 08 02 01 11 00 00 00 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 58 18 00 00 00 00 00 00 00 00 02 08 10 04 02 01 12 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 58 00 00 00 00 00 00 00 00 00 02 10 04 08 02 01 0f 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 59 56 00 00 00 00 00 00 00 06 00 00 00 00 00 66 8b 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20","_PID=1224","CALLOUT_INVENTORY_PATH=/xyz/openbmc_project/inventory/system"],"Id":74,"Message":"org.open_power.Host.Error.Event","Purpose":"xyz.openbmc_project.Software.Version.VersionPurpose.BMC","Resolved":false,"Severity":"xyz.openbmc_project.Logging.Entry.Level.Error","Timestamp":1651082391898,"Version":"2.7.0-dev-571-g67efd9872","associations":[["callout","fault","/xyz/openbmc_project/inventory/system"]]},"/xyz/openbmc_project/logging/entry/75":{"AdditionalData":["ESEL=00 00 df 00 00 00 00 20 00 04 12 90 6f aa 00 00 50 48 00 30 01 00 33 00 00 00 00 03 32 72 b6 03 00 00 00 03 32 99 d4 34 42 00 00 0f 00 00 00 00 00 00 00 00 00 00 00 00 90 00 02 90 90 00 02 90 55 48 00 18 01 00 09 00 8a 03 40 00 00 00 00 00 ff ff 00 00 00 00 00 00 50 53 00 50 01 01 00 00 02 00 00 09 33 2d 00 48 00 00 00 e0 00 00 10 00 00 00 00 00 00 20 00 00 00 04 00 02 00 00 00 64 00 00 0b b8 00 00 00 12 42 43 38 41 33 33 32 44 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 55 44 00 1c 01 06 01 00 02 54 41 4b 00 00 00 06 00 00 00 55 00 01 f9 20 00 00 00 00 55 44 00 24 01 06 01 00 01 54 41 4b 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00 4e 23 01 00 02 00 05 00 00 55 44 00 0c 01 0b 01 00 15 01 00 00 55 44 00 10 01 04 01 00 0f 9f de 6a 00 01 00 00 55 44 00 7c 00 0c 01 00 00 13 04 02 00 64 0b b8 10 00 00 be 0b 54 10 00 00 be 08 ca 08 00 00 a0 0d ac 08 00 00 a0 0b b8 04 00 00 5a 0c 80 04 00 00 5a 0a 8c 16 00 00 be 0a be 16 00 00 be 08 56 16 00 00 a0 08 ca 16 00 00 a0 08 98 14 00 00 a0 09 60 14 00 00 a0 08 ca 12 00 00 be 0a f0 12 00 00 be 08 ba 12 00 00 82 08 ca 12 00 00 82 08 98 10 00 00 82 08 fc 10 00 00 82 08 98 55 44 00 70 01 03 01 00 00 00 00 00 00 03 f9 5c 00 00 00 00 00 03 e0 f4 00 00 00 00 40 40 90 64 00 00 00 00 40 3f c4 74 00 00 00 00 40 3b 53 e4 00 00 00 00 40 3b e7 90 00 00 00 00 40 39 78 0c 00 00 00 00 40 39 91 90 00 00 00 00 40 36 ae 38 00 00 00 00 40 36 b5 cc 00 00 00 00 40 36 c2 88 00 00 00 00 40 35 bb f0 00 00 00 00 00 00 27 14 55 44 00 38 01 01 01 00 48 6f 73 74 62 6f 6f 74 20 42 75 69 6c 64 20 49 44 3a 20 68 6f 73 74 62 6f 6f 74 2d 61 32 64 64 62 66 33 2f 68 62 69 63 6f 72 65 2e 62 69 6e 00 55 44 00 70 01 04 01 00 0f 9f de 6a 00 05 00 00 07 5f 1d f4 30 32 41 41 38 38 33 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0b ac 54 02 59 41 31 39 33 34 31 32 30 31 32 34 00 00 00 00 00 00 05 22 a1 58 01 85 18 00 53 24 e7 0e 5b 2c 00 00 00 43 88 f7 00 00 55 44 00 14 01 08 01 00 00 00 00 01 00 00 00 08 00 00 00 05 55 44 03 fc 01 15 31 00 01 28 00 42 46 41 50 49 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 f4 00 00 00 00 00 00 03 f4 00 00 00 0b 00 00 00 00 00 00 00 1a 30 cf 0e 0a 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 09 00 00 00 00 00 11 bd 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 1a 30 cf 1c 67 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0a 00 00 00 00 00 13 b5 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 56 00 00 00 64 00 00 00 1a 30 cf 2a 90 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0b 00 00 00 00 00 15 ae 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a be 00 00 00 64 00 00 00 1a 30 cf 34 fe 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0c 00 00 00 00 00 17 a6 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a 8c 00 00 00 64 00 00 00 1a 30 cf 3f b1 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0d 00 00 00 00 00 19 9f 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0c 80 00 00 00 64 00 00 00 1a 30 cf 4e 4d 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0e 00 00 00 00 00 1b 97 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0b b8 00 00 00 64 00 00 00 1a 30 cf 59 13 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0f 00 00 00 00 00 1d 90 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0d ac 00 00 00 64 00 00 00 1a 30 cf 64 1f 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 10 00 00 00 00 00 1f 88 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 1a 30 cf 72 e0 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 11 00 00 00 00 00 21 81 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 0b 54 00 00 00 64 00 00 00 1a 31 19 48 32 00 00 02 55 00 00 43 4f 52 d7 9c 36 00 00 04 73 00 00 00 1c 00 00 00 1a 31 19 52 bf 00 00 02 55 00 10 43 4f 3f f2 02 3d 00 00 05 67 00 00 00 00 02 00 00 01 00 00 00 00 00 00 00 40 00 00 00 2c 55 44 00 30 01 15 31 00 01 28 00 42 46 41 50 49 5f 44 42 47 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 55 44 00 30 01 15 31 00 01 28 00 42 53 43 41 4e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 80 00 03 63 07 01 10 3f 80 00 03 64 07 01 10 3f 80 00 03 65 07 01 10 3f 80 00 03 66 07 01 10 3f 80 00 03 67 07 01 10 3f 80 00 03 68 07 01 10 3f 80 00 03 69 07 01 10 3f 80 00 03 6a 07 01 10 3f 80 00 03 6b 07 01 10 3f 80 00 07 60 07 01 10 3f 80 00 07 61 07 01 10 3f 80 00 07 62 07 01 10 3f 80 00 07 63 07 01 10 3f 80 00 07 64 07 01 10 3f 80 00 07 65 07 01 10 3f 80 00 07 66 07 01 10 3f 80 00 07 67 07 01 10 3f 80 00 07 68 07 01 10 3f 80 00 07 69 07 01 10 3f 80 00 07 6a 07 01 10 3f 80 00 07 6b 07 01 10 3f 80 00 0b 60 07 01 10 3f 80 00 0b 61 07 01 10 3f 80 00 0b 62 07 01 10 3f 80 00 0b 63 07 01 10 3f 80 00 0b 64 07 01 10 3f 80 00 0b 65 07 01 10 3f 80 00 0b 66 07 01 10 3f 80 00 0b 67 07 01 10 3f 80 00 0b 68 07 01 10 3f 80 00 0b 69 07 01 10 3f","PROCEDURE=85","_PID=1224"],"Id":75,"Message":"org.open_power.Host.Error.MaintenanceProcedure","Purpose":"xyz.openbmc_project.Software.Version.VersionPurpose.BMC","Resolved":false,"Severity":"xyz.openbmc_project.Logging.Entry.Level.Error","Timestamp":1651082392034,"Version":"2.7.0-dev-571-g67efd9872","associations":[]}}
Hostboot boot reported (snippets) from above logs:
20.48206|================================================
20.49654|Error reported by fapi2 (0x3300) EID 0x90000227
20.49786| No WOF table match found
20.49787| ModuleId 0x10 fapi2::MOD_FAPI2_PLAT_PARSE_WOF_TABLES
20.49787| ReasonCode 0x332d fapi2::RC_WOF_TABLE_NOT_FOUND
20.49788| UserData1 Number of cores : 0x0004000200000064
20.49789| UserData2 WOF Power Mode (1=Nominal, 2=Turbo) : 0x00000bb800000012
20.49790|------------------------------------------------
20.49790| Callout type : Procedure Callout
20.49791| Procedure : EPUB_PRC_HB_CODE
20.49792| Priority : SRCI_PRIORITY_HIGH
20.49793|------------------------------------------------
20.49793| Callout type : Hardware Callout
20.49795| Target : Physical:/Sys0/Node0/Proc0
20.49795| Deconfig State : NO_DECONFIG
20.49796| GARD Error Type : GARD_NULL
20.49797| Priority : SRCI_PRIORITY_MED
and
26.63992|================================================
26.63993|Error reported by fapi2 (0x3300) EID 0x90000256
26.63994| No WOF table match found
26.63995| ModuleId 0x10 fapi2::MOD_FAPI2_PLAT_PARSE_WOF_TABLES
26.63995| ReasonCode 0x332d fapi2::RC_WOF_TABLE_NOT_FOUND
26.63996| UserData1 Number of cores : 0x0004000200000064
26.63997| UserData2 WOF Power Mode (1=Nominal, 2=Turbo) : 0x00000bb800000012
26.63998|------------------------------------------------
26.63999| Callout type : Procedure Callout
26.63999| Procedure : EPUB_PRC_HB_CODE
26.64000| Priority : SRCI_PRIORITY_HIGH
26.64001|------------------------------------------------
26.64002| Callout type : Hardware Callout
26.64003| Target : Physical:/Sys0/Node0/Proc0
26.64004| Deconfig State : NO_DECONFIG
26.64004| GARD Error Type : GARD_NULL
26.64005| Priority : SRCI_PRIORITY_MED
26.64006|------------------------------------------------
26.64006| Hostboot Build ID: hostboot-a2ddbf3/hbicore.bin
9.55214|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
suggests it is disabled.Right.
Secure Mode
came disabled for both CPUs, jumper being set at 2nd and 3rd PINs for both CPUs.
@tlaurion those events (unfortunately?) aren't checkstops, which may make debugging harder. Most likely those are just logs from those errors reported by Hostboot. Feel free to clear that log so it won't keep showing them. Still, I'd like to ask you to dump some SCOM registers that would help if it were a checkstop, because some errors may still be masked at that time:
/tmp/fir_scoms.txt
on BMCwhile read scom; do
if [[ "$scom" == "0x"* ]]; then
pdbg -P pib0 getscom $scom
else
echo "$scom"
fi
done < /tmp/fir_scoms.txt > /tmp/scom_dump.log
/tmp/scom_dump.log
here.It is important to get the full dump from the first run - in case of problem in one place it may report further errors when trying to read additional registers, and that can obscure the real issue. Ideally those registers should be accessed in proper order, depending on results of earlier reads, but asking for them one at a time would take too much.
@krystian-hebel
The problem here, it testing :
@SergiiDmytruk Thanks.
@tlaurion Binaries from this MR are here: https://github.com/Dasharo/coreboot/suites/6282695824/artifacts/224773170
Is that the machine bootloops, so that when I run the script:
root@talos:~# while read scom; do
> if [[ "$scom" == "0x"* ]]; then
> pdbg -P pib0 getscom $scom
> else
> echo "$scom"
> fi
> done < /tmp/fir_scoms.txt > /tmp/scom_dump.log
pdbg: Failed to write to 0x0000101c (0000000000001007): Cannot send after transport endpoint shutdown
Here is still the output of it, prior of it giving pdbg: Failed to write to 0x0000101c (0000000000001007): Cannot send after transport endpoint shutdown
:
scom_dump.log
IIRC it would stop bootlooping after 3 or 4 tries (there is a way to reduce that number, but Hostboot doesn't like it), but I think all important registers were dumped. I'll take a look an will let you know when I find anything.
For some reason, next boot didn't hit a bootloop.
Running:
while read scom; do
if [[ "$scom" == "0x"* ]]; then
pdbg -P pib0 getscom $scom
else
echo "$scom"
fi
done < /tmp/fir_scoms.txt > /tmp/scom_dump.log
obmc-console.log scom_dump.log
Here is scp'ed logs
Maybe we should consider some scripts or guides which information to dump and how t do it to help us debugging issues.
@macpijan would it be useful to have scom dump from successful hostboot booting petitboot?
Those shouldn't matter, I kind of know what to expect there. What would help would be fuller dump when it hangs and a checkstop happens (i.e. when second register on the list, 0x500f001c
, is non-zero). If you could remove from input file all cache chiplets except chiplet 4 (0x14xxxxxx
left, other 0x1xxxxxxx
removed), and all core chiplets except chiplet 18 (0x32xxxxxx
left, other 0x2xxxxxxx
and 0x3xxxxxxx
removed) and try again, it should take less time and hopefully dump all/most of the registers before reboot happens.
I don't know how https://github.com/Dasharo/dasharo-issues/issues/80#issuecomment-1112220743 didn't get to the checkstop, instead it just stopped in a random place.
@krystian-hebel: not sure what a checkstop is. (0x500f001c is zero here)
The logs collected here were collected from machine not stuck in a bootloop. The machine is still on now, not having rebooted automatically.
So I understand that to test and report the proper logs, it is required to boot successfully from hostboot first (to reset state and produce bootloops, for which a checkstop is what is triggered when a bootloop occurs and where system reboots automatically?)
fir_scoms_issue80.txt obmc-console.log scom_dump_issue80.log
@krystian-hebel : when in bootloop
root@talos:~# while read scom; do if [[ "$scom" == "0x"* ]]; then pdbg -P pib0 getscom $scom; else echo "$scom"; fi; done < /tmp/fir_scoms_issue80.txt > /tmp/scom_dump_issue80.log
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
But 0x500f001c is non-zero.
Attached is the log I was able to get in such case. scom_dump_issue80_bootloop.log
Ok, I was expecting it to be stuck in checkstop state after those 5 boot attempts, but apparently it isn't. Let's try different approach: when it is about to reboot, execute pdbg -a stop
(this stops CPU execution, giving more time for dumping), then check if global FIR is non-zero with pdbg -P pib0 getscom 0x500f001c
(this one is safe to read, shouldn't impact other registers), and only if it is non-zero, get the full dump.
If that register is still zero, try pdbg -a start
followed by pdbg -a stop
to give it some more time to work.
user@talos-tests:~$ ssh -l root talos
root@talos's password:
root@talos:~# pdbg -a stop
Timeout waiting for special wakeup on POWER9 Core@0x00000000
Timeout waiting for special wakeup on POWER9 Core@0x00000000
Timeout waiting for special wakeup on POWER9 Core@0x00000000
Unable to quiesce thread
Unable to quiesce thread
Unable to quiesce thread
Unable to quiesce thread
Unable to quiesce thread
Unable to quiesce thread
Unable to quiesce thread
Unable to quiesce thread
Unable to quiesce thread
Unable to quiesce thread
Unable to quiesce thread
Unable to quiesce thread
Unable to quiesce thread
(reverse-i-search)`while': while read scom; do if [[ "$scom" == "0x"* ]]; then pdbg -P pib0 getscom $scom; else echo "$scom"; fi; done < /tmp/fir_scoms_issue80.txt > /tmp/scom_dump_issue80.log
And it reboots prior of me being able to get anything more.
This one completed successfully, but machine is still up.
@krystian-hebel this is the registers polled (provide any reduced version needed) https://github.com/Dasharo/dasharo-issues/files/8603193/fir_scoms_issue80.txt
Even if calling pdbg -a stop
, bootloop still occurs as shown in precedent output https://github.com/Dasharo/dasharo-issues/issues/80#issuecomment-1114958057
Yeah, the window for stopping CPU is easy to miss, and watchdog probably still works. This should be enough for me to dig into, I'll let you know what those numbers tell me.
So to clarify. "Checkstop" is happening at the end of the bootloop with machine staying up without further reboot?
If it is the case, debugging instructions should state to let machine bootloop 5 times prior of collecting the register logs.
Otherwise, please let me know for further test runs.
Checkstop is whenever 0x500f001c
is non-zero. Whether platform reboots or stops depends on the configuration, I thought that it would do 4 reboots and get stuck on 5th. It should be reported in those logs in GUI that I asked earlier, but from those logs it seems that something else happens instead.
Findings so far:
reserved
in documentationI looked once again at console log and noticed something strange:
MCS0 MCA1 DIMM0 has 0 bad nibble(s) and 0 bad bit(s), but can be recovered
This means that RAM is bad enough to error out during calibration, but for some reason (probably bug in our code) it can't read which lines failed. Summary registers says this is more than 1, up to 4 bad bits which should be recoverable in most cases (although it uses ECC to recover data, so further errors may be unrecoverable), Hostboot probably does it.
@tlaurion please try those and sent just the console dump: https://cloud.3mdeb.com/index.php/s/A8qNefnkaBYDbsT
@krystian-hebel
xf8231080
he @
coreboot-4.14-526-gad4b5b2f0e Mon May 2 16:20:29 UTC 2022 bootblock starting (log level: 7)...
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
HBI partition has ECC
HBI is in 0x00426200 through 0x0175f037
FMAP: Found "FLASH" version 1.1 at 0x20000.
FMAP: base = 0x0 size = 0x200000 #areas = 4
FMAP: area COREBOOT found @ 20200 (1965568 bytes)
CBFS: mcache @0xf8231000 built for 10 files, used 0x1f0 of 0x2000 bytes
CBFS: Found 'fallback/romstage' @0x80 size 0x126bb in mcache @0xf823102c
BS: bootblock times (exec / console): total (unknown) / 2 ms
coreboot-4.14-526-gad4b5b2f0e Mon May 2 16:20:29 UTC 2022 romstage starting (log level: 7)...
IPMI: romstage PNP BT 0xe4
Get BMC self test result...Function Not Implemented
Initializing IPMI BMC watchdog timer
IPMI BMC watchdog initialized and started.
Initializing FSI...
Initialized FSI (chips mask: 0x01)
Building MVPDs...
starting istep 8.1
ending istep 8.1
starting istep 8.2
ending istep 8.2
starting istep 8.3
ending istep 8.3
starting istep 8.4
ending istep 8.4
starting istep 8.9
ending istep 8.9
starting istep 8.10
ending istep 8.10
starting istep 8.11
ending istep 8.11
starting istep 9.2
ending istep 9.2
starting istep 9.4
ending istep 9.4
starting istep 9.6
ending istep 9.6
starting istep 9.7
ending istep 9.7
starting istep 10.1
ending istep 10.1
starting istep 10.6
Base epsilon values read from table:
R_T[0] = 10
R_T[1] = 10
R_T[2] = 79
W_T[0] = 0
W_T[1] = 21
Scaled epsilon values based on +20 percent guardband:
R_T[0] = 12
R_T[1] = 12
R_T[2] = 95
W_T[0] = 0
W_T[1] = 26
ending istep 10.6
starting istep 10.10
ending istep 10.10
starting istep 10.12
ending istep 10.12
starting istep 10.13
ending istep 10.13
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
MEMD partition has ECC
MEMD is in 0x03cef200 through 0x03cfb917
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 50
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 51
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 53
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D4
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D5
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D6
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D7
SPD @ 0x52
SPD: module type is DDR4
SPD: module part number is M393A1K43BB0-CRC
SPD: banks 16, ranks 1, rows 16, columns 10, density 8192 Mb
SPD: device width 8 bits, bus width 64 bits
SPD: module size is 8192 MB (per channel)
starting istep 13.2
ending istep 13.2
starting istep 13.3
putRing took 3 ms
ending istep 13.3
starting istep 13.4
ending istep 13.4
starting istep 13.6
ending istep 13.6
starting istep 13.8
Please FIXME: ATTR_MSS_RUNTIME_MEM_THROTTLED_N_COMMANDS_PER_SLOT
ending istep 13.8
starting istep 13.9
ending istep 13.9
starting istep 13.10
CCS took 2 us (3 us timeout), 1 instruction(s)
CCS took 2 us (2 us timeout), 14 instruction(s)
RCD dump for I2C address 0x5a:
0xf820fa30: 80 b3 40 42 30 00 00 00 02 01 00 03 cb e3 c0 0d ..@B0...........
0xf820fa40: 00 00 39 00 00 00 00 00 00 00 07 00 00 00 00 00 ..9.............
ending istep 13.10
starting istep 13.11
CCS took 2 us (7 us timeout), 2 instruction(s)
DP 0
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 1
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 2
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 3
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 4
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000000000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Write Leveling starting
CCS took 13 us (92 us timeout), 5 instruction(s)
DP 0
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 1
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 2
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 3
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 4
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Write Leveling done
Initial Pattern Write starting
CCS took 5 us (38 us timeout), 5 instruction(s)
DP 0
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 1
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 2
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 3
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 4
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Initial Pattern Write done
DQS alignment starting
CCS took 10 us (44 us timeout), 1 instruction(s)
DP 0
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 1
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 2
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 3
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 4
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
DQS alignment done
Read Clock Alignment starting
CCS took 8 us (82 us timeout), 1 instruction(s)
DP 0
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 1
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 2
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 3
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 4
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Read Clock Alignment done
Read Centering starting
CCS took 36 us (120 us timeout), 1 instruction(s)
DP 0
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 1
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 2
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 3
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 4
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Read Centering done
Write Centering starting
CCS took 8852 us (11314 us timeout), 7 instruction(s)
DP 0
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000003a3a - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 1
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000003a1a - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 2
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000003e1a - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 3
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000003a3a - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 4
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000003a00 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Write Centering done
Coarse write/read starting
CCS took 5 us (24 us timeout), 1 instruction(s)
DP 0
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000003a3a - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 1
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000003a1a - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 2
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000003e1a - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 3
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000003a3a - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 4
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000003a00 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Coarse write/read done
ending istep 13.11
starting istep 13.13
ending istep 13.13
starting istep 14.1
ending istep 14.1
starting istep 14.2
ending istep 14.2
starting istep 14.3
Initializing PEC0...
Initializing PEC1...
Initializing PEC2...
Initializing PHB0...
Initializing PHB1...
Initializing PHB2...
Initializing PHB3...
Initializing PHB4...
Initializing PHB5...
ending istep 14.3
starting istep 14.5
ending istep 14.5
0xF000F = 221d104900008040
CBMEM:
IMD: root @ 0xffeff000 254 entries.
IMD: root @ 0xffefec00 62 entries.
FMAP: area COREBOOT found @ 20200 (1965568 bytes)
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
HBI partition has ECC
HBI is in 0x00426200 through 0x0175f037
CBFS: Found 'fallback/ramstage' @0x127c0 size 0xcbe5 in mcach
Checkstop is whenever
0x500f001c
is non-zero. Whether platform reboots or stops depends on the configuration, I thought that it would do 4 reboots and get stuck on 5th. It should be reported in those logs in GUI that I asked earlier, but from those logs it seems that something else happens instead.
Thanks for the clarification.
There are org.open_power.Host.Boot.Error.Checkstop when booting coreboot, but they only contain PIDs and seemed irrelevant. Content of such logs from last boot attempt were:
_PID=6707
and _PID=6404
@krystian-hebel anything else you would need ?
@tlaurion I updated binaries at https://cloud.3mdeb.com/index.php/s/A8qNefnkaBYDbsT, please check with those. I expect it to die on https://github.com/Dasharo/coreboot/blob/raptor-cs_talos-2/ram_debug_fixes/src/soc/ibm/power9/istep_13_11.c#L896. If that is the case, I would like to see both console dump and SCOM dump, if it doesn't hang there just the console output (may be just from start of istep 13.11 onward) will be enough.
As the comment above line from link says, there is a lot of code in the workaround missing. I haven't implemented it because all of our DIMMs were nice enough to not require it, so I had no way of testing whether it works. Seems that this may have just changed :slightly_smiling_face:
@krystian-hebel
root@talos:~# cd /tmp/v0.5.0_debug_ram/
root@talos:/tmp/v0.5.0_debug_ram# sha256sum *
3f82006b01cbbdb5ca4b1bb7c56088cad7e5fa7cdbcc9275e1e36905334c882e bootblock.signed.ecc
7c8f7da210b5facb89724347f4096d7bff8024e2d9e046a132f52f0286f3f86c coreboot.rom.signed.ecc
root@talos:/tmp/v0.5.0_debug_ram# mboxctl --lpc-state
LPC Bus Maps: BMC Memory
root@talos:/tmp/v0.5.0_debug_ram# mboxctl --backend vpnor
Failed to post message: Connection timed out
root@talos:/tmp/v0.5.0_debug_ram# mboxctl --lpc-state
LPC Bus Maps: Flash Device
root@talos:/tmp/v0.5.0_debug_ram# pflash -r /tmp/talos.pnor
Reading to "/tmp/talos.pnor" from 0x00000000..0x04000000 !
[==================================================] 100% ETA:0s
root@talos:/tmp/v0.5.0_debug_ram# pflash -P HBB -p /tmp/v0.5.0_debug_ram/bootblock.signed.ecc -F /tmp/talos.pnor
About to program "/tmp/v0.5.0_debug_ram/bootblock.signed.ecc" at 0x00205000..0x0020c002 !
WARNING ! This will modify your HOST flash chip content !
Enter "yes" to confirm:yes
Programming & Verifying...
[==================================================] 100%
Updating actual size in partition header...
root@talos:/tmp/v0.5.0_debug_ram# pflash -P HBI -p /tmp/v0.5.0_debug_ram/coreboot.rom.signed.ecc -F /tmp/talos.pnor
About to program "/tmp/v0.5.0_debug_ram/coreboot.rom.signed.ecc" at 0x00425000..0x00666200 !
WARNING ! This will modify your HOST flash chip content !
Enter "yes" to confirm:yes
Programming & Verifying...
[==================================================] 100%
Updating actual size in partition header...
root@talos:/tmp/v0.5.0_debug_ram# mboxctl --backend file:/tmp/talos.pnor
SetBackend: Success
root@talos:/tmp/v0.5.0_debug_ram# mboxctl --lpc-state
LPC Bus Maps: BMC Memory
Will be reusing: Reusing https://github.com/Dasharo/dasharo-issues/files/8603193/fir_scoms_issue80.txt and
while read scom; do
if [[ "$scom" == "0x"* ]]; then
pdbg -P pib0 getscom $scom
else
echo "$scom"
fi
done < /tmp/fir_scoms_issue80.txt > /tmp/scom_dump.log
Ouput:
starting root@talos:/tmp/v0.5.0_debug_ram# cat /var/log/obmc-console.log
0000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Write Leveling done
Initial Pattern Write starting
CCS took 5 us (38 us timeout), 5 instruction(s)
DP 0
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 1
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 2
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 3
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 4
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Initial Pattern Write done
DQS alignment starting
CCS took 10 us (44 us timeout), 1 instruction(s)
DP 0
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 1
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 2
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 3
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 4
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
DQS alignment done
Read Clock Alignment starting
CCS took 8 us (82 us timeout), 1 instruction(s)
DP 0
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 1
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 2
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 3
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 4
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Read Clock Alignment done
Read Centering starting
CCS took 36 us (120 us timeout), 1 instruction(s)
DP 0
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 1
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 2
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 3
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 4
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000000000 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Read Centering done
Write Centering starting
CCS took 8449 us (11314 us timeout), 7 instruction(s)
DP 0
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000003a0e - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 1
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000003e1a - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 2
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000001a1a - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 3
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000001a3a - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 4
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000003a00 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Write Centering done
Coarse write/read starting
CCS took 5 us (24 us timeout), 1 instruction(s)
DP 0
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000003a0e - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 1
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000003e1a - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 2
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000001a1a - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 3
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000001a3a - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
DP 4
0x0000000000000000 - RD_VREF_CAL_ERROR
0x0000000000000000 - DQ_BIT_DISABLE_RP0
0x0000000000000000 - DQS_BIT_DISABLE_RP0
0x0000000000000000 - WR_ERROR0
0x0000000000000000 - RD_STATUS0
0x0000000000000000 - RD_LVL_STATUS2
0x0000000000000000 - RD_LVL_STATUS0
0x0000000000003a00 - WR_VREF_ERROR0
0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Coarse write/read done
ending istep 13.11
starting istep 13.13
ending istep 13.13
starting istep 14.1
ending istep 14.1
starting istep 14.2
ending istep 14.2
starting istep 14.3
Initializing PEC0...
Initializing PEC1...
Initializing PEC2...
Initializing PHB0...
Initializing PHB1...
Initializing PHB2...
Initializing PHB3...
Initializing PHB4...
Initializing PHB5...
ending istep 14.3
starting istep 14.5
ending istep 14.5
0xF000F = 221d104900008040
CBMEM:
IMD: root @ 0xffeff000 254 entries.
IMD: root @ 0xffefec00 62 entries.
FMAP: area COREBOOT found @ 20200 (1965568 bytes)
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
HBI partition has ECC
HBI is in 0x00426200 through 0x0175f037
CBFS: Found 'fallback/ramstage' @0x12840 size 0xcbe7 in mcache @root@talos:/tmp/v0.5.0_debug_ram#
No checkstop?
root@talos:/tmp/v0.5.0_debug_ram# pdbg -P pib0 getscom 0x500f001c
p0: 0x00000000500f001c = 0x0000000000000000 (/kernelfsi@0/pib@1000)
root@talos:/tmp/v0.5.0_debug_ram#
root@talos:/tmp/v0.5.0_debug_ram# while read scom; do
> if [[ "$scom" == "0x"* ]]; then
> pdbg -P pib0 getscom $scom
> else
> echo "$scom"
> fi
> done < /tmp/fir_scoms_issue80.txt > /tmp/scom_dump.log
root@talos:/tmp/v0.5.0_debug_ram#
Attached are the scom logs: scom_dump.log
@tlaurion to exclude possibility of bad support for CPU, could you test if it works (or at least behaves differently) when this DIMM is installed in other slots, even though instruction tells to use this one?
@macpijan : user's manual suggests memory bank population for one DIMM to be in B1, on one/two CPU configuration.
You suggest that I put my single M393A1K43BB0-CRC in what other slot?
EDIT: outside channel answer of @krystian-hebel "Any one should do." Putting in D1.
hmmm. Memory stick in D1:
16.90082|================================================
16.91730|Error reported by prdf (0xE500) EID 0x90000366
16.91731| PRD Signature : 0x240005 0xFFFF0044
16.91874| Signature Description : pu.mca:k0:n0:s0:p00:c5 () RDR: Repairs needed but unavailable
16.91875| UserData1 : 0x0024000500000000
16.91876| UserData2 : 0xffff004400000000
16.91876|------------------------------------------------
16.91877| Callout type : Hardware Callout
16.91878| Target : Physical:/Sys0/Node0/DIMM6
16.91879| Deconfig State : DELAYED_DECONFIG
16.91880| GARD Error Type : GARD_Predictive
16.91880| Priority : SRCI_PRIORITY_HIGH
16.91881|------------------------------------------------
16.91882| Hostboot Build ID: hostboot-a2ddbf3/hbicore.bin
16.91882|================================================
17.26972|System Shutting DownTo Perform Reconfiguration After Deconfig
17.26973|IPMI: Initiate power cycle
17.27087|Stopping istep dispatcher
17.99366|IPMI: shutdown complete
15.12352|ISTEP 13. 7 - host_enable_memvolt
15.13551|ISTEP 13. 8 - mss_scominit
15.47751|ISTEP 13. 9 - mss_ddr_phy_reset
15.53023|ISTEP 13.10 - mss_draminit
15.61190|ISTEP 13.11 - mss_draminit_training
16.03449|System Shutting DownTo Perform Reconfiguration After Recoverable Error
16.03451|IPMI: Initiate power cycle
16.05795|Stopping istep dispatcher
16.41643|IPMI: shutdown complete
Memory stick now in A1. Debugged with @krystian-hebel.
It seems that the issue above was linked to components being faulty and the Talos deactivating those components to continue on other working ones as described here.
To make A1 work, it was necessary to wipe from instructions:
pflash -P GUARD -c
Note: On my board, D1 slot is faulty (let's remember the donation came from a board that failed DUT_HW_GUARD_FAILURE), with the device Test report specifying that "2x ram slots fail".
What we know as of now if that A1 and B1 (blue slots, under CPU0) works, and hostboot is able to train (as coreboot) the M393A1K43BB0-CRC stick.
It seems that the CPU revision in my setup is not known from DEFCONFIG (from Hostboot):
34.38209|ISTEP 21. 1 - host_runtime_setup
36.26054|================================================
36.26055|Error reported by fapi2 (0x3300) EID 0x900003C7
36.26056| No WOF table match found
36.26056| ModuleId 0x10 fapi2::MOD_FAPI2_PLAT_PARSE_WOF_TABLES
36.26057| ReasonCode 0x332d fapi2::RC_WOF_TABLE_NOT_FOUND
36.26058| UserData1 Number of cores : 0x0004000200000064
36.26059| UserData2 WOF Power Mode (1=Nominal, 2=Turbo) : 0x00000bb800000012
36.26060|------------------------------------------------
36.26061| Callout type : Procedure Callout
36.26062| Procedure : EPUB_PRC_HB_CODE
36.26062| Priority : SRCI_PRIORITY_HIGH
36.26063|------------------------------------------------
36.26064| Callout type : Hardware Callout
36.26066| Target : Physical:/Sys0/Node0/Proc0
36.26066| Deconfig State : NO_DECONFIG
36.26067| GARD Error Type : GARD_NULL
36.26068| Priority : SRCI_PRIORITY_MED
36.26069|------------------------------------------------
@krystian-hebel @macpijan some updates?
Interesting find. Soon after entering ramstage processor hits an unknown instruction and as a result exception happens. Normally, at that point there should be no interrupt handlers installed, but for some reason around that particular exception (@ 0x840) I see something that looks like proper PPC64 code:
root@talos:~# pdbg -p 0 -c 18 -t 0 getmem 0x0 0x1000 | hexdump -C
[==================================================] 100%
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000e20 38 61 01 70 4b ff fb 8d 2c 23 00 00 41 82 ff 6c |8a.pK...,#..A..l|
00000e30 e8 bf 00 18 e9 3f 00 10 38 80 00 00 7c a3 28 50 |.....?..8...|.(P|
00000e40 7c 69 1a 14 48 00 25 cd 60 00 00 00 e8 9f 00 18 ||i..H.%.`.......|
00000e50 e8 7f 00 10 38 a0 00 01 48 00 26 c5 60 00 00 00 |....8...H.&.`...|
00000e60 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001000
From there it jumps into unused memory (all 0s), hits another exception and goes back into this handler.
There are some open questions that I'll try to debug more:
On memory training front: there are no errors when only 1D write centering is performed, instead of 2D centering. This is one of the workarounds Hostboot does when full algorithm fails, but according to my notes all registers that Hostboot tests for failure are reporting a success, even for 2D training. I will have to dig into Hostboot's code again and compare it with my notes.
Nevertheless, even with forced 1D write centering, the problem with "random" data in interrupt handlers' area still happens. Other parts of RAM are properly zeroed so it is most likely working as it should. It may be a different issue altogether and this is what I'm assuming right now. At least at that point we have RAM and can install our own interrupt handlers to help with debugging, assuming they won't get overwritten.
@SergiiDmytruk constructed a list of differences between different DD revisions of CPU, but it seems that Hostboot doesn't always use everything it defines. There may be more code depending on DD versions that doesn't use attributes listed there. Some things are set before FAPI (used to access attributes, among other things) is enabled, like HRMOR or XIVE.
More findings:
0x820
-0x85f
is a copy of what was at 0xf8243aa0
-0xf8243adf
, but after that hang 0xf8243ac0
++ is zeroed. Copy in lower memory isn't truncated.
root@talos:~# pdbg -p 0 -c 18 -t 0 getmem 0xf8243a90 0x50 | hexdump -C
[==================================================] 100%
00000000 60 00 00 00 e8 bf 00 18 e8 9f 00 10 7f 86 e3 78 |`..............x|
00000010 38 61 01 70 4b ff fb 8d 2c 23 00 00 41 82 ff 6c |8a.pK...,#..A..l|
00000020 e8 bf 00 18 e9 3f 00 10 38 80 00 00 7c a3 28 50 |.....?..8...|.(P|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050
memset(ramstage.bss, 0, sizeof(ramstage.bss + ramstage.heap))
called from cbfs_prog_stage_load
. Return should jump to 0xf8243ac8
which is zeroed at that point, hence the exception.memset
finishes successfully after clearing full range. Its code definitely fits in L1I cache, probably even in IFB (instruction fetch buffer, can hold 24 instructions in SMT4 mode), so even if it were overwritten it should still be able to execute.memset
is at 0xf8243ac4
so that cache line should have been accessed and LRU should not choose it to be victimized. Memory is zeroed there on 64B boundary (half of cache line, fetching it to L1D is supported and AFAICT enabled), contrary to what happened in 0x820
-0x85f
range.dcbz
instruction). This doesn't happen here, memset
is able to clear additional ~2.6MB which together with what is initialized by bootblock is more than 10MB of available L3 cache (for the only core executing at that time). It seems as if L3 thought it could write back to RAM, but couldn't read it later.pdbg
asks main CPU to read memory, so it still goes through all cache levels. It can be told to use cache-inhibited read, but real access type depends on type of first access, it is stored in SLB or one of other lookaside buffers.I just hope we don't chase broken hardware here. Please note we may have some more remote setups of Talos II for coparison of behavior soon. I'm working on that.
@pietrushnic Hostboot works with this hardware, so even if it is not perfect, it should at least hit one of TODO workaround paths in coreboot. For some reason this does not happen, no error is detected until it breaks fatally.
@krystian-hebel understood, that means we need improvements on our side and we don't what the heck is going. I hope you can invent some diagnostics steps. Also maybe we should redirect this to mailing list, maybe someone there would have ideas how to debug that further?
@tlaurion I have another request: I need logs from talos.pnor
from https://cloud.3mdeb.com/index.php/s/A8qNefnkaBYDbsT. This is Hostboot with customized (lots of irrelevant stuff suppressed, 80% faster and produces smaller log) debug output. On our machine this takes ~45 minutes to boot and produces ~30MB of log, this may be slightly less with one CPU, one DIMM. It should fit in ramfs, even with PNOR image also stored there. Read it with obmc-console-client > /tmp/boot.log
because /var/log/obmc-console.log
has limited size.
Full logs provided off channel.
@krystian-hebel any updates?
Dasharo version 0.5.0 release from https://docs.dasharo.com/variants/talos_2/releases/
Dasharo variant Workstation
Affected component(s) or functionality Memory initialization fails ( 1x M393A1K43BB0-CRC in B1 memory slot)
Brief summary The coreboot output stops at
How reproducible Always (Single 16 cores CPU, one RAM module: 8GB (More info on changes needed to be documented on non-flashing instructions under https://github.com/Dasharo/dasharo-issues/issues/79)
How to reproduce
Steps to reproduce the behavior: On laptop:
On a seperated SSH connection to BMC:
On another seperate SSH connection to BMC:
Expected behavior Ram init succeeds and net steps are engaged
Actual behavior Stops at