Dasharo / dasharo-issues

The Dasharo issue tracker
https://dasharo.com/
24 stars 0 forks source link

Talos II - 0.5.0 - One CPU : Doesn't boot #80

Closed tlaurion closed 2 years ago

tlaurion commented 2 years ago

Dasharo version 0.5.0 release from https://docs.dasharo.com/variants/talos_2/releases/

Dasharo variant Workstation

Affected component(s) or functionality Memory initialization fails ( 1x M393A1K43BB0-CRC in B1 memory slot)

Brief summary The coreboot output stops at

coreboot-4.14-517-gc92383f92c Tue Apr 12 12:28:24 UTC 2022 bootblock starting (log level: 7)...
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
HBI partition has ECC
HBI is in 0x00426200 through 0x0175f037
FMAP: Found "FLASH" version 1.1 at 0x20000.
FMAP: base = 0x0 size = 0x200000 #areas = 4
FMAP: area COREBOOT found @ 20200 (1965568 bytes)
CBFS: mcache @0xf8231000 built for 10 files, used 0x1f0 of 0x2000 bytes
CBFS: Found 'fallback/romstage' @0x80 size 0x12421 in mcache @0xf823102c
BS: bootblock times (exec / console): total (unknown) / 2 ms

coreboot-4.14-517-gc92383f92c Tue Apr 12 12:28:24 UTC 2022 romstage starting (log level: 7)...
IPMI: romstage PNP BT 0xe4
Get BMC self test result...Function Not Implemented
Initializing IPMI BMC watchdog timer
IPMI BMC watchdog initialized and started.
Initializing FSI...
Initialized FSI (chips mask: 0x01)
Building MVPDs...
starting istep 8.1
ending istep 8.1
starting istep 8.2
ending istep 8.2
starting istep 8.3
ending istep 8.3
starting istep 8.4
ending istep 8.4
starting istep 8.9
ending istep 8.9
starting istep 8.10
ending istep 8.10
starting istep 8.11
ending istep 8.11
starting istep 9.2
ending istep 9.2
starting istep 9.4
ending istep 9.4
starting istep 9.6
ending istep 9.6
starting istep 9.7
ending istep 9.7
starting istep 10.1
ending istep 10.1
starting istep 10.6
Base epsilon values read from table:
 R_T[0] = 10
 R_T[1] = 10
 R_T[2] = 79
 W_T[0] = 0
 W_T[1] = 21
Scaled epsilon values based on +20 percent guardband:
 R_T[0] = 12
 R_T[1] = 12
 R_T[2] = 95
 W_T[0] = 0
 W_T[1] = 26
ending istep 10.6
starting istep 10.10
ending istep 10.10
starting istep 10.12
ending istep 10.12
starting istep 10.13
ending istep 10.13
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
MEMD partition has ECC
MEMD is in 0x03cef200 through 0x03cfb917
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 50
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 51
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 53
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D4
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D5
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D6
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D7
SPD @ 0x52
SPD: module type is DDR4
SPD: module part number is M393A1K43BB0-CRC    
SPD: banks 16, ranks 1, rows 16, columns 10, density 8192 Mb
SPD: device width 8 bits, bus width 64 bits
SPD: module size is 8192 MB (per channel)

coreboot-4.14-517-gc92383f92c Tue Apr 12 12:28:24 UTC 2022 bootblock starting (log level: 7)...
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
HBI partition has ECC
HBI is in 0x00426200 through 0x0175f037
FMAP: Found "FLASH" version 1.1 at 0x20000.
FMAP: base = 0x0 size = 0x200000 #areas = 4
FMAP: area COREBOOT found @ 20200 (1965568 bytes)
CBFS: mcache @0xf8231000 built for 10 files, used 0x1f0 of 0x2000 bytes
CBFS: Found 'fallback/romstage' @0x80 size 0x12421 in mcache @0xf823102c
BS: bootblock times (exec / console): total (unknown) / 2 ms

coreboot-4.14-517-gc92383f92c Tue Apr 12 12:28:24 UTC 2022 romstage starting (log level: 7)...
IPMI: romstage PNP BT 0xe4
Get BMC self test result...Function Not Implemented
Initializing IPMI BMC watchdog timer
IPMI BMC watchdog initialized and started.
Initializing FSI...
Initialized FSI (chips mask: 0x01)
Building MVPDs...
starting istep 8.1
ending istep 8.1
starting istep 8.2
ending istep 8.2
starting istep 8.3
ending istep 8.3
starting istep 8.4
ending istep 8.4
starting istep 8.9
ending istep 8.9
starting istep 8.10
ending istep 8.10
starting istep 8.11
ending istep 8.11
starting istep 9.2
ending istep 9.2
starting istep 9.4
ending istep 9.4
starting istep 9.6
ending istep 9.6
starting istep 9.7
ending istep 9.7
starting istep 10.1
ending istep 10.1
starting istep 10.6
Base epsilon values read from table:
 R_T[0] = 10
 R_T[1] = 10
 R_T[2] = 79
 W_T[0] = 0
 W_T[1] = 21
Scaled epsilon values based on +20 percent guardband:
 R_T[0] = 12
 R_T[1] = 12
 R_T[2] = 95
 W_T[0] = 0
 W_T[1] = 26
ending istep 10.6
starting istep 10.10
ending istep 10.10
starting istep 10.12
ending istep 10.12
starting istep 10.13
ending istep 10.13
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
MEMD partition has ECC
MEMD is in 0x03cef200 through 0x03cfb917
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 50
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 51
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 53
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D4
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D5
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D6
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D7
SPD @ 0x52
SPD: module type is DDR4
SPD: module part number is M393A1K43BB0-CRC    
SPD: banks 16, ranks 1, rows 16, columns 10, density 8192 Mb
SPD: device width 8 bits, bus width 64 bits
SPD: module size is 8192 MB (per channel)

How reproducible Always (Single 16 cores CPU, one RAM module: 8GB (More info on changes needed to be documented on non-flashing instructions under https://github.com/Dasharo/dasharo-issues/issues/79)

How to reproduce

Steps to reproduce the behavior: On laptop:

user@captive-portal:~$ sha256sum -c raptor-cs_talos-2_coreboot_v0.5.0.rom.signed.ecc.sha256
raptor-cs_talos-2_coreboot_v0.5.0.rom.signed.ecc: OK
user@captive-portal:~$ sha256sum -c raptor-cs_talos-2_bootblock_v0.5.0.signed.ecc.sha256
raptor-cs_talos-2_bootblock_v0.5.0.signed.ecc: OK
user@captive-portal:~$ scp *v0.5.0*.ecc root@talos:/tmp/
raptor-cs_talos-2_bootblock_v0.5.0.signed.ecc 100%   28KB   1.3MB/s   00:00    
raptor-cs_talos-2_coreboot_v0.5.0.rom.signed.ecc 100% 2309KB   3.1MB/s   00:00    

On a seperated SSH connection to BMC:

pflash -r /tmp/talos.pnor
pflash -P HBB -p /tmp/raptor-cs_talos-2_bootblock_v0.5.0.signed.ecc -F /tmp/talos.pnor
pflash -P HBI -p /tmp/raptor-cs_talos-2_coreboot_v0.5.0.rom.signed.ecc -F /tmp/talos.pnor
systemctl stop mboxd
mboxd -f 64M -w 1M -b file:/tmp/talos.pnor -v

On another seperate SSH connection to BMC:

mboxctl --lpc-state
    “LPC Bus Maps: BMC Memory”
obmcutil poweron

Expected behavior Ram init succeeds and net steps are engaged

Actual behavior Stops at

SPD: module type is DDR4
SPD: module part number is M393A1K43BB0-CRC    
SPD: banks 16, ranks 1, rows 16, columns 10, density 8192 Mb
SPD: device width 8 bits, bus width 64 bits
SPD: module size is 8192 MB (per channel)
SergiiDmytruk commented 2 years ago

The bug is here. if (chips & (1 << chip)) is missing at least here.

macpijan commented 2 years ago

@tlaurion As said on the call, we do not have single CPU setup right now, and it would be troublesome to switch back and forth.

Thanks for the report, it is very valuable. Maybe we can try with one or two iterations of fixes which you can test on your side. If that does not help, we might need to rearrange the lab or think of a more efficient way (maybe testing remotely on your hardware?)

SergiiDmytruk commented 2 years ago

it would be troublesome to switch back and forth.

I made function that returns mask of chips to return 0x01 to find the errors. The other issue is with OCC startup. I think commit with fix for OCC just got lost.

Ignoring second CPU isn't the same as installing only one, but it's a good approximation. We might want to make a build option for that or do something else that allows limiting number of CPUs.

macpijan commented 2 years ago

@SergiiDmytruk Thanks.

@tlaurion Binaries from this MR are here: https://github.com/Dasharo/coreboot/suites/6282695824/artifacts/224773170

tlaurion commented 2 years ago

For reference, my hostboot logs on same machine

--== Welcome to Hostboot hostboot-a2ddbf3/hbicore.bin ==--

  3.07002|secure|SecureROM valid - enabling functionality
  5.51144|Booting from SBE side 0 on master proc=00050000
  5.54800|ISTEP  6. 5 - host_init_fsi
  5.72083|ISTEP  6. 6 - host_set_ipl_parms
  6.02733|ISTEP  6. 7 - host_discover_targets
  8.98001|HWAS|PRESENT> DIMM[03]=2000000000000000
  8.98002|HWAS|PRESENT> Proc[05]=8000000000000000
  8.98003|HWAS|PRESENT> Core[07]=00003C0000000000
  9.06120|ISTEP  6. 8 - host_update_master_tpm
  9.55213|SECURE|Security Access Bit> 0x0000000000000000
  9.55214|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
  9.55228|ISTEP  6. 9 - host_gard
  9.87348|HWAS|Deconfig HUID 0x0003000A, Physical:/Sys0/Node0/DIMM6
  9.87352|HWAS|Deconfig HUID 0x0003000B, Physical:/Sys0/Node0/DIMM7
  9.87467|HWAS|FUNCTIONAL> DIMM[03]=2000000000000000
  9.87468|HWAS|FUNCTIONAL> Proc[05]=8000000000000000
  9.87469|HWAS|FUNCTIONAL> Core[07]=00003C0000000000
  9.87780|ISTEP  6.11 - host_start_occ_xstop_handler
 10.84372|ISTEP  6.12 - host_voltage_config
 10.95588|ISTEP  7. 1 - mss_attr_cleanup
 11.60999|ISTEP  7. 2 - mss_volt
 11.82222|ISTEP  7. 3 - mss_freq
 12.14141|ISTEP  7. 4 - mss_eff_config
 12.82049|ISTEP  7. 5 - mss_attr_update
 12.83176|ISTEP  8. 1 - host_slave_sbe_config
 13.01644|ISTEP  8. 2 - host_setup_sbe
 13.02951|ISTEP  8. 3 - host_cbs_start
 13.03087|ISTEP  8. 4 - proc_check_slave_sbe_seeprom_complete
 13.05204|ISTEP  8. 5 - host_attnlisten_proc
 13.08736|ISTEP  8. 6 - host_p9_fbc_eff_config
 13.09937|ISTEP  8. 7 - host_p9_eff_config_links
 13.14822|ISTEP  8. 8 - proc_attr_update
 13.14947|ISTEP  8. 9 - proc_chiplet_fabric_scominit
 13.16541|ISTEP  8.10 - proc_xbus_scominit
 13.17879|ISTEP  8.11 - proc_xbus_enable_ridi
 13.19928|ISTEP  8.12 - host_set_voltages
 13.21404|ISTEP  9. 1 - fabric_erepair
 13.29682|ISTEP  9. 2 - fabric_io_dccal
 13.29882|ISTEP  9. 3 - fabric_pre_trainadv
 13.30459|ISTEP  9. 4 - fabric_io_run_training
 13.30670|ISTEP  9. 5 - fabric_post_trainadv
 13.30870|ISTEP  9. 6 - proc_smp_link_layer
 13.31264|ISTEP  9. 7 - proc_fab_iovalid
 13.46288|ISTEP  9. 8 - host_fbc_eff_config_aggregate
 13.47825|ISTEP 10. 1 - proc_build_smp
 13.58943|ISTEP 10. 2 - host_slave_sbe_update
 14.62127|ISTEP 10. 4 - proc_cen_ref_clk_enable
 14.64082|ISTEP 10. 5 - proc_enable_osclite
 14.64181|ISTEP 10. 6 - proc_chiplet_scominit
 14.68409|ISTEP 10. 7 - proc_abus_scominit
 14.68546|ISTEP 10. 8 - proc_obus_scominit
 14.68682|ISTEP 10. 9 - proc_npu_scominit
 14.71088|ISTEP 10.10 - proc_pcie_scominit
 14.75217|ISTEP 10.11 - proc_scomoverride_chiplets
 14.75489|ISTEP 10.12 - proc_chiplet_enable_ridi
 14.75749|ISTEP 10.13 - host_rng_bist
 14.76043|ISTEP 10.14 - host_update_redundant_tpm
 14.76317|ISTEP 11. 1 - host_prd_hwreconfig
 14.79285|ISTEP 11. 2 - cen_tp_chiplet_init1
 14.79725|ISTEP 11. 3 - cen_pll_initf
 14.80072|ISTEP 11. 4 - cen_pll_setup
 14.80374|ISTEP 11. 5 - cen_tp_chiplet_init2
 14.80772|ISTEP 11. 6 - cen_tp_arrayinit
 14.81076|ISTEP 11. 7 - cen_tp_chiplet_init3
 14.82940|ISTEP 11. 8 - cen_chiplet_init
 14.83243|ISTEP 11. 9 - cen_arrayinit
 14.83544|ISTEP 11.10 - cen_initf
 14.83924|ISTEP 11.11 - cen_do_manual_inits
 14.84190|ISTEP 11.12 - cen_startclocks
 14.84632|ISTEP 11.13 - cen_scominits
 14.84950|ISTEP 12. 1 - mss_getecid
 15.10883|ISTEP 12. 2 - dmi_attr_update
 15.13412|ISTEP 12. 3 - proc_dmi_scominit
 15.18279|ISTEP 12. 4 - cen_dmi_scominit
 15.18575|ISTEP 12. 5 - dmi_erepair
 15.24568|ISTEP 12. 6 - dmi_io_dccal
 15.24777|ISTEP 12. 7 - dmi_pre_trainadv
 15.25649|ISTEP 12. 8 - dmi_io_run_training
 15.26825|ISTEP 12. 9 - dmi_post_trainadv
 15.27092|ISTEP 12.10 - proc_cen_framelock
 15.27372|ISTEP 12.11 - host_startprd_dmi
 15.27641|ISTEP 12.12 - host_attnlisten_memb
 15.27940|ISTEP 12.13 - cen_set_inband_addr
 15.28241|ISTEP 13. 1 - host_disable_memvolt
 15.44098|ISTEP 13. 2 - mem_pll_reset
 15.48146|ISTEP 13. 3 - mem_pll_initf
 15.49836|ISTEP 13. 4 - mem_pll_setup
 15.52314|ISTEP 13. 6 - mem_startclocks
 15.53972|ISTEP 13. 7 - host_enable_memvolt
 15.55267|ISTEP 13. 8 - mss_scominit
 15.87821|ISTEP 13. 9 - mss_ddr_phy_reset
 15.92635|ISTEP 13.10 - mss_draminit
 16.01169|ISTEP 13.11 - mss_draminit_training
 16.38683|ISTEP 13.12 - mss_draminit_trainadv
 16.40070|ISTEP 13.13 - mss_draminit_mc
 16.43340|ISTEP 14. 1 - mss_memdiag
 18.44189|ISTEP 14. 2 - mss_thermal_init
 18.47948|ISTEP 14. 3 - proc_pcie_config
 18.52297|ISTEP 14. 4 - mss_power_cleanup
 18.52686|ISTEP 14. 5 - proc_setup_bars
 18.55430|ISTEP 14. 6 - proc_htm_setup
 18.56616|ISTEP 14. 7 - proc_exit_cache_contained
 18.64554|ISTEP 15. 1 - host_build_stop_image
 20.48206|================================================
 20.49654|Error reported by fapi2 (0x3300) EID 0x90000227
 20.49786|  No WOF table match found
 20.49787|  ModuleId   0x10 fapi2::MOD_FAPI2_PLAT_PARSE_WOF_TABLES
 20.49787|  ReasonCode 0x332d fapi2::RC_WOF_TABLE_NOT_FOUND
 20.49788|  UserData1  Number of cores : 0x0004000200000064
 20.49789|  UserData2  WOF Power Mode (1=Nominal, 2=Turbo) : 0x00000bb800000012
 20.49790|------------------------------------------------
 20.49790|  Callout type             : Procedure Callout
 20.49791|  Procedure                : EPUB_PRC_HB_CODE
 20.49792|  Priority                 : SRCI_PRIORITY_HIGH
 20.49793|------------------------------------------------
 20.49793|  Callout type             : Hardware Callout
 20.49795|  Target                   : Physical:/Sys0/Node0/Proc0
 20.49795|  Deconfig State           : NO_DECONFIG
 20.49796|  GARD Error Type          : GARD_NULL
 20.49797|  Priority                 : SRCI_PRIORITY_MED
 20.49797|------------------------------------------------
 20.49798|  Hostboot Build ID: hostboot-a2ddbf3/hbicore.bin
 20.49799|================================================
 20.71743|ISTEP 15. 2 - proc_set_pba_homer_bar
 20.79928|ISTEP 15. 3 - host_establish_ex_chiplet
 20.82571|ISTEP 15. 4 - host_start_stop_engine
 20.87112|ISTEP 16. 1 - host_activate_master
 22.05514|ISTEP 16. 2 - host_activate_slave_cores
 22.16728|ISTEP 16. 3 - host_secure_rng
 22.17455|ISTEP 16. 4 - mss_scrub
 22.20929|ISTEP 16. 5 - host_load_io_ppe
 22.23870|ISTEP 16. 6 - host_ipl_complete
 22.61928|ISTEP 18.11 - proc_tod_setup
 22.73876|ISTEP 18.12 - proc_tod_init
 22.74294|ISTEP 20. 1 - host_load_payload
 23.38540|ISTEP 20. 2 - host_load_hdat
 24.73240|ISTEP 21. 1 - host_runtime_setup
 26.63992|================================================
 26.63993|Error reported by fapi2 (0x3300) EID 0x90000256
 26.63994|  No WOF table match found
 26.63995|  ModuleId   0x10 fapi2::MOD_FAPI2_PLAT_PARSE_WOF_TABLES
 26.63995|  ReasonCode 0x332d fapi2::RC_WOF_TABLE_NOT_FOUND
 26.63996|  UserData1  Number of cores : 0x0004000200000064
 26.63997|  UserData2  WOF Power Mode (1=Nominal, 2=Turbo) : 0x00000bb800000012
 26.63998|------------------------------------------------
 26.63999|  Callout type             : Procedure Callout
 26.63999|  Procedure                : EPUB_PRC_HB_CODE
 26.64000|  Priority                 : SRCI_PRIORITY_HIGH
 26.64001|------------------------------------------------
 26.64002|  Callout type             : Hardware Callout
 26.64003|  Target                   : Physical:/Sys0/Node0/Proc0
 26.64004|  Deconfig State           : NO_DECONFIG
 26.64004|  GARD Error Type          : GARD_NULL
 26.64005|  Priority                 : SRCI_PRIORITY_MED
 26.64006|------------------------------------------------
 26.64006|  Hostboot Build ID: hostboot-a2ddbf3/hbicore.bin
 26.64007|================================================
 30.11356|htmgt|OCCs are now running in ACTIVE state
 34.34112|ISTEP 21. 2 - host_verify_hdat
 34.36810|ISTEP 21. 3 - host_start_payload
[   35.181479495,5] OPAL skiboot-9858186 starting...
[   35.181483122,7] initial console log level: memory 7, driver 5
[   35.181485237,6] CPU: P9 generation processor (max 4 threads/core)
[   35.181487084,7] CPU: Boot CPU PIR is 0x0054 PVR is 0x004e1201
[   35.181490035,7] OPAL table: 0x30103830 .. 0x30103e10, branch table: 0x30002000
[   35.181493147,7] Assigning physical memory map table for nimbus
[   35.181496208,7] Parsing HDAT...
[   35.181497493,7] SPIRA-S found.
[   35.181500165,6] BMC #0: HW version 3, SW version 2, chip DD1.0
[   35.181573498,6] SP Family is ibm,ast2500,openbmc
[   35.181579994,7] LPC: IOPATH chip id = 0
[   35.181581326,7] LPC: FW BAR       = f0000000
[   35.181582857,7] LPC: MEM BAR      = e0000000
[   35.181584303,7] LPC: IO BAR       = d0010000
[   35.181585742,7] LPC: Internal BAR = c0012000
[   35.181598542,7] LPC UART: base addr = 3f8 (3f8) size = 1 clk = 1843200, baud = 115200
[   35.181601241,7] LPC: BT [0, 0] sms_int: 0, bmc_int: 0
[   35.182842118,5] HDAT I2C: found e3p1 - unknown@1c dp:ff (ff:)
[   35.182938290,5] HDAT I2C: found e3p1 - unknown@1d dp:ff (ff:)
[   35.182981372,5] HDAT I2C: found e3p0 - unknown@19 dp:ff (ff:)
[   35.183024148,5] HDAT I2C: found e3p1 - unknown@1e dp:ff (ff:)
[   35.183065251,5] HDAT I2C: found e3p0 - unknown@1b dp:ff (ff:)
[   35.183106029,5] HDAT I2C: found e3p1 - unknown@1f dp:ff (ff:)
[   35.183162356,5] HDAT I2C: found e3p0 - unknown@1a dp:ff (ff:)
[   35.183211752,5] HDAT I2C: found e3p0 - unknown@18 dp:ff (ff:)
[   35.183934952,5] CHIP: Chip ID 0000 type: P9N DD2.10
[   35.184291461,5] PLAT: Detected Talos platform
[   35.184352236,5] PLAT: Detected BMC platform ast2500:openbmc
[   35.201105130,5] CPU: All 16 processors called in...
[   35.266716473,7] LPC: Routing irq 10, policy: 0 (r=1)
[   35.266717473,7] LPC: SerIRQ 10 using route 0 targetted at OPAL
[   36.374445970,5] HIOMAP: Negotiated hiomap protocol v2
[   36.374512392,5] HIOMAP: Block size is 64KiB
[   36.374559447,5] HIOMAP: BMC suggested flash timeout of 8s
[   36.374589305,5] HIOMAP: Flash size is 64MiB
[   36.374627095,5] HIOMAP: Erase granule size is 64KiB
[   38.486837922,5] FLASH: Found system flash: (unnamed) id:0
[   39.277537988,7] LPC: Routing irq 4, policy: 0 (r=1)
[   39.277539066,7] LPC: SerIRQ 4 using route 1 targetted at OPAL
[   39.277694733,5] OCC: All Chip Rdy after 0 ms
[   40.074902821,3] STB: VERSION verification FAILED. log=0xffffffffffff8160
[   41.192679074,3] STB: IMA_CATALOG verification FAILED. log=0xffffffffffff8160
[   41.384604683,3] CAPP: Error loading ucode lid. index=201d1
[   41.398817729,5] PCI: Resetting PHBs and training links...
[   42.419398089,5] PCI: Probing slots...
[   42.476358942,3] PCI: PHB  (80000) not found
[   42.476847088,3] PCI: PHB  (80001) not found
[   42.477738924,3] PCI: PHB  (80003) not found
[   42.478659084,5] PCI Summary:
[   42.479475473,5] PHB#0000:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..ff SLOT=CPU1 Slot2 (16x) 
[   42.481349662,5] PHB#0001:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..ff SLOT=CPU1 Slot1 (8x) 
[   42.483357056,5] PHB#0002:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..ff SLOT=Builtin SAS 
[   42.484697264,5] PHB#0003:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=Builtin USB 
[   42.486595728,5] PHB#0003:01:00.0 [EP  ] 104c 8241 R:02 C:0c0330 (      usb-xhci) LOC_CODE=Builtin USB
[   42.488494066,5] PHB#0004:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=Builtin Ethernet 
[   42.490306090,5] PHB#0004:01:00.0 [EP  ] 14e4 1657 R:00 C:020000 (      ethernet) LOC_CODE=Builtin Ethernet
[   42.492133493,5] PHB#0004:01:00.1 [EP  ] 14e4 1657 R:00 C:020000 (      ethernet) LOC_CODE=Builtin Ethernet
[   42.494059776,5] PHB#0005:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..02 SLOT=BMC 
[   42.495807605,5] PHB#0005:01:00.0 [ETOX] 1a03 1150 R:04 C:060400 B:02..02 LOC_CODE=BMC
[   42.498043037,5] PHB#0005:02:00.0 [PCID] 1a03 2000 R:41 C:030000 (           vga) LOC_CODE=BMC
[   42.504305115,5] IPMI: Resetting boot count on successful boot
[   42.506128825,5] INIT: Waiting for kernel...
[   54.323916712,3] STB: BOOTKERNEL verification FAILED. log=0xffffffffffff8160
[   54.326205744,5] INIT: 64-bit LE kernel discovered
[   54.483191117,5] INIT: Starting kernel at 0x20011000, fdt at 0x306f1ba8 204105 bytes

zImage starting: loaded at 0x0000000020011000 (sp: 0x000000002145bed8)
Allocating 0x23f4098 bytes for kernel...
Decompressing (0x0000000000000000 <- 0x0000000020020000:0x0000000021459984)...
Done! Decompressed 0x220c5d8 bytes
tlaurion commented 2 years ago

@SergiiDmytruk Thanks.

@tlaurion Binaries from this MR are here: https://github.com/Dasharo/coreboot/suites/6282695824/artifacts/224773170

@macpijan :


user@captive-portal:~$ scp /home/user/QubesIncoming/Insurgo/coreboot-binary/*.ecc root@talos:/tmp/
root@talos's password: 
bootblock.signed.ecc                                                                                                100%   28KB   1.3MB/s   00:00    
coreboot.rom.signed.ecc                                                                                             100% 2309KB   3.2MB/s   00:00    
user@captive-portal:~$ ssh -l root talos
root@talos's password: 
root@talos:~# obmcutil poweroff
root@talos:~# mboxctl --lpc-state
LPC Bus Maps: BMC Memory
root@talos:~# systemctl stop mboxd
root@talos:~# systemctl start mboxd
root@talos:~# mboxctl --lpc-state
LPC Bus Maps: Flash Device
root@talos:~# pflash -r /tmp/talos.pnor
Reading to "/tmp/talos.pnor" from 0x00000000..0x04000000 !
[==================================================] 100% ETA:0s     
root@talos:~# pflash -P HBI -p /tmp/coreboot.rom.signed.ecc -F /tmp/talos.pnor
About to program "/tmp/coreboot.rom.signed.ecc" at 0x00425000..0x00666200 !
WARNING ! This will modify your HOST flash chip content !
Enter "yes" to confirm:yes
Programming & Verifying...
[==================================================] 100%
Updating actual size in partition header...
root@talos:~# pflash -P HBB -p /tmp/bootblock.signed.ecc -F /tmp/talos.pnor
About to program "/tmp/bootblock.signed.ecc" at 0x00205000..0x0020c002 !
WARNING ! This will modify your HOST flash chip content !
Enter "yes" to confirm:yes
Programming & Verifying...
[==================================================] 100%
Updating actual size in partition header...
root@talos:~# mboxctl --backend file:/tmp/talos.pnor
SetBackend: Success
root@talos:~# obmcutil poweron

root@talos:~# cat /var/log/obmc-console.log

SIGTERM received, booting...
Requesting system poweroff
[  799.110662] reboot: Power down
[  855.188539622,5] OPAL: Shutdown request type 0x0...
st type

coreboot-4.14-525-g9488c148cd Wed Apr 27 13:38:10 UTC 2022 bootblock starting (log level: 7)...
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
HBI partition has ECC
HBI is in 0x00426200 through 0x0175f037
FMAP: Found "FLASH" version 1.1 at 0x20000.
FMAP: base = 0x0 size = 0x200000 #areas = 4
FMAP: area COREBOOT found @ 20200 (1965568 bytes)
CBFS: mcache @0xf8231000 built for 10 files, used 0x1f0 of 0x2000 bytes
CBFS: Found 'fallback/romstage' @0x80 size 0x12297 in mcache @0xf823102c
BS: bootblock times (exec / console): total (unknown) / 2 ms

coreboot-4.14-525-g9488c148cd Wed Apr 27 13:38:10 UTC 2022 romstage starting (log level: 7)...
IPMI: romstage PNP BT 0xe4
Get BMC self test result...Function Not Implemented
Initializing IPMI BMC watchdog timer
IPMI BMC watchdog initialized and started.
Initializing FSI...
Initialized FSI (chips mask: 0x01)
Building MVPDs...
starting istep 8.1
ending istep 8.1
starting istep 8.2
ending istep 8.2
starting istep 8.3
ending istep 8.3
starting istep 8.4
ending istep 8.4
starting istep 8.9
ending istep 8.9
starting istep 8.10
ending istep 8.10
starting istep 8.11
ending istep 8.11
starting istep 9.2
ending istep 9.2
starting istep 9.4
ending istep 9.4
starting istep 9.6
ending istep 9.6
starting istep 9.7
ending istep 9.7
starting istep 10.1
ending istep 10.1
starting istep 10.6
Base epsilon values read from table:
 R_T[0] = 10
 R_T[1] = 10
 R_T[2] = 79
 W_T[0] = 0
 W_T[1] = 21
Scaled epsilon values based on +20 percent guardband:
 R_T[0] = 12
 R_T[1] = 12
 R_T[2] = 95
 W_T[0] = 0
 W_T[1] = 26
ending istep 10.6
starting istep 10.10
ending istep 10.10
starting istep 10.12
ending istep 10.12
starting istep 10.13
ending istep 10.13
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
MEMD partition has ECC
MEMD is in 0x03cef200 through 0x03cfb917
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 50
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 51
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 53
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D4
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D5
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D6
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D7
SPD @ 0x52
SPD: module type is DDR4
SPD: module part number is M393A1K43BB0-CRC    
SPD: banks 16, ranks 1, rows 16, columns 10, density 8192 Mb
SPD: device width 8 bits, bus width 64 bits
SPD: module size is 8192 MB (per channel)
starting istep 13.2
ending istep 13.2
starting istep 13.3
putRing took 3 ms
ending istep 13.3
starting istep 13.4
ending istep 13.4
starting istep 13.6
ending istep 13.6
starting istep 13.8
Please FIXME: ATTR_MSS_RUNTIME_MEM_THROTTLED_N_COMMANDS_PER_SLOT
ending istep 13.8
starting istep 13.9
ending istep 13.9
starting istep 13.10
CCS took 2 us (3 us timeout), 1 instruction(s)
CCS took 2 us (2 us timeout), 14 instruction(s)
RCD dump for I2C address 0x5a:
0xf820fa30: 80 b3 40 42 30 00 00 00 02 01 00 03 cb e3 c0 0d  ..@B0...........
0xf820fa40: 00 00 39 00 00 00 00 00 00 00 07 00 00 00 00 00  ..9.............
ending istep 13.10
starting istep 13.11
CCS took 2 us (7 us timeout), 2 instruction(s)
Write Leveling starting
CCS took 13 us (92 us timeout), 5 instruction(s)
Write Leveling done
Initial Pattern Write starting
CCS took 5 us (38 us timeout), 5 instruction(s)
Initial Pattern Write done
DQS alignment starting
CCS took 11 us (44 us timeout), 1 instruction(s)
DQS alignment done
Read Clock Alignment starting
CCS took 8 us (82 us timeout), 1 instruction(s)
Read Clock Alignment done
Read Centering starting
CCS took 36 us (120 us timeout), 1 instruction(s)
Read Centering done
Write Centering starting
CCS took 8848 us (11314 us timeout), 7 instruction(s)
Write Centering done
Coarse write/read starting
CCS took 5 us (24 us timeout), 1 instruction(s)
Coarse write/read done
ending istep 13.11
starting istep 13.13
ending istep 13.13
starting istep 14.1
ending istep 14.1
starting istep 14.2
ending istep 14.2
starting istep 14.3
Initializing PEC0...
Initializing PEC1...
Initializing PEC2...
Initializing PHB0...
Initializing PHB1...
Initializing PHB2...
Initializing PHB3...
Initializing PHB4...
Initializing PHB5...
ending istep 14.3
starting istep 14.5
ending istep 14.5
0xF000F = 221d104900008040
CBMEM:
IMD: root @ 0xffeff000 254 entries.
IMD: root @ 0xffefec00 62 entries.
FMAP: area COREBOOT found @ 20200 (1965568 bytes)
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
HBI partition has ECC
HBI is in 0x00426200 through 0x0175f037
CBFS: Found 'fallback/ramstage' @0x12380 size 0xcbe5 in mcache @0xf82
SergiiDmytruk commented 2 years ago

We also have 8 GiB stick at SPD @ 0x52 on first CPU, but limiting memory to that doesn't trigger the issue.

There is also passing memory data from romstage to ramstage, but it looks quite reliable.

@krystian-hebel any ideas what can go wrong on moving to ramstage and how to debug it? Setting log level to "spew" doesn't seem to add anything to the output.

0xF000F = 221d104900008040

DD21. Hope we didn't mess up with attribute values for different DDs...

@tlaurion Does it work with two CPUs or you haven't tried it?

tlaurion commented 2 years ago

@SergiiDmytruk didn't try 2 cpus, no. For the moment, I am testing one CPU one ram module, basically Talos lite setup.

Shipped board was rejected for sale since some (unknown) ram slots were non-functional. I could buy more RAM to test second CPU later on if there is a way to test one CPU setup even if 2 CPUs are inserted.

I would not see myself adding and removing those often and would prefer testing one CPU setup here until you guys can software disable the second CPU and move from there.

krystian-hebel commented 2 years ago

Seems it hangs during loading of ramstage, because final romstage line (BS: romstage times ...) isn't printed. Maybe something wrong with memory address translation set in 14.5?

First step of debugging would be checking in GUI if it checkstoped or "just" watchdog timed out. Then, for trying to pinpoint what exactly checkstops one has to read appropriate SCOMs in search of failure. Maybe it is worth trying to prepare a script for dumping all of those, although this kind of depends on installed hardware (e.g. which cores on CPU are enabled) and I don't know if reading all registers is safe.

krystian-hebel commented 2 years ago

@tlaurion can you try installing DIMM in other slots? This looks like memory issue, but it's hard to tell whether it is hardware or configuration issue at this point. On the other hand memory training seems to be successful...

tlaurion commented 2 years ago

@krystian-hebel Hostboot boots, as per https://github.com/Dasharo/dasharo-issues/issues/80#issuecomment-1111090849 provided logs.

The test report that came with the board stated that hostboot test run failed on some other ram slots, but the one stated per user manual for cpu0 one ram module is working.

tlaurion commented 2 years ago

@krystian-hebel I have not changed jumpers either, if that is a needed thing (secure boot is on by default).

I thought it was not, since booting from mounted ROM. But if anything else needs to be done, please advise.

krystian-hebel commented 2 years ago

9.55214|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000 suggests it is disabled.

@tlaurion could you log in the GUI after it hangs and check if https://<BMC IP>/#/server-health/event-log shows anything interesting? Especially if there is a fresh "checkstop" reported.

tlaurion commented 2 years ago

@tlaurion could you log in the GUI after it hangs and check if https://<BMC IP>/#/server-health/event-log shows anything interesting? Especially if there is a fresh "checkstop" reported.

@krystian-hebel:

Four errors are reported from hostboot boot, yes. Export in json from that page: {"/xyz/openbmc_project/logging/entry/72":{"AdditionalData":["_PID=1224","ESEL=00 00 df 00 00 00 00 20 00 04 12 90 6f aa 00 00 50 48 00 30 01 00 33 00 00 00 00 03 2c d5 64 ff 00 00 00 03 2c fc 0c bc 42 00 00 0f 00 00 00 00 00 00 00 00 00 00 00 00 90 00 02 56 90 00 02 56 55 48 00 18 01 00 09 00 8a 03 40 00 00 00 00 00 ff ff 00 00 00 00 00 00 50 53 00 50 01 01 00 00 02 00 00 09 33 2d 00 48 00 00 00 e0 00 00 10 00 00 00 00 00 00 20 00 00 00 04 00 02 00 00 00 64 00 00 0b b8 00 00 00 12 42 43 38 41 33 33 32 44 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 55 44 00 1c 01 06 01 00 02 54 41 4b 00 00 00 06 00 00 00 55 00 01 f9 20 00 00 00 00 55 44 00 24 01 06 01 00 01 54 41 4b 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00 4c 23 01 00 02 00 05 00 00 55 44 00 0c 01 0b 01 00 15 01 00 00 55 44 00 10 01 04 01 00 0f 9f de 6a 00 01 00 00 55 44 00 7c 00 0c 01 00 00 13 04 02 00 64 0b b8 10 00 00 be 0b 54 10 00 00 be 08 ca 08 00 00 a0 0d ac 08 00 00 a0 0b b8 04 00 00 5a 0c 80 04 00 00 5a 0a 8c 16 00 00 be 0a be 16 00 00 be 08 56 16 00 00 a0 08 ca 16 00 00 a0 08 98 14 00 00 a0 09 60 14 00 00 a0 08 ca 12 00 00 be 0a f0 12 00 00 be 08 ba 12 00 00 82 08 ca 12 00 00 82 08 98 10 00 00 82 08 fc 10 00 00 82 08 98 55 44 00 70 01 03 01 00 00 00 00 00 00 03 f9 5c 00 00 00 00 00 03 e0 f4 00 00 00 00 40 40 90 64 00 00 00 00 40 3f c4 74 00 00 00 00 40 3b 53 e4 00 00 00 00 40 3b e7 90 00 00 00 00 40 39 78 0c 00 00 00 00 40 39 91 90 00 00 00 00 40 36 ae 38 00 00 00 00 40 36 b5 cc 00 00 00 00 40 36 c2 88 00 00 00 00 40 35 bb f0 00 00 00 00 00 00 27 14 55 44 00 38 01 01 01 00 48 6f 73 74 62 6f 6f 74 20 42 75 69 6c 64 20 49 44 3a 20 68 6f 73 74 62 6f 6f 74 2d 61 32 64 64 62 66 33 2f 68 62 69 63 6f 72 65 2e 62 69 6e 00 55 44 00 70 01 04 01 00 0f 9f de 6a 00 05 00 00 07 5f 1d f4 30 32 41 41 38 38 33 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0b ac 54 02 59 41 31 39 33 34 31 32 30 31 32 34 00 00 00 00 00 00 05 22 a1 58 01 85 18 00 53 24 e7 0e 5b 2c 00 00 00 43 88 f7 00 00 55 44 00 14 01 08 01 00 00 00 00 01 00 00 00 08 00 00 00 05 55 44 03 fc 01 15 31 00 01 28 00 42 46 41 50 49 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 f4 00 00 00 00 00 00 03 f4 00 00 00 0b 00 00 00 00 00 00 00 1a 25 d7 c9 c6 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 09 00 00 00 00 00 11 bd 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 1a 25 d7 d8 47 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0a 00 00 00 00 00 13 b5 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 56 00 00 00 64 00 00 00 1a 25 d7 e6 c2 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0b 00 00 00 00 00 15 ae 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a be 00 00 00 64 00 00 00 1a 25 d7 f1 a5 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0c 00 00 00 00 00 17 a6 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a 8c 00 00 00 64 00 00 00 1a 25 d7 fc 13 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0d 00 00 00 00 00 19 9f 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0c 80 00 00 00 64 00 00 00 1a 25 d8 0a d1 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0e 00 00 00 00 00 1b 97 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0b b8 00 00 00 64 00 00 00 1a 25 d8 15 6a 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0f 00 00 00 00 00 1d 90 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0d ac 00 00 00 64 00 00 00 1a 25 d8 20 04 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 10 00 00 00 00 00 1f 88 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 1a 25 d8 2e ac 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 11 00 00 00 00 00 21 81 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 0b 54 00 00 00 64 00 00 00 1a 26 21 19 37 00 00 02 55 00 00 43 4f 52 d7 9c 36 00 00 04 73 00 00 00 1c 00 00 00 1a 26 21 23 51 00 00 02 55 00 10 43 4f 3f f2 02 3d 00 00 05 67 00 00 00 00 02 00 00 01 00 00 00 00 00 00 00 40 00 00 00 2c 55 44 00 30 01 15 31 00 01 28 00 42 46 41 50 49 5f 44 42 47 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 55 44 00 30 01 15 31 00 01 28 00 42 53 43 41 4e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 03 05 dc 03 01 00 00 00 00 00 02 ac ff 01 90 00 00 00 06 00 00 00 00 00 00 61 74 6d 65 6c 2c 32 38 63 31 32 38 2c 73 70 64 2c 64 69 6d 6d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 03 05 dc 03 01 00 00 00 00 00 02 ae ff 01 90 00 00 00 06 00 00 00 00 00 00 61 74 6d 65 6c 2c 32 38 63 31 32 38 2c 73 70 64 2c 64 69 6d 6d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 03 36 18 03 00 00 00 00 00 00 ff","CALLOUT_INVENTORY_PATH=/xyz/openbmc_project/inventory/system"],"Id":72,"Message":"org.open_power.Host.Error.Event","Purpose":"xyz.openbmc_project.Software.Version.VersionPurpose.BMC","Resolved":false,"Severity":"xyz.openbmc_project.Logging.Entry.Level.Error","Timestamp":1651082386994,"Version":"2.7.0-dev-571-g67efd9872","associations":[["callout","fault","/xyz/openbmc_project/inventory/system"]]},"/xyz/openbmc_project/logging/entry/73":{"AdditionalData":["ESEL=00 00 df 00 00 00 00 20 00 04 12 90 6f aa 00 00 50 48 00 30 01 00 33 00 00 00 00 02 64 ff 97 3c 00 00 00 02 65 26 24 c3 42 00 00 0f 00 00 00 00 00 00 00 00 00 00 00 00 90 00 02 61 90 00 02 61 55 48 00 18 01 00 09 00 8a 03 40 00 00 00 00 00 ff ff 00 00 00 00 00 00 50 53 00 50 01 01 00 00 02 00 00 09 33 2d 00 48 00 00 00 e0 00 00 10 00 00 00 00 00 00 20 00 00 00 04 00 02 00 00 00 64 00 00 0b b8 00 00 00 12 42 43 38 41 33 33 32 44 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 55 44 00 1c 01 06 01 00 02 54 41 4b 00 00 00 06 00 00 00 55 00 01 f9 20 00 00 00 00 55 44 00 24 01 06 01 00 01 54 41 4b 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00 4b 23 01 00 02 00 05 00 00 55 44 00 0c 01 0b 01 00 0f 01 00 00 55 44 00 10 01 04 01 00 0f 9f de 6a 00 01 00 00 55 44 00 7c 00 0c 01 00 00 13 04 02 00 64 0b b8 10 00 00 be 0b 54 10 00 00 be 08 ca 08 00 00 a0 0d ac 08 00 00 a0 0b b8 04 00 00 5a 0c 80 04 00 00 5a 0a 8c 16 00 00 be 0a be 16 00 00 be 08 56 16 00 00 a0 08 ca 16 00 00 a0 08 98 14 00 00 a0 09 60 14 00 00 a0 08 ca 12 00 00 be 0a f0 12 00 00 be 08 ba 12 00 00 82 08 ca 12 00 00 82 08 98 10 00 00 82 08 fc 10 00 00 82 08 98 55 44 00 58 01 03 01 00 00 00 00 00 00 03 f9 5c 00 00 00 00 00 03 e0 f4 00 00 00 00 40 40 90 64 00 00 00 00 40 3f c4 74 00 00 00 00 40 3b 53 e4 00 00 00 00 40 3b e7 90 00 00 00 00 40 39 78 0c 00 00 00 00 40 39 91 90 00 00 00 00 40 32 77 40 00 00 00 00 00 00 27 14 55 44 00 38 01 01 01 00 48 6f 73 74 62 6f 6f 74 20 42 75 69 6c 64 20 49 44 3a 20 68 6f 73 74 62 6f 6f 74 2d 61 32 64 64 62 66 33 2f 68 62 69 63 6f 72 65 2e 62 69 6e 00 55 44 00 70 01 04 01 00 0f 9f de 6a 00 05 00 00 07 5f 1d f4 30 32 41 41 38 38 33 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0b ac 54 02 59 41 31 39 33 34 31 32 30 31 32 34 00 00 00 00 00 00 05 22 a1 58 01 85 18 00 53 24 e7 0e 5b 2c 00 00 00 43 88 f7 00 00 55 44 00 14 01 08 01 00 00 00 00 01 00 00 00 08 00 00 00 05 55 44 03 fc 01 15 31 00 01 28 00 42 46 41 50 49 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 f4 00 00 00 00 00 00 03 f4 00 00 00 0b 00 00 00 00 00 00 00 14 05 2a a7 ce 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 09 00 00 00 00 00 11 bd 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 14 05 2a b6 fb 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0a 00 00 00 00 00 13 b5 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 56 00 00 00 64 00 00 00 14 05 2a c6 a1 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0b 00 00 00 00 00 15 ae 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a be 00 00 00 64 00 00 00 14 05 2a d2 8a 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0c 00 00 00 00 00 17 a6 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a 8c 00 00 00 64 00 00 00 14 05 2a dd cf 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0d 00 00 00 00 00 19 9f 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0c 80 00 00 00 64 00 00 00 14 05 2a ed b1 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0e 00 00 00 00 00 1b 97 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0b b8 00 00 00 64 00 00 00 14 05 2a f9 c3 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0f 00 00 00 00 00 1d 90 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0d ac 00 00 00 64 00 00 00 14 05 2b 05 13 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 10 00 00 00 00 00 1f 88 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 14 05 2b 14 c3 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 11 00 00 00 00 00 21 81 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 0b 54 00 00 00 64 00 00 00 14 05 74 02 51 00 00 01 de 00 00 43 4f 52 d7 9c 36 00 00 04 73 00 00 00 1c 00 00 00 14 05 74 0d 15 00 00 01 de 00 10 43 4f 3f f2 02 3d 00 00 05 67 00 00 00 00 02 00 00 01 00 00 00 00 00 00 00 40 00 00 00 2c 55 44 00 30 01 15 31 00 01 28 00 42 46 41 50 49 5f 44 42 47 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 55 44 00 30 01 15 31 00 01 28 00 42 53 43 41 4e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 01 89 00 00 00 00 00 00 00 00 00 00 00 00 00 00 45 00 00 00 00 00 00 00 80 00 00 00 f8 51 aa b8 00 00 00 00 00 51 b8 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 57 e8 00 00 00 00 00 00 00 00 02 08 10 08 02 01 11 00 00 00 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 58 18 00 00 00 00 00 00 00 00 02 08 10 04 02 01 12 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 58 00 00 00 00 00 00 00 00 00 02 10 04 08 02 01 0f 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 59 56 00 00 00 00 00 00 00 06 00 00 00 00 00 66 8b 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20","PROCEDURE=85","_PID=1224"],"Id":73,"Message":"org.open_power.Host.Error.MaintenanceProcedure","Purpose":"xyz.openbmc_project.Software.Version.VersionPurpose.BMC","Resolved":false,"Severity":"xyz.openbmc_project.Logging.Entry.Level.Error","Timestamp":1651082387182,"Version":"2.7.0-dev-571-g67efd9872","associations":[]},"/xyz/openbmc_project/logging/entry/74":{"AdditionalData":["ESEL=00 00 df 00 00 00 00 20 00 04 12 90 6f aa 00 00 50 48 00 30 01 00 33 00 00 00 00 02 64 ff 97 3c 00 00 00 02 65 26 24 c3 42 00 00 0f 00 00 00 00 00 00 00 00 00 00 00 00 90 00 02 61 90 00 02 61 55 48 00 18 01 00 09 00 8a 03 40 00 00 00 00 00 ff ff 00 00 00 00 00 00 50 53 00 50 01 01 00 00 02 00 00 09 33 2d 00 48 00 00 00 e0 00 00 10 00 00 00 00 00 00 20 00 00 00 04 00 02 00 00 00 64 00 00 0b b8 00 00 00 12 42 43 38 41 33 33 32 44 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 55 44 00 1c 01 06 01 00 02 54 41 4b 00 00 00 06 00 00 00 55 00 01 f9 20 00 00 00 00 55 44 00 24 01 06 01 00 01 54 41 4b 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00 4b 23 01 00 02 00 05 00 00 55 44 00 0c 01 0b 01 00 0f 01 00 00 55 44 00 10 01 04 01 00 0f 9f de 6a 00 01 00 00 55 44 00 7c 00 0c 01 00 00 13 04 02 00 64 0b b8 10 00 00 be 0b 54 10 00 00 be 08 ca 08 00 00 a0 0d ac 08 00 00 a0 0b b8 04 00 00 5a 0c 80 04 00 00 5a 0a 8c 16 00 00 be 0a be 16 00 00 be 08 56 16 00 00 a0 08 ca 16 00 00 a0 08 98 14 00 00 a0 09 60 14 00 00 a0 08 ca 12 00 00 be 0a f0 12 00 00 be 08 ba 12 00 00 82 08 ca 12 00 00 82 08 98 10 00 00 82 08 fc 10 00 00 82 08 98 55 44 00 58 01 03 01 00 00 00 00 00 00 03 f9 5c 00 00 00 00 00 03 e0 f4 00 00 00 00 40 40 90 64 00 00 00 00 40 3f c4 74 00 00 00 00 40 3b 53 e4 00 00 00 00 40 3b e7 90 00 00 00 00 40 39 78 0c 00 00 00 00 40 39 91 90 00 00 00 00 40 32 77 40 00 00 00 00 00 00 27 14 55 44 00 38 01 01 01 00 48 6f 73 74 62 6f 6f 74 20 42 75 69 6c 64 20 49 44 3a 20 68 6f 73 74 62 6f 6f 74 2d 61 32 64 64 62 66 33 2f 68 62 69 63 6f 72 65 2e 62 69 6e 00 55 44 00 70 01 04 01 00 0f 9f de 6a 00 05 00 00 07 5f 1d f4 30 32 41 41 38 38 33 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0b ac 54 02 59 41 31 39 33 34 31 32 30 31 32 34 00 00 00 00 00 00 05 22 a1 58 01 85 18 00 53 24 e7 0e 5b 2c 00 00 00 43 88 f7 00 00 55 44 00 14 01 08 01 00 00 00 00 01 00 00 00 08 00 00 00 05 55 44 03 fc 01 15 31 00 01 28 00 42 46 41 50 49 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 f4 00 00 00 00 00 00 03 f4 00 00 00 0b 00 00 00 00 00 00 00 14 05 2a a7 ce 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 09 00 00 00 00 00 11 bd 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 14 05 2a b6 fb 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0a 00 00 00 00 00 13 b5 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 56 00 00 00 64 00 00 00 14 05 2a c6 a1 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0b 00 00 00 00 00 15 ae 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a be 00 00 00 64 00 00 00 14 05 2a d2 8a 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0c 00 00 00 00 00 17 a6 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a 8c 00 00 00 64 00 00 00 14 05 2a dd cf 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0d 00 00 00 00 00 19 9f 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0c 80 00 00 00 64 00 00 00 14 05 2a ed b1 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0e 00 00 00 00 00 1b 97 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0b b8 00 00 00 64 00 00 00 14 05 2a f9 c3 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0f 00 00 00 00 00 1d 90 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0d ac 00 00 00 64 00 00 00 14 05 2b 05 13 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 10 00 00 00 00 00 1f 88 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 14 05 2b 14 c3 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 11 00 00 00 00 00 21 81 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 0b 54 00 00 00 64 00 00 00 14 05 74 02 51 00 00 01 de 00 00 43 4f 52 d7 9c 36 00 00 04 73 00 00 00 1c 00 00 00 14 05 74 0d 15 00 00 01 de 00 10 43 4f 3f f2 02 3d 00 00 05 67 00 00 00 00 02 00 00 01 00 00 00 00 00 00 00 40 00 00 00 2c 55 44 00 30 01 15 31 00 01 28 00 42 46 41 50 49 5f 44 42 47 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 55 44 00 30 01 15 31 00 01 28 00 42 53 43 41 4e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 01 89 00 00 00 00 00 00 00 00 00 00 00 00 00 00 45 00 00 00 00 00 00 00 80 00 00 00 f8 51 aa b8 00 00 00 00 00 51 b8 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 57 e8 00 00 00 00 00 00 00 00 02 08 10 08 02 01 11 00 00 00 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 58 18 00 00 00 00 00 00 00 00 02 08 10 04 02 01 12 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 58 00 00 00 00 00 00 00 00 00 02 10 04 08 02 01 0f 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 59 56 00 00 00 00 00 00 00 06 00 00 00 00 00 66 8b 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20","_PID=1224","CALLOUT_INVENTORY_PATH=/xyz/openbmc_project/inventory/system"],"Id":74,"Message":"org.open_power.Host.Error.Event","Purpose":"xyz.openbmc_project.Software.Version.VersionPurpose.BMC","Resolved":false,"Severity":"xyz.openbmc_project.Logging.Entry.Level.Error","Timestamp":1651082391898,"Version":"2.7.0-dev-571-g67efd9872","associations":[["callout","fault","/xyz/openbmc_project/inventory/system"]]},"/xyz/openbmc_project/logging/entry/75":{"AdditionalData":["ESEL=00 00 df 00 00 00 00 20 00 04 12 90 6f aa 00 00 50 48 00 30 01 00 33 00 00 00 00 03 32 72 b6 03 00 00 00 03 32 99 d4 34 42 00 00 0f 00 00 00 00 00 00 00 00 00 00 00 00 90 00 02 90 90 00 02 90 55 48 00 18 01 00 09 00 8a 03 40 00 00 00 00 00 ff ff 00 00 00 00 00 00 50 53 00 50 01 01 00 00 02 00 00 09 33 2d 00 48 00 00 00 e0 00 00 10 00 00 00 00 00 00 20 00 00 00 04 00 02 00 00 00 64 00 00 0b b8 00 00 00 12 42 43 38 41 33 33 32 44 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 55 44 00 1c 01 06 01 00 02 54 41 4b 00 00 00 06 00 00 00 55 00 01 f9 20 00 00 00 00 55 44 00 24 01 06 01 00 01 54 41 4b 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00 4e 23 01 00 02 00 05 00 00 55 44 00 0c 01 0b 01 00 15 01 00 00 55 44 00 10 01 04 01 00 0f 9f de 6a 00 01 00 00 55 44 00 7c 00 0c 01 00 00 13 04 02 00 64 0b b8 10 00 00 be 0b 54 10 00 00 be 08 ca 08 00 00 a0 0d ac 08 00 00 a0 0b b8 04 00 00 5a 0c 80 04 00 00 5a 0a 8c 16 00 00 be 0a be 16 00 00 be 08 56 16 00 00 a0 08 ca 16 00 00 a0 08 98 14 00 00 a0 09 60 14 00 00 a0 08 ca 12 00 00 be 0a f0 12 00 00 be 08 ba 12 00 00 82 08 ca 12 00 00 82 08 98 10 00 00 82 08 fc 10 00 00 82 08 98 55 44 00 70 01 03 01 00 00 00 00 00 00 03 f9 5c 00 00 00 00 00 03 e0 f4 00 00 00 00 40 40 90 64 00 00 00 00 40 3f c4 74 00 00 00 00 40 3b 53 e4 00 00 00 00 40 3b e7 90 00 00 00 00 40 39 78 0c 00 00 00 00 40 39 91 90 00 00 00 00 40 36 ae 38 00 00 00 00 40 36 b5 cc 00 00 00 00 40 36 c2 88 00 00 00 00 40 35 bb f0 00 00 00 00 00 00 27 14 55 44 00 38 01 01 01 00 48 6f 73 74 62 6f 6f 74 20 42 75 69 6c 64 20 49 44 3a 20 68 6f 73 74 62 6f 6f 74 2d 61 32 64 64 62 66 33 2f 68 62 69 63 6f 72 65 2e 62 69 6e 00 55 44 00 70 01 04 01 00 0f 9f de 6a 00 05 00 00 07 5f 1d f4 30 32 41 41 38 38 33 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0b ac 54 02 59 41 31 39 33 34 31 32 30 31 32 34 00 00 00 00 00 00 05 22 a1 58 01 85 18 00 53 24 e7 0e 5b 2c 00 00 00 43 88 f7 00 00 55 44 00 14 01 08 01 00 00 00 00 01 00 00 00 08 00 00 00 05 55 44 03 fc 01 15 31 00 01 28 00 42 46 41 50 49 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 f4 00 00 00 00 00 00 03 f4 00 00 00 0b 00 00 00 00 00 00 00 1a 30 cf 0e 0a 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 09 00 00 00 00 00 11 bd 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 1a 30 cf 1c 67 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0a 00 00 00 00 00 13 b5 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 56 00 00 00 64 00 00 00 1a 30 cf 2a 90 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0b 00 00 00 00 00 15 ae 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a be 00 00 00 64 00 00 00 1a 30 cf 34 fe 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0c 00 00 00 00 00 17 a6 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a 8c 00 00 00 64 00 00 00 1a 30 cf 3f b1 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0d 00 00 00 00 00 19 9f 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0c 80 00 00 00 64 00 00 00 1a 30 cf 4e 4d 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0e 00 00 00 00 00 1b 97 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0b b8 00 00 00 64 00 00 00 1a 30 cf 59 13 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0f 00 00 00 00 00 1d 90 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0d ac 00 00 00 64 00 00 00 1a 30 cf 64 1f 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 10 00 00 00 00 00 1f 88 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 1a 30 cf 72 e0 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 11 00 00 00 00 00 21 81 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 0b 54 00 00 00 64 00 00 00 1a 31 19 48 32 00 00 02 55 00 00 43 4f 52 d7 9c 36 00 00 04 73 00 00 00 1c 00 00 00 1a 31 19 52 bf 00 00 02 55 00 10 43 4f 3f f2 02 3d 00 00 05 67 00 00 00 00 02 00 00 01 00 00 00 00 00 00 00 40 00 00 00 2c 55 44 00 30 01 15 31 00 01 28 00 42 46 41 50 49 5f 44 42 47 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 55 44 00 30 01 15 31 00 01 28 00 42 53 43 41 4e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 80 00 03 63 07 01 10 3f 80 00 03 64 07 01 10 3f 80 00 03 65 07 01 10 3f 80 00 03 66 07 01 10 3f 80 00 03 67 07 01 10 3f 80 00 03 68 07 01 10 3f 80 00 03 69 07 01 10 3f 80 00 03 6a 07 01 10 3f 80 00 03 6b 07 01 10 3f 80 00 07 60 07 01 10 3f 80 00 07 61 07 01 10 3f 80 00 07 62 07 01 10 3f 80 00 07 63 07 01 10 3f 80 00 07 64 07 01 10 3f 80 00 07 65 07 01 10 3f 80 00 07 66 07 01 10 3f 80 00 07 67 07 01 10 3f 80 00 07 68 07 01 10 3f 80 00 07 69 07 01 10 3f 80 00 07 6a 07 01 10 3f 80 00 07 6b 07 01 10 3f 80 00 0b 60 07 01 10 3f 80 00 0b 61 07 01 10 3f 80 00 0b 62 07 01 10 3f 80 00 0b 63 07 01 10 3f 80 00 0b 64 07 01 10 3f 80 00 0b 65 07 01 10 3f 80 00 0b 66 07 01 10 3f 80 00 0b 67 07 01 10 3f 80 00 0b 68 07 01 10 3f 80 00 0b 69 07 01 10 3f","PROCEDURE=85","_PID=1224"],"Id":75,"Message":"org.open_power.Host.Error.MaintenanceProcedure","Purpose":"xyz.openbmc_project.Software.Version.VersionPurpose.BMC","Resolved":false,"Severity":"xyz.openbmc_project.Logging.Entry.Level.Error","Timestamp":1651082392034,"Version":"2.7.0-dev-571-g67efd9872","associations":[]}}

Hostboot boot reported (snippets) from above logs:

20.48206|================================================
 20.49654|Error reported by fapi2 (0x3300) EID 0x90000227
 20.49786|  No WOF table match found
 20.49787|  ModuleId   0x10 fapi2::MOD_FAPI2_PLAT_PARSE_WOF_TABLES
 20.49787|  ReasonCode 0x332d fapi2::RC_WOF_TABLE_NOT_FOUND
 20.49788|  UserData1  Number of cores : 0x0004000200000064
 20.49789|  UserData2  WOF Power Mode (1=Nominal, 2=Turbo) : 0x00000bb800000012
 20.49790|------------------------------------------------
 20.49790|  Callout type             : Procedure Callout
 20.49791|  Procedure                : EPUB_PRC_HB_CODE
 20.49792|  Priority                 : SRCI_PRIORITY_HIGH
 20.49793|------------------------------------------------
 20.49793|  Callout type             : Hardware Callout
 20.49795|  Target                   : Physical:/Sys0/Node0/Proc0
 20.49795|  Deconfig State           : NO_DECONFIG
 20.49796|  GARD Error Type          : GARD_NULL
 20.49797|  Priority                 : SRCI_PRIORITY_MED

and

26.63992|================================================
 26.63993|Error reported by fapi2 (0x3300) EID 0x90000256
 26.63994|  No WOF table match found
 26.63995|  ModuleId   0x10 fapi2::MOD_FAPI2_PLAT_PARSE_WOF_TABLES
 26.63995|  ReasonCode 0x332d fapi2::RC_WOF_TABLE_NOT_FOUND
 26.63996|  UserData1  Number of cores : 0x0004000200000064
 26.63997|  UserData2  WOF Power Mode (1=Nominal, 2=Turbo) : 0x00000bb800000012
 26.63998|------------------------------------------------
 26.63999|  Callout type             : Procedure Callout
 26.63999|  Procedure                : EPUB_PRC_HB_CODE
 26.64000|  Priority                 : SRCI_PRIORITY_HIGH
 26.64001|------------------------------------------------
 26.64002|  Callout type             : Hardware Callout
 26.64003|  Target                   : Physical:/Sys0/Node0/Proc0
 26.64004|  Deconfig State           : NO_DECONFIG
 26.64004|  GARD Error Type          : GARD_NULL
 26.64005|  Priority                 : SRCI_PRIORITY_MED
 26.64006|------------------------------------------------
 26.64006|  Hostboot Build ID: hostboot-a2ddbf3/hbicore.bin
tlaurion commented 2 years ago

9.55214|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000 suggests it is disabled.

Right. Secure Mode came disabled for both CPUs, jumper being set at 2nd and 3rd PINs for both CPUs.

krystian-hebel commented 2 years ago

@tlaurion those events (unfortunately?) aren't checkstops, which may make debugging harder. Most likely those are just logs from those errors reported by Hostboot. Feel free to clear that log so it won't keep showing them. Still, I'd like to ask you to dump some SCOM registers that would help if it were a checkstop, because some errors may still be masked at that time:

  1. Copy fir_scoms.txt to /tmp/fir_scoms.txt on BMC
  2. Start the platform and wait for it to get to the point where it hangs
  3. Run on BMC (this takes almost 2 minutes on our platform, may take more time if yours is stuck in a loop):
    while read scom; do
        if [[ "$scom" == "0x"* ]]; then
                pdbg -P pib0 getscom $scom
        else
                echo "$scom"
        fi
    done < /tmp/fir_scoms.txt > /tmp/scom_dump.log
  4. Send /tmp/scom_dump.log here.

It is important to get the full dump from the first run - in case of problem in one place it may report further errors when trying to read additional registers, and that can obscure the real issue. Ideally those registers should be accessed in proper order, depending on results of earlier reads, but asking for them one at a time would take too much.

tlaurion commented 2 years ago

@krystian-hebel

The problem here, it testing :

@SergiiDmytruk Thanks.

@tlaurion Binaries from this MR are here: https://github.com/Dasharo/coreboot/suites/6282695824/artifacts/224773170

Is that the machine bootloops, so that when I run the script:

root@talos:~# while read scom; do
>         if [[ "$scom" == "0x"* ]]; then
>                 pdbg -P pib0 getscom $scom
>         else
>                 echo "$scom"
>         fi
> done < /tmp/fir_scoms.txt > /tmp/scom_dump.log
pdbg: Failed to write to 0x0000101c (0000000000001007): Cannot send after transport endpoint shutdown

Here is still the output of it, prior of it giving pdbg: Failed to write to 0x0000101c (0000000000001007): Cannot send after transport endpoint shutdown: scom_dump.log

krystian-hebel commented 2 years ago

IIRC it would stop bootlooping after 3 or 4 tries (there is a way to reduce that number, but Hostboot doesn't like it), but I think all important registers were dumped. I'll take a look an will let you know when I find anything.

tlaurion commented 2 years ago

For some reason, next boot didn't hit a bootloop.

Running:

while read scom; do
        if [[ "$scom" == "0x"* ]]; then
                pdbg -P pib0 getscom $scom
        else
                echo "$scom"
        fi
done < /tmp/fir_scoms.txt > /tmp/scom_dump.log

obmc-console.log scom_dump.log

Here is scp'ed logs

miczyg1 commented 2 years ago

Maybe we should consider some scripts or guides which information to dump and how t do it to help us debugging issues.

tlaurion commented 2 years ago

@macpijan would it be useful to have scom dump from successful hostboot booting petitboot?

krystian-hebel commented 2 years ago

Those shouldn't matter, I kind of know what to expect there. What would help would be fuller dump when it hangs and a checkstop happens (i.e. when second register on the list, 0x500f001c, is non-zero). If you could remove from input file all cache chiplets except chiplet 4 (0x14xxxxxx left, other 0x1xxxxxxx removed), and all core chiplets except chiplet 18 (0x32xxxxxx left, other 0x2xxxxxxx and 0x3xxxxxxx removed) and try again, it should take less time and hopefully dump all/most of the registers before reboot happens.

I don't know how https://github.com/Dasharo/dasharo-issues/issues/80#issuecomment-1112220743 didn't get to the checkstop, instead it just stopped in a random place.

tlaurion commented 2 years ago

@krystian-hebel: not sure what a checkstop is. (0x500f001c is zero here)

The logs collected here were collected from machine not stuck in a bootloop. The machine is still on now, not having rebooted automatically.

So I understand that to test and report the proper logs, it is required to boot successfully from hostboot first (to reset state and produce bootloops, for which a checkstop is what is triggered when a bootloop occurs and where system reboots automatically?)

fir_scoms_issue80.txt obmc-console.log scom_dump_issue80.log

tlaurion commented 2 years ago

@krystian-hebel : when in bootloop

root@talos:~# while read scom; do         if [[ "$scom" == "0x"* ]]; then                 pdbg -P pib0 getscom $scom;         else                 echo "$scom";         fi; done < /tmp/fir_scoms_issue80.txt > /tmp/scom_dump_issue80.log
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device
pdbg: Failed to write to 0x0000101c (0000000000001007): No such device

But 0x500f001c is non-zero.

Attached is the log I was able to get in such case. scom_dump_issue80_bootloop.log

krystian-hebel commented 2 years ago

Ok, I was expecting it to be stuck in checkstop state after those 5 boot attempts, but apparently it isn't. Let's try different approach: when it is about to reboot, execute pdbg -a stop (this stops CPU execution, giving more time for dumping), then check if global FIR is non-zero with pdbg -P pib0 getscom 0x500f001c (this one is safe to read, shouldn't impact other registers), and only if it is non-zero, get the full dump.

If that register is still zero, try pdbg -a start followed by pdbg -a stop to give it some more time to work.

tlaurion commented 2 years ago
user@talos-tests:~$ ssh -l root talos
root@talos's password: 
root@talos:~# pdbg -a stop
Timeout waiting for special wakeup on POWER9 Core@0x00000000
Timeout waiting for special wakeup on POWER9 Core@0x00000000
Timeout waiting for special wakeup on POWER9 Core@0x00000000
Unable to quiesce thread
Unable to quiesce thread
Unable to quiesce thread
Unable to quiesce thread
Unable to quiesce thread
Unable to quiesce thread
Unable to quiesce thread
Unable to quiesce thread
Unable to quiesce thread
Unable to quiesce thread
Unable to quiesce thread
Unable to quiesce thread
Unable to quiesce thread
(reverse-i-search)`while': while read scom; do         if [[ "$scom" == "0x"* ]]; then                 pdbg -P pib0 getscom $scom;         else                 echo "$scom";         fi; done < /tmp/fir_scoms_issue80.txt > /tmp/scom_dump_issue80.log

And it reboots prior of me being able to get anything more.

tlaurion commented 2 years ago

This one completed successfully, but machine is still up.

scom_dump_issue80_noscreen_no_bootloop.log

tlaurion commented 2 years ago

@krystian-hebel this is the registers polled (provide any reduced version needed) https://github.com/Dasharo/dasharo-issues/files/8603193/fir_scoms_issue80.txt

Even if calling pdbg -a stop, bootloop still occurs as shown in precedent output https://github.com/Dasharo/dasharo-issues/issues/80#issuecomment-1114958057

krystian-hebel commented 2 years ago

Yeah, the window for stopping CPU is easy to miss, and watchdog probably still works. This should be enough for me to dig into, I'll let you know what those numbers tell me.

tlaurion commented 2 years ago

So to clarify. "Checkstop" is happening at the end of the bootloop with machine staying up without further reboot?

If it is the case, debugging instructions should state to let machine bootloop 5 times prior of collecting the register logs.

Otherwise, please let me know for further test runs.

krystian-hebel commented 2 years ago

Checkstop is whenever 0x500f001c is non-zero. Whether platform reboots or stops depends on the configuration, I thought that it would do 4 reboots and get stuck on 5th. It should be reported in those logs in GUI that I asked earlier, but from those logs it seems that something else happens instead.

krystian-hebel commented 2 years ago

Findings so far:

I looked once again at console log and noticed something strange:

MCS0 MCA1 DIMM0 has 0 bad nibble(s) and 0 bad bit(s), but can be recovered

This means that RAM is bad enough to error out during calibration, but for some reason (probably bug in our code) it can't read which lines failed. Summary registers says this is more than 1, up to 4 bad bits which should be recoverable in most cases (although it uses ECC to recover data, so further errors may be unrecoverable), Hostboot probably does it.

@tlaurion please try those and sent just the console dump: https://cloud.3mdeb.com/index.php/s/A8qNefnkaBYDbsT

tlaurion commented 2 years ago

@krystian-hebel

xf8231080
he @

coreboot-4.14-526-gad4b5b2f0e Mon May  2 16:20:29 UTC 2022 bootblock starting (log level: 7)...
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
HBI partition has ECC
HBI is in 0x00426200 through 0x0175f037
FMAP: Found "FLASH" version 1.1 at 0x20000.
FMAP: base = 0x0 size = 0x200000 #areas = 4
FMAP: area COREBOOT found @ 20200 (1965568 bytes)
CBFS: mcache @0xf8231000 built for 10 files, used 0x1f0 of 0x2000 bytes
CBFS: Found 'fallback/romstage' @0x80 size 0x126bb in mcache @0xf823102c
BS: bootblock times (exec / console): total (unknown) / 2 ms

coreboot-4.14-526-gad4b5b2f0e Mon May  2 16:20:29 UTC 2022 romstage starting (log level: 7)...
IPMI: romstage PNP BT 0xe4
Get BMC self test result...Function Not Implemented
Initializing IPMI BMC watchdog timer
IPMI BMC watchdog initialized and started.
Initializing FSI...
Initialized FSI (chips mask: 0x01)
Building MVPDs...
starting istep 8.1
ending istep 8.1
starting istep 8.2
ending istep 8.2
starting istep 8.3
ending istep 8.3
starting istep 8.4
ending istep 8.4
starting istep 8.9
ending istep 8.9
starting istep 8.10
ending istep 8.10
starting istep 8.11
ending istep 8.11
starting istep 9.2
ending istep 9.2
starting istep 9.4
ending istep 9.4
starting istep 9.6
ending istep 9.6
starting istep 9.7
ending istep 9.7
starting istep 10.1
ending istep 10.1
starting istep 10.6
Base epsilon values read from table:
 R_T[0] = 10
 R_T[1] = 10
 R_T[2] = 79
 W_T[0] = 0
 W_T[1] = 21
Scaled epsilon values based on +20 percent guardband:
 R_T[0] = 12
 R_T[1] = 12
 R_T[2] = 95
 W_T[0] = 0
 W_T[1] = 26
ending istep 10.6
starting istep 10.10
ending istep 10.10
starting istep 10.12
ending istep 10.12
starting istep 10.13
ending istep 10.13
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
MEMD partition has ECC
MEMD is in 0x03cef200 through 0x03cfb917
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 50
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 51
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 53
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D4
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D5
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D6
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D7
SPD @ 0x52
SPD: module type is DDR4
SPD: module part number is M393A1K43BB0-CRC    
SPD: banks 16, ranks 1, rows 16, columns 10, density 8192 Mb
SPD: device width 8 bits, bus width 64 bits
SPD: module size is 8192 MB (per channel)
starting istep 13.2
ending istep 13.2
starting istep 13.3
putRing took 3 ms
ending istep 13.3
starting istep 13.4
ending istep 13.4
starting istep 13.6
ending istep 13.6
starting istep 13.8
Please FIXME: ATTR_MSS_RUNTIME_MEM_THROTTLED_N_COMMANDS_PER_SLOT
ending istep 13.8
starting istep 13.9
ending istep 13.9
starting istep 13.10
CCS took 2 us (3 us timeout), 1 instruction(s)
CCS took 2 us (2 us timeout), 14 instruction(s)
RCD dump for I2C address 0x5a:
0xf820fa30: 80 b3 40 42 30 00 00 00 02 01 00 03 cb e3 c0 0d  ..@B0...........
0xf820fa40: 00 00 39 00 00 00 00 00 00 00 07 00 00 00 00 00  ..9.............
ending istep 13.10
starting istep 13.11
CCS took 2 us (7 us timeout), 2 instruction(s)
DP 0
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 1
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 2
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 3
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 4
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000000000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Write Leveling starting
CCS took 13 us (92 us timeout), 5 instruction(s)
DP 0
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 1
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 2
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 3
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 4
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Write Leveling done
Initial Pattern Write starting
CCS took 5 us (38 us timeout), 5 instruction(s)
DP 0
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 1
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 2
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 3
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 4
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Initial Pattern Write done
DQS alignment starting
CCS took 10 us (44 us timeout), 1 instruction(s)
DP 0
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 1
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 2
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 3
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 4
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
DQS alignment done
Read Clock Alignment starting
CCS took 8 us (82 us timeout), 1 instruction(s)
DP 0
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 1
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 2
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 3
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 4
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Read Clock Alignment done
Read Centering starting
CCS took 36 us (120 us timeout), 1 instruction(s)
DP 0
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 1
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 2
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 3
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 4
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000000000 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Read Centering done
Write Centering starting
CCS took 8852 us (11314 us timeout), 7 instruction(s)
DP 0
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000003a3a - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 1
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000003a1a - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 2
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000003e1a - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 3
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000003a3a - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 4
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000003a00 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Write Centering done
Coarse write/read starting
CCS took 5 us (24 us timeout), 1 instruction(s)
DP 0
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000003a3a - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 1
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000003a1a - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 2
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000003e1a - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 3
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000003a3a - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
DP 4
        0x0000000000000000 - RD_VREF_CAL_ERROR
        0x0000000000000000 - DQ_BIT_DISABLE_RP0
        0x0000000000000000 - DQS_BIT_DISABLE_RP0
        0x0000000000000000 - WR_ERROR0
        0x0000000000000000 - RD_STATUS0
        0x0000000000000000 - RD_LVL_STATUS2
        0x0000000000000000 - RD_LVL_STATUS0
        0x0000000000003a00 - WR_VREF_ERROR0
        0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Coarse write/read done
ending istep 13.11
starting istep 13.13
ending istep 13.13
starting istep 14.1
ending istep 14.1
starting istep 14.2
ending istep 14.2
starting istep 14.3
Initializing PEC0...
Initializing PEC1...
Initializing PEC2...
Initializing PHB0...
Initializing PHB1...
Initializing PHB2...
Initializing PHB3...
Initializing PHB4...
Initializing PHB5...
ending istep 14.3
starting istep 14.5
ending istep 14.5
0xF000F = 221d104900008040
CBMEM:
IMD: root @ 0xffeff000 254 entries.
IMD: root @ 0xffefec00 62 entries.
FMAP: area COREBOOT found @ 20200 (1965568 bytes)
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
HBI partition has ECC
HBI is in 0x00426200 through 0x0175f037
CBFS: Found 'fallback/ramstage' @0x127c0 size 0xcbe5 in mcach
tlaurion commented 2 years ago

Checkstop is whenever 0x500f001c is non-zero. Whether platform reboots or stops depends on the configuration, I thought that it would do 4 reboots and get stuck on 5th. It should be reported in those logs in GUI that I asked earlier, but from those logs it seems that something else happens instead.

Thanks for the clarification.

There are org.open_power.Host.Boot.Error.Checkstop when booting coreboot, but they only contain PIDs and seemed irrelevant. Content of such logs from last boot attempt were: _PID=6707 and _PID=6404

tlaurion commented 2 years ago

@krystian-hebel anything else you would need ?

krystian-hebel commented 2 years ago

@tlaurion I updated binaries at https://cloud.3mdeb.com/index.php/s/A8qNefnkaBYDbsT, please check with those. I expect it to die on https://github.com/Dasharo/coreboot/blob/raptor-cs_talos-2/ram_debug_fixes/src/soc/ibm/power9/istep_13_11.c#L896. If that is the case, I would like to see both console dump and SCOM dump, if it doesn't hang there just the console output (may be just from start of istep 13.11 onward) will be enough.

As the comment above line from link says, there is a lot of code in the workaround missing. I haven't implemented it because all of our DIMMs were nice enough to not require it, so I had no way of testing whether it works. Seems that this may have just changed :slightly_smiling_face:

tlaurion commented 2 years ago

@krystian-hebel

root@talos:~# cd /tmp/v0.5.0_debug_ram/
root@talos:/tmp/v0.5.0_debug_ram# sha256sum *
3f82006b01cbbdb5ca4b1bb7c56088cad7e5fa7cdbcc9275e1e36905334c882e  bootblock.signed.ecc
7c8f7da210b5facb89724347f4096d7bff8024e2d9e046a132f52f0286f3f86c  coreboot.rom.signed.ecc
root@talos:/tmp/v0.5.0_debug_ram# mboxctl --lpc-state
LPC Bus Maps: BMC Memory
root@talos:/tmp/v0.5.0_debug_ram# mboxctl --backend vpnor
Failed to post message: Connection timed out
root@talos:/tmp/v0.5.0_debug_ram# mboxctl --lpc-state
LPC Bus Maps: Flash Device
root@talos:/tmp/v0.5.0_debug_ram# pflash -r /tmp/talos.pnor
Reading to "/tmp/talos.pnor" from 0x00000000..0x04000000 !
[==================================================] 100% ETA:0s     
root@talos:/tmp/v0.5.0_debug_ram# pflash -P HBB -p /tmp/v0.5.0_debug_ram/bootblock.signed.ecc -F /tmp/talos.pnor                             
About to program "/tmp/v0.5.0_debug_ram/bootblock.signed.ecc" at 0x00205000..0x0020c002 !
WARNING ! This will modify your HOST flash chip content !
Enter "yes" to confirm:yes
Programming & Verifying...
[==================================================] 100%
Updating actual size in partition header...
root@talos:/tmp/v0.5.0_debug_ram# pflash -P HBI -p /tmp/v0.5.0_debug_ram/coreboot.rom.signed.ecc -F /tmp/talos.pnor
About to program "/tmp/v0.5.0_debug_ram/coreboot.rom.signed.ecc" at 0x00425000..0x00666200 !
WARNING ! This will modify your HOST flash chip content !
Enter "yes" to confirm:yes
Programming & Verifying...
[==================================================] 100%
Updating actual size in partition header...
root@talos:/tmp/v0.5.0_debug_ram# mboxctl --backend file:/tmp/talos.pnor
SetBackend: Success
root@talos:/tmp/v0.5.0_debug_ram# mboxctl --lpc-state
LPC Bus Maps: BMC Memory

Will be reusing: Reusing https://github.com/Dasharo/dasharo-issues/files/8603193/fir_scoms_issue80.txt and

while read scom; do
        if [[ "$scom" == "0x"* ]]; then
                pdbg -P pib0 getscom $scom
        else
                echo "$scom"
        fi
done < /tmp/fir_scoms_issue80.txt > /tmp/scom_dump.log

Ouput:

starting root@talos:/tmp/v0.5.0_debug_ram# cat /var/log/obmc-console.log 
0000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000000000 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Write Leveling done
Initial Pattern Write starting
CCS took 5 us (38 us timeout), 5 instruction(s)
DP 0
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000000000 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 1
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000000000 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 2
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000000000 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 3
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000000000 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 4
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000000000 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Initial Pattern Write done
DQS alignment starting
CCS took 10 us (44 us timeout), 1 instruction(s)
DP 0
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000000000 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 1
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000000000 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 2
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000000000 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 3
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000000000 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 4
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000000000 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
DQS alignment done
Read Clock Alignment starting
CCS took 8 us (82 us timeout), 1 instruction(s)
DP 0
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000000000 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 1
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000000000 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 2
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000000000 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 3
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000000000 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 4
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000000000 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Read Clock Alignment done
Read Centering starting
CCS took 36 us (120 us timeout), 1 instruction(s)
DP 0
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000000000 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 1
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000000000 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 2
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000000000 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 3
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000000000 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 4
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000000000 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Read Centering done
Write Centering starting
CCS took 8449 us (11314 us timeout), 7 instruction(s)
DP 0
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000003a0e - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 1
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000003e1a - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 2
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000001a1a - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 3
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000001a3a - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 4
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000003a00 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Write Centering done
Coarse write/read starting
CCS took 5 us (24 us timeout), 1 instruction(s)
DP 0
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000003a0e - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 1
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000003e1a - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 2
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000001a1a - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 3
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000001a3a - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
DP 4
    0x0000000000000000 - RD_VREF_CAL_ERROR
    0x0000000000000000 - DQ_BIT_DISABLE_RP0
    0x0000000000000000 - DQS_BIT_DISABLE_RP0
    0x0000000000000000 - WR_ERROR0
    0x0000000000000000 - RD_STATUS0
    0x0000000000000000 - RD_LVL_STATUS2
    0x0000000000000000 - RD_LVL_STATUS0
    0x0000000000003a00 - WR_VREF_ERROR0
    0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Coarse write/read done
ending istep 13.11
starting istep 13.13
ending istep 13.13
starting istep 14.1
ending istep 14.1
starting istep 14.2
ending istep 14.2
starting istep 14.3
Initializing PEC0...
Initializing PEC1...
Initializing PEC2...
Initializing PHB0...
Initializing PHB1...
Initializing PHB2...
Initializing PHB3...
Initializing PHB4...
Initializing PHB5...
ending istep 14.3
starting istep 14.5
ending istep 14.5
0xF000F = 221d104900008040
CBMEM:
IMD: root @ 0xffeff000 254 entries.
IMD: root @ 0xffefec00 62 entries.
FMAP: area COREBOOT found @ 20200 (1965568 bytes)
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
HBI partition has ECC
HBI is in 0x00426200 through 0x0175f037
CBFS: Found 'fallback/ramstage' @0x12840 size 0xcbe7 in mcache @root@talos:/tmp/v0.5.0_debug_ram# 

No checkstop?

root@talos:/tmp/v0.5.0_debug_ram# pdbg -P pib0 getscom 0x500f001c
p0: 0x00000000500f001c = 0x0000000000000000 (/kernelfsi@0/pib@1000)
root@talos:/tmp/v0.5.0_debug_ram# 
root@talos:/tmp/v0.5.0_debug_ram# while read scom; do
>         if [[ "$scom" == "0x"* ]]; then
>                 pdbg -P pib0 getscom $scom
>         else
>                 echo "$scom"
>         fi
> done < /tmp/fir_scoms_issue80.txt > /tmp/scom_dump.log
root@talos:/tmp/v0.5.0_debug_ram#

Attached are the scom logs: scom_dump.log

krystian-hebel commented 2 years ago

@tlaurion to exclude possibility of bad support for CPU, could you test if it works (or at least behaves differently) when this DIMM is installed in other slots, even though instruction tells to use this one?

tlaurion commented 2 years ago

@macpijan : user's manual suggests memory bank population for one DIMM to be in B1, on one/two CPU configuration.

You suggest that I put my single M393A1K43BB0-CRC in what other slot?

EDIT: outside channel answer of @krystian-hebel "Any one should do." Putting in D1.

tlaurion commented 2 years ago

hmmm. Memory stick in D1:

16.90082|================================================
 16.91730|Error reported by prdf (0xE500) EID 0x90000366
 16.91731|  PRD Signature            : 0x240005 0xFFFF0044
 16.91874|  Signature Description    : pu.mca:k0:n0:s0:p00:c5 () RDR: Repairs needed but unavailable
 16.91875|  UserData1   : 0x0024000500000000
 16.91876|  UserData2   : 0xffff004400000000
 16.91876|------------------------------------------------
 16.91877|  Callout type             : Hardware Callout
 16.91878|  Target                   : Physical:/Sys0/Node0/DIMM6
 16.91879|  Deconfig State           : DELAYED_DECONFIG
 16.91880|  GARD Error Type          : GARD_Predictive
 16.91880|  Priority                 : SRCI_PRIORITY_HIGH
 16.91881|------------------------------------------------
 16.91882|  Hostboot Build ID: hostboot-a2ddbf3/hbicore.bin
 16.91882|================================================
 17.26972|System Shutting DownTo Perform Reconfiguration After Deconfig
 17.26973|IPMI: Initiate power cycle
 17.27087|Stopping istep dispatcher
 17.99366|IPMI: shutdown complete
15.12352|ISTEP 13. 7 - host_enable_memvolt
 15.13551|ISTEP 13. 8 - mss_scominit
 15.47751|ISTEP 13. 9 - mss_ddr_phy_reset
 15.53023|ISTEP 13.10 - mss_draminit
 15.61190|ISTEP 13.11 - mss_draminit_training
 16.03449|System Shutting DownTo Perform Reconfiguration After Recoverable Error
 16.03451|IPMI: Initiate power cycle
 16.05795|Stopping istep dispatcher
 16.41643|IPMI: shutdown complete
tlaurion commented 2 years ago

Memory stick now in A1. Debugged with @krystian-hebel.

It seems that the issue above was linked to components being faulty and the Talos deactivating those components to continue on other working ones as described here.

To make A1 work, it was necessary to wipe from instructions: pflash -P GUARD -c

Note: On my board, D1 slot is faulty (let's remember the donation came from a board that failed DUT_HW_GUARD_FAILURE), with the device Test report specifying that "2x ram slots fail".


What we know as of now if that A1 and B1 (blue slots, under CPU0) works, and hostboot is able to train (as coreboot) the M393A1K43BB0-CRC stick.

It seems that the CPU revision in my setup is not known from DEFCONFIG (from Hostboot):

34.38209|ISTEP 21. 1 - host_runtime_setup
 36.26054|================================================
 36.26055|Error reported by fapi2 (0x3300) EID 0x900003C7
 36.26056|  No WOF table match found
 36.26056|  ModuleId   0x10 fapi2::MOD_FAPI2_PLAT_PARSE_WOF_TABLES
 36.26057|  ReasonCode 0x332d fapi2::RC_WOF_TABLE_NOT_FOUND
 36.26058|  UserData1  Number of cores : 0x0004000200000064
 36.26059|  UserData2  WOF Power Mode (1=Nominal, 2=Turbo) : 0x00000bb800000012
 36.26060|------------------------------------------------
 36.26061|  Callout type             : Procedure Callout
 36.26062|  Procedure                : EPUB_PRC_HB_CODE
 36.26062|  Priority                 : SRCI_PRIORITY_HIGH
 36.26063|------------------------------------------------
 36.26064|  Callout type             : Hardware Callout
 36.26066|  Target                   : Physical:/Sys0/Node0/Proc0
 36.26066|  Deconfig State           : NO_DECONFIG
 36.26067|  GARD Error Type          : GARD_NULL
 36.26068|  Priority                 : SRCI_PRIORITY_MED
 36.26069|------------------------------------------------
tlaurion commented 2 years ago

@krystian-hebel @macpijan some updates?

krystian-hebel commented 2 years ago

Interesting find. Soon after entering ramstage processor hits an unknown instruction and as a result exception happens. Normally, at that point there should be no interrupt handlers installed, but for some reason around that particular exception (@ 0x840) I see something that looks like proper PPC64 code:

root@talos:~# pdbg -p 0 -c 18 -t 0 getmem 0x0 0x1000 | hexdump -C
[==================================================] 100%
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000e20  38 61 01 70 4b ff fb 8d  2c 23 00 00 41 82 ff 6c  |8a.pK...,#..A..l|
00000e30  e8 bf 00 18 e9 3f 00 10  38 80 00 00 7c a3 28 50  |.....?..8...|.(P|
00000e40  7c 69 1a 14 48 00 25 cd  60 00 00 00 e8 9f 00 18  ||i..H.%.`.......|
00000e50  e8 7f 00 10 38 a0 00 01  48 00 26 c5 60 00 00 00  |....8...H.&.`...|
00000e60  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000

From there it jumps into unused memory (all 0s), hits another exception and goes back into this handler.

There are some open questions that I'll try to debug more:


On memory training front: there are no errors when only 1D write centering is performed, instead of 2D centering. This is one of the workarounds Hostboot does when full algorithm fails, but according to my notes all registers that Hostboot tests for failure are reporting a success, even for 2D training. I will have to dig into Hostboot's code again and compare it with my notes.

Nevertheless, even with forced 1D write centering, the problem with "random" data in interrupt handlers' area still happens. Other parts of RAM are properly zeroed so it is most likely working as it should. It may be a different issue altogether and this is what I'm assuming right now. At least at that point we have RAM and can install our own interrupt handlers to help with debugging, assuming they won't get overwritten.

@SergiiDmytruk constructed a list of differences between different DD revisions of CPU, but it seems that Hostboot doesn't always use everything it defines. There may be more code depending on DD versions that doesn't use attributes listed there. Some things are set before FAPI (used to access attributes, among other things) is enabled, like HRMOR or XIVE.

krystian-hebel commented 2 years ago

More findings:

pietrushnic commented 2 years ago

I just hope we don't chase broken hardware here. Please note we may have some more remote setups of Talos II for coparison of behavior soon. I'm working on that.

krystian-hebel commented 2 years ago

@pietrushnic Hostboot works with this hardware, so even if it is not perfect, it should at least hit one of TODO workaround paths in coreboot. For some reason this does not happen, no error is detected until it breaks fatally.

pietrushnic commented 2 years ago

@krystian-hebel understood, that means we need improvements on our side and we don't what the heck is going. I hope you can invent some diagnostics steps. Also maybe we should redirect this to mailing list, maybe someone there would have ideas how to debug that further?

krystian-hebel commented 2 years ago

@tlaurion I have another request: I need logs from talos.pnor from https://cloud.3mdeb.com/index.php/s/A8qNefnkaBYDbsT. This is Hostboot with customized (lots of irrelevant stuff suppressed, 80% faster and produces smaller log) debug output. On our machine this takes ~45 minutes to boot and produces ~30MB of log, this may be slightly less with one CPU, one DIMM. It should fit in ramfs, even with PNOR image also stored there. Read it with obmc-console-client > /tmp/boot.log because /var/log/obmc-console.log has limited size.

tlaurion commented 2 years ago

Full logs provided off channel.

tlaurion commented 2 years ago

@krystian-hebel any updates?