Dasharo / dasharo-issues

The Dasharo issue tracker
https://dasharo.com/
24 stars 0 forks source link

Talos II - XCOM/I2C errors reported on final OS (Also contains proper testing for testing reports) #475

Open tlaurion opened 1 year ago

tlaurion commented 1 year ago

Dasharo version

0.7.0

Dasharo variant

Common (talos-2 target under Heads)

Affected component(s) or functionality

skiboot?

Brief summary

Reproducibility notes:

user@talos-tests:~/QubesIncoming/heads-tests-deb12$ wget https://dl.3mdeb.com/open-source-firmware/Dasharo/raptor-cs_talos-2/v0.7.0/raptor-cs_talos-2_bootblock_v0.7.0.signed.ecc https://dl.3mdeb.com/open-source-firmware/Dasharo/raptor-cs_talos-2/v0.7.0/raptor-cs_talos-2_coreboot_v0.7.0.rom.signed.ecc https://dl.3mdeb.com/open-source-firmware/Dasharo/raptor-cs_talos-2/v0.7.0/raptor-cs_talos-2_zImage_v0.7.0.bundled
--2023-08-14 13:39:30--  https://dl.3mdeb.com/open-source-firmware/Dasharo/raptor-cs_talos-2/v0.7.0/raptor-cs_talos-2_bootblock_v0.7.0.signed.ecc
Resolving dl.3mdeb.com (dl.3mdeb.com)... 178.32.205.96
Connecting to dl.3mdeb.com (dl.3mdeb.com)|178.32.205.96|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 57348 (56K)
Saving to: ‘raptor-cs_talos-2_bootblock_v0.7.0.signed.ecc’

raptor-cs_talos-2_bootblock_v0.7.0.si 100%[=======================================================================>]  56.00K   217KB/s    in 0.3s    

2023-08-14 13:39:31 (217 KB/s) - ‘raptor-cs_talos-2_bootblock_v0.7.0.signed.ecc’ saved [57348/57348]

--2023-08-14 13:39:31--  https://dl.3mdeb.com/open-source-firmware/Dasharo/raptor-cs_talos-2/v0.7.0/raptor-cs_talos-2_coreboot_v0.7.0.rom.signed.ecc
Reusing existing connection to dl.3mdeb.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 1184256 (1.1M)
Saving to: ‘raptor-cs_talos-2_coreboot_v0.7.0.rom.signed.ecc’

raptor-cs_talos-2_coreboot_v0.7.0.rom 100%[=======================================================================>]   1.13M   499KB/s    in 2.3s    

2023-08-14 13:39:33 (499 KB/s) - ‘raptor-cs_talos-2_coreboot_v0.7.0.rom.signed.ecc’ saved [1184256/1184256]

--2023-08-14 13:39:33--  https://dl.3mdeb.com/open-source-firmware/Dasharo/raptor-cs_talos-2/v0.7.0/raptor-cs_talos-2_zImage_v0.7.0.bundled
Reusing existing connection to dl.3mdeb.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 12058968 (12M)
Saving to: ‘raptor-cs_talos-2_zImage_v0.7.0.bundled’

raptor-cs_talos-2_zImage_v0.7.0.bundl 100%[=======================================================================>]  11.50M   375KB/s    in 28s     

2023-08-14 13:40:01 (428 KB/s) - ‘raptor-cs_talos-2_zImage_v0.7.0.bundled’ saved [12058968/12058968]

FINISHED --2023-08-14 13:40:01--
Total wall clock time: 31s
Downloaded: 3 files, 13M in 30s (431 KB/s)
user@talos-tests:~/QubesIncoming/heads-tests-deb12$ rsync -ravczz --inplace --delete /home/user/QubesIncoming/heads-tests-deb12 root@talos:/tmp/images/
sending incremental file list
heads-tests-deb12/
heads-tests-deb12/test

sent 356 bytes  received 39 bytes  87.78 bytes/sec
total size is 13,300,572  speedup is 33,672.33
user@talos-tests:~/QubesIncoming/heads-tests-deb12$ ssh -o "ServerAliveInterval=120" -t -l root talos 'cd /tmp/images/heads-tests-deb12/ && pflash -ra/tmp/talos.pnor && pflash  -F ../../talos.pnor -f -P HBB -p *bootblock* && pflash  -F ../../talos.pnor -f -P HBI -p *.rom* && pflash  -F ../../talos.pnor -f -P BOOTKERNEL -p *zImage*.bundled* && mboxctl --backend file:/tmp/talos.pnor && echo "ALL GOOD! Booting and attaching BMC console" && obmcutil poweron && obmc-console-client'
Reading to "/tmp/talos.pnor" from 0x00000000..0x04000000 !
[==================================================] 100% ETA:0s     
About to program "raptor-cs_talos-2_bootblock_v0.7.0.signed.ecc" at 0x00205000..0x00213004 !
Programming & Verifying...
[==================================================] 100%
Updating actual size in partition header...
About to program "raptor-cs_talos-2_coreboot_v0.7.0.rom.signed.ecc" at 0x00425000..0x00546200 !
Programming & Verifying...
[==================================================] 100%
Updating actual size in partition header...
About to program "raptor-cs_talos-2_zImage_v0.7.0.bundled" at 0x022a1000..0x02e21158 !
Programming & Verifying...
[==================================================] 100%
Updating actual size in partition header...
SetBackend: Success
ALL GOOD! Booting and attaching BMC console

How reproducible

Always

How to reproduce Download and flash according to above notes. FullConsoleLogFromHeadsToDebianLogin.txt Heads-provisioning-after-flash.txt pre-heads-console-logs.txt

Steps to reproduce the behavior:

  1. Have OpenPGP compatible usb dongle connected ready to be factory reset inserted in available USB port.
  2. Replicate testing tests as provided under Heads-provisioning-after-flash.txt
  3. Observe as reported under FullConsoleLogFromHeadsToDebianLogin.txt
    [  251.426845829,3] XSCOM: write-busy error gcid=0x8 pcb_addr=0xa3006 stat=0x1
    [  251.426849378,6] IPMI: dropping non severe PEL event
    [  251.426850152,3] XSCOM: Write failed, ret =  -2
    [  251.427196934,3] I2C: Failed to write the MODE_REG
    [  251.427228962,3] I2C: Failed to program the MODE_REG

And

[[  251.427467699,5] XSCOM: Busy even after 1000 retries, resetting XSCOM now. Total retries  = 1000
[  251.427679368,5] XSCOM: Busy even after 1000 retries, resetting XSCOM now. Total retries  = 2000
[  251.427890774,5] XSCOM: Busy even after 1000 retries, resetting XSCOM now. Total retries  = 3000

Expected behavior

Absence of worrisome errors on daily driver. Actual behavior

Errors under booted OS

Screenshots

Additional context

Solutions you've tried

tlaurion commented 1 year ago

Also just to clarify, there are two sources of "logs" with different timestamps, coming either from the kernel boot time 0 or skiboot boot time 0, which here tells us source of logs are mixed and confusing:

[  251.491634682,5] XSCOM: Busy even after 1000 retries, resetting XSCOM now. Total retries  = 3000
[  251.491641400,3] XSCOM: write-busy error gcid=0x8 pcb_addr=0xa3006 stat=0x1
[  251.491644569,6] IPMI: dropping non severe PEL event
[  251.491645319,3] XSCOM: Write failed, ret =  -2
[  251.492036026,3] I2C: Failed to write the MODE_REG
[  251.492073228,3] I2C: Failed to program the MODE_REG
[  OK  ] Finished Remove Stale Onli…ext4 Metadata Check Snapshots.
[  OK  ] Started Virtual Machine an…ontainer Registration Service.
[  OK  ] Started User Login Management.
[  OK  ] Started Avahi mDNS/DNS-SD Stack.
[  OK  ] Started Authorization Manager.
         Starting Modem Manager...
[  OK  ] Started Network Manager.
         Starting Network Manager Wait Online...
[  OK  ] Started WPA supplicant.
[  OK  ] Reached target Network.
         Starting CUPS Scheduler...
         Starting Virtualization daemon...
         Starting OpenBSD Secure Shell server...
         Starting Permit User Sessions...
[  OK  ] Finished Permit User Sessions.
         Starting Light Display Manager...
         Starting Hold until boot process finishes up...
[   10.880587] tg3 0004:01:00.0 enP4p1s0f0: Link is up at 100 Mbps, full duplex
[   10.880620] tg3 0004:01:00.0 enP4p1s0f0: Flow control is off for TX and off for RX
[   10.880643] tg3 0004:01:00.0 enP4p1s0f0: EEE is disabled
[   10.896478] IPv6: ADDRCONF(NETDEV_CHANGE): enP4p1s0f0: link becomes ready

the 251 timestamps AFAIK comes from skiboot, where the next timestamp is 0-10 timestamps matches the timeline and come from booted kernel. Ideally, the source of the logs would be clearer (skiboot) or non-existing at all because no error from skiboot.

miczyg1 commented 1 year ago

Maybe we could put somethign like [ 251.492073228,3] (skiboot) I2C: Failed to program the MODE_REG to eahc lien printed by skiboot

krystian-hebel commented 1 year ago

Ideally there should be no lines printed by skiboot at all at this point. They can be easily differentiated by looking at log level, e.g. ,3 after timestamp.

tlaurion commented 1 year ago

Ideally there should be no lines printed by skiboot at all at this point. They can be easily differentiated by looking at log level, e.g. ,3 after timestamp.

@krystian-hebel

What are 5 and 6 then? I didn't realize up until now there was 3, 5 and 6 following timestamp

krystian-hebel commented 1 year ago

These are different log levels defined in skiboot.h. The lower the number, the more important that line is. Minimal saved level can be configured separately for screen and in-memory console through device tree that coreboot passes to skiboot.