Fohdeesha / lab-docu

Centralized documentation for equipment used by STH & FBOM members
http://fohdeesha.com/docs/
GNU General Public License v3.0

Sbr Tool , checksum & intel board #27

Open Adefx opened 2 years ago

Adefx commented 2 years ago

Hi buddy, I've been following your "issue" topic on the p710 flashing problem.

I was researching exactly the same thing, because I have been trying for days to crossflash an Intel integrated 2208 RAID card (formerly an RMS25CB080). Of course, thanks to Intel, the hardware is locked to Intel motherboards. But well, I have two of them.

I have succeeded in flashing this 2208 (D1) card with LSI firmware, but the result is that it doesn't detect any disks in IT mode, and the mpt2sas detection freezes when an Intel 24-port expander is attached.

So I was thinking the problem comes from the SBR (I flashed a blank 512-byte one), and I tried to edit the original dumped one. When I try to parse it with SBRTOOL, I get several checksum errors (MFG data copies differ, using first / mfgdata checksum error / SAS address checksum error).

So I used the one you provided (checksum-sbrtool.py) and I get "checksum value 148 / proper checksum 148"; I'm not sure why it's the same number. After that it shows mfg1 and mfg2, which are the same...

From here I'm lost... could you help me? My original SBR file is attached: Intel Sbr.zip

Thanks a lot !

oddballracing commented 10 months ago

Adefx,

You're not alone. I've got a stack of R2312GZ/GL (various OEM models of course) I got from auction and am fighting this same battle (albeit well over a year later). I just ordered a few RMS25JB080 (2308-based mezzanine) controllers from eBay for pretty cheap to continue my investigation into converting my existing RMS25CB080 mezzanine modules. I encountered issues with the expander card as well: it causes a kernel panic on boot of BSD (TrueNAS) if connected when booting, and an instant panic if hot-plugged while the OS is running. I have since reverted to the original SBR and SPD and flashed the latest Intel-hosted firmware package back onto the card. Right now I am looking at the checksum side of things in hopes of shedding some light on my issues. One of them is that the BIOS on the S2600GZ/GL board states no adapter is present once the crossflash is performed; once reverted, it correctly shows an RMS25CB080 (and PCIe3 support for the SAS module can be toggled again). I will add your provided SBR for comparison as well, though I suspect it is the same as mine.

I ran into issues on the first of these servers I got a couple of years ago, and never really spent any more time on this (although I did look into implementing PCIe bifurcation on the board, unfortunately without success; it is not easily done with the AMI tools I came across).

The layout (second copy of the SBR data) on my RMS25CB080 does not match the layout of the LSI 9285 SBR file I have compared against. The Intel file has the first instance of the data at 0x0000-0x00DF (checksum at 0x00DF) and the second at 0x00E0-0x01BF, with the SBR file totaling 512 bytes (zero-padded at the end up to 0x01FF). On the 9285 file the first copy is at 0x0000-0x007F (checksum at 0x007F) and the second at 0x0080-0x00FF (ending on the 256-byte boundary), with the remainder zero-padded to the 512-byte boundary.
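The two layouts described above can be checked mechanically. Here is a minimal Python sketch that assumes only the offsets quoted in this comment. The names `SbrLayout`, `split_sbr` and `describe` are made up for illustration and are not part of sbrtool or any LSI utility. It deliberately does not validate the checksum byte, since the checksum algorithm is not stated in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SbrLayout:
    """Hypothetical description of where the two data copies live."""
    name: str
    copy_len: int          # bytes per copy; the last byte is the checksum
    total_len: int = 512   # size of the whole SBR image


# Offsets taken from the comment above.
INTEL_RMS25CB080 = SbrLayout("Intel RMS25CB080", 0xE0)  # 0x0000-0x00DF, 0x00E0-0x01BF
LSI_9285 = SbrLayout("LSI 9285", 0x80)                  # 0x0000-0x007F, 0x0080-0x00FF


def split_sbr(image: bytes, layout: SbrLayout):
    """Return (first_copy, second_copy, padding) for the given layout."""
    if len(image) != layout.total_len:
        raise ValueError(f"expected {layout.total_len} bytes, got {len(image)}")
    first = image[:layout.copy_len]
    second = image[layout.copy_len:2 * layout.copy_len]
    padding = image[2 * layout.copy_len:]
    return first, second, padding


def describe(image: bytes, layout: SbrLayout) -> dict:
    """Summarise an SBR image without judging whether the checksum is valid."""
    first, second, padding = split_sbr(image, layout)
    return {
        "copies_match": first == second,
        "checksum_offset": layout.copy_len - 1,
        "stored_checksum": first[-1],
        "padding_is_zero": not any(padding),
    }
```

Running `describe(open("sbr.bin", "rb").read(), INTEL_RMS25CB080)` on a dump (the file name is a placeholder) shows at a glance whether the two copies agree and whether the tail is zero-padded as described.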

When the 2308 controllers arrive (hopefully this week), I will gather some information on them (revision, SBR comparison, etc.) and post a copy of the SBR, as I have yet to see any real information regarding these Intel units. INTEL_SBR-SPD.zip

oddballracing commented 10 months ago

While Fohdeesha has laid the groundwork for us, I think we need to extrapolate the work done with the Dell controllers in hopes we can answer these questions on what Intel has done with their servers.

The SBR files you posted are confirmed binary identical (with HxD) to the one I pulled off my 'sacrificial' RMS25CB080.
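For anyone without a hex editor handy, the same byte-for-byte comparison can be sketched in a few lines of Python. This is only an illustration of the comparison, not how HxD works internally; `diff_offsets` and `compare_files` are hypothetical helper names.

```python
from pathlib import Path
from typing import List, Tuple


def diff_offsets(a: bytes, b: bytes) -> List[Tuple[int, int, int]]:
    """Return (offset, byte_in_a, byte_in_b) for every position that differs."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)} bytes")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def compare_files(path_a: str, path_b: str) -> List[Tuple[int, int, int]]:
    """Compare two dump files on disk; an empty list means binary identical."""
    return diff_offsets(Path(path_a).read_bytes(), Path(path_b).read_bytes())
```

An empty result from `compare_files("mine.bin", "posted.bin")` (placeholder file names) corresponds to the "binary identical" finding above; otherwise each tuple gives the offset and the two differing byte values.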

oddballracing commented 10 months ago

Progress:

The RMS25JB080 cards came in, and I pulled the SBR from them. Comparing them with the RMS25CB080 SBR, very little is different, but they have the same binary 'blob' I found on both my RMS25CB080 and RS25SB080 (8-port external RAID PCIe) cards, so this might be what the system BIOS on the S2600GL wants in order to identify the card. In Aptio Setup Utility (BIOS) -> Advanced -> Mass Storage Controller Configuration, the card now reports as expected as "Intel(R) Integrated RAID Module RMS25JB080", so there is definitely something in the SBR that gets this validated (with an empty or modified SBR, this reported as "None" or "Empty", IIRC).

Now, I've loaded the 9207-8.bin firmware (2308 IT mode) from the 9207-8i along with the mpt2sas BIOS, and since loading the BIOS I'm seeing the AVAGO (LSI) BIOS hang at "Initializing" with the RES2SV240 expander connected; I don't get far enough to CTRL-C into the controller's BIOS to see what is going on. With the expander disconnected and ports 0-3 connected directly to the 12-bay backplane, I don't have a problem: a SATA disk is detected just fine and the boot process continues. The SBR pulled from the RMS25JB080 (2308) is attached.

EDIT: With the actual 2308-based RMS25JB080 installed, I did not have any hang-ups at initialization or kernel panics in FreeBSD with the expander connected, so this is definitely arising from something in the cross-flash of the RMS25CB080.

RMS25JB080.zip

I have an IBM expander I will swap in just to see if I get the same behavior with it, thinking this is some weird Intel OEM shenanigans that needs further investigation, but I suspect that I will get the same hang-up.

I'll update this thread with all of my findings and with what I come up with as a workable scenario tonight, even if that does not include an expander... yet.

Adefx commented 10 months ago

Wow, you've gone a lot further than me ;-) Even though I gave up on replacing the card in both of my servers, I'm following your progress with interest.

oddballracing commented 10 months ago

Hey, I can totally relate. I'm finding my way by feel mostly with this, trying to get my head around the structures that LSI put together on these things. When I first attempted this (before Fohdeesha completed the PERC ISO), I ended up switching the RMS25CB080 out for a pair of 1068E-based cards and got on with life. Now after getting another pallet of these servers, I just figured it was worth another shot.

I've managed to brick my test card about 7 times so far this weekend (through some clearly improper flashing attempts with sections from the RMS25JB080), but thanks to my cheapo CH341 EEPROM programmer I have managed to recover it by erasing the SBR each time. What was happening is that the flash was modifying the SBR data and clearly the Intel BIOS was unhappy about it, throwing an NMI (non-maskable interrupt) which stopped the machine in its tracks. Erasing the SBR wrote all 0xFF, which allowed the card to default to the non-initialized 2208 values (1000:0081) for device IDs, so I could get back into DOS/EFI/Linux and start messing around again.
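Since the recovery path relies on the EEPROM really being blank, a read-back of the chip can be sanity-checked before trying anything else. A minimal Python sketch, assuming the dump has already been read to a bytes object by whatever programmer software is in use; `is_erased` and `first_programmed_offset` are hypothetical helpers, not part of any CH341 tool.

```python
from typing import Optional


def is_erased(dump: bytes) -> bool:
    """True if the dump is non-empty and every byte is 0xFF (erased state)."""
    return len(dump) > 0 and all(byte == 0xFF for byte in dump)


def first_programmed_offset(dump: bytes) -> Optional[int]:
    """Offset of the first byte that is not 0xFF, or None if fully erased."""
    for offset, byte in enumerate(dump):
        if byte != 0xFF:
            return offset
    return None
```

If `is_erased` returns False after an erase, `first_programmed_offset` points at where leftover data begins.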

My current status is somewhat where I was before, and now I'm trying some different connections to see what MAY work with the expander at all, and I've found something that is halfway functional, but far from ideal, at the moment:

SBR: RMS25JB080 (512B, unmodified)
FW: 13.00.66.00 (P13.5 IT from 9207-8.bin); this is the same version present on at least one of the RMS25JB080s I received.
BIOS: 07.25.00.00 (mptsas2.rom from the P13.5 package for 9207/9217), loaded just for testing; wouldn't need this in production.
No EFI/BSD currently loaded, but the RMS25JB080 came with 07.20.01.00, which I cannot seem to find a file for on LSI's website at the moment.

To get the RES2SV240 expander connected, I can only use a single 4x SAS connection from ports 4-7 on the controller, otherwise it kernel panics and takes a dump. If I either connect the 0-3 SFF-8087 to the controller on its own, or connect both to the expander, I get the same kernel panic or BIOS hang as before. Just for a sanity check I tested this both with the backplane I2C cable connected to and disconnected from the motherboard; that made no difference, so I suspect something in the PHY configuration is the culprit, and of course I'm performing some comparisons now.

I'm testing with a simple Rocky Linux 8.8 release and kernel 4.18.0 (-477.27.1) with the mpt3sas driver (I've had issues with mpt3sas before and SAS2008 controllers on fedora), and I get a warning that the 2308 that my card is now impersonating is 'deprecated hardware', sure, whatever, it's working so far for my testing.

Here is a quick output from lspci, lsscsi, sas2flash -list and lsiutil (option 16) to show the current config that doesn't dump out with the expander connected:

lspci -s 0000:06:00.0 -nnv

06:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0087] (rev 05)
        Subsystem: Broadcom / LSI 9207-8i SAS2.1 HBA [1000:3020]
        Physical Slot: 2-1
        Flags: bus master, fast devsel, latency 0, IRQ 60, NUMA node 0
        I/O ports at 2000 [size=256]
        Memory at d1240000 (64-bit, non-prefetchable) [size=64K]
        Memory at d1200000 (64-bit, non-prefetchable) [size=256K]
        Expansion ROM at d1100000 [disabled] [size=1M]
        Capabilities: [50] Power Management version 3
        Capabilities: [68] Express Endpoint, MSI 00
        Capabilities: [d0] Vital Product Data
        Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [c0] MSI-X: Enable+ Count=16 Masked-
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [1e0] Secondary PCI Express
        Capabilities: [1c0] Power Budgeting <?>
        Capabilities: [190] Dynamic Power Allocation <?>
        Capabilities: [148] Alternative Routing-ID Interpretation (ARI)
        Kernel driver in use: mpt3sas
        Kernel modules: mpt3sas
lsscsi -g

[0:0:0:0]    disk    Kingston DataTraveler G3  PMAP  /dev/sda   /dev/sg0
[1:0:0:0]    disk    ATA      SAMSUNG HD161HJ  0-20  /dev/sdc   /dev/sg2
[1:0:1:0]    enclosu Intel    RES2SV240        0d00  -          /dev/sg3
[2:0:0:0]    disk    ATA      KINGSTON SA400S3 1103  /dev/sdb   /dev/sg1
sas2flash -c 0 -list

LSI Corporation SAS2 Flash Utility
Version 13.00.00.00 (2012.02.17)
Copyright (c) 2008-2012 LSI Corporation. All rights reserved

        Adapter Selected is a LSI SAS: SAS2308_2(D1)

        Controller Number              : 0
        Controller                     : SAS2308_2(D1)
        PCI Address                    : 00:06:00:00
        SAS Address                    : 5001e67-a-22f7-c000
        NVDATA Version (Default)       : 0d.44.00.01
        NVDATA Version (Persistent)    : 0d.44.00.01
        Firmware Product ID            : 0x2214
        Firmware Version               : 13.00.66.00
        NVDATA Vendor                  : LSI
        NVDATA Product ID              : SAS9207-8i
        BIOS Version                   : 07.25.00.00
        UEFI BSD Version               : N/A
        FCODE Version                  : N/A
        Board Name                     : SAS9207-8i
        Board Assembly                 : N/A
        Board Tracer Number            : N/A

        Finished Processing Commands Successfully.
        Exiting SAS2Flash.
lsiutil

LSI Logic MPT Configuration Utility, Version 1.71, Sep 18, 2013
modprobe: FATAL: Module mptctl not found in directory /lib/modules/4.18.0-477.27.1.el8_8.x86_64
/bin/mknod: /dev/mptctl: File exists

1 MPT Port found

     Port Name         Chip Vendor/Type/Rev    MPT Rev  Firmware Rev  IOC
 1.  ioc0              LSI Logic SAS2308 D1      200      0d004200     0

Select a device:  [1-1 or 0 to quit] 1

 1.  Identify firmware, BIOS, and/or FCode
 2.  Download firmware (update the FLASH)
 4.  Download/erase BIOS and/or FCode (update the FLASH)
 8.  Scan for devices
 801.  Scan for 1 LUN
 810.  Scan for 10 LUN's
10.  Change IOC settings (interrupt coalescing)
13.  Change SAS IO Unit settings
16.  Display attached devices
20.  Diagnostics
21.  RAID actions
23.  Reset target
42.  Display operating system names for devices
43.  Diagnostic Buffer actions
45.  Concatenate SAS firmware and NVDATA files
59.  Dump PCI config space
60.  Show non-default settings
61.  Restore default settings
66.  Show SAS discovery errors
69.  Show board manufacturing information
97.  Reset SAS link, HARD RESET
98.  Reset SAS link
99.  Reset port
 e   Enable expert mode in menus
 p   Enable paged mode
 w   Enable logging

Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 16

SAS2308's links are down, down, down, down, 6.0 G, 6.0 G, 6.0 G, 6.0 G

 B___T     SASAddress     PhyNum  Handle  Parent  Type
        5001e67a22f7c000           0001           SAS Initiator
        5001e67a22f7c001           0002           SAS Initiator
        5001e67a22f7c002           0003           SAS Initiator
        5001e67a22f7c003           0004           SAS Initiator
        5001e67a22f7c004           0005           SAS Initiator
        5001e67a22f7c005           0006           SAS Initiator
        5001e67a22f7c006           0007           SAS Initiator
        5001e67a22f7c007           0008           SAS Initiator
        5001e67957b5bfff     4     0009    0001   Edge Expander
 0   0  5001e67957b5bfe9     9     000a    0009   SATA Target
 0   1  5001e67957b5bffd    24     000b    0009   SAS Initiator and Target

Type      NumPhys    PhyNum  Handle     PhyNum  Handle  Port  Speed
Adapter      8          4     0001  -->   20     0009     0    6.0
                        5     0001  -->   21     0009     0    6.0
                        6     0001  -->   22     0009     0    6.0
                        7     0001  -->   23     0009     0    6.0

Expander    26          9     0009  -->    0     000a     0    3.0
                       20     0009  -->    4     0001     0    6.0
                       21     0009  -->    5     0001     0    6.0
                       22     0009  -->    6     0001     0    6.0
                       23     0009  -->    7     0001     0    6.0
                       24     0009  -->    0     000b     0    6.0

Enclosure Handle   Slots       SASAddress       B___T (SEP)
           0001      8      5001e67a22f7c000
           0002     25      5001e67957b5bfff    0   1

While that mptctl module error bugs me, I am still moving forward with this. I may just bite the bullet and put a CentOS 7 install on the SSD connected to the ICH SATA just to be sure I'm running a supported environment.

Now, to dig into those PHY configs and see what the heck is going on here...
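The "links are" line in the lsiutil option 16 output above lists one state per phy, in phy order, so it is easy to turn into a list of phys that did not come up. A minimal Python sketch under that assumption; `parse_links` and `phys_not_linked` are hypothetical names, and the only input format assumed is the single line as it appears in this thread.

```python
import re
from typing import List


def parse_links(line: str) -> List[str]:
    """Split an lsiutil 'links are ...' line into one state string per phy."""
    match = re.search(r"links are (.+)$", line.strip())
    if match is None:
        raise ValueError("not a 'links are' line")
    return [state.strip() for state in match.group(1).split(",")]


def phys_not_linked(line: str) -> List[int]:
    """Phy numbers whose state is not a negotiated rate such as '6.0 G'."""
    return [phy for phy, state in enumerate(parse_links(line))
            if not state.endswith("G")]
```

For the output above, `phys_not_linked` applied to the "links are" line returns phys 0 through 3, matching the single 4x connection on ports 4-7. Any state that is not a rate is treated as not linked.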

oddballracing commented 10 months ago

Alright, now the plot thickens. I have identified the problem port: it is port/link 1 (the second port). If I disable this port in lsiutil, I can get 6 links to the expander (2-7), which is an improvement over 4, but not quite the full 8. I am guessing that the ports must link in pairs at a minimum, and that is why port/link 0 also does not connect. I disabled it as well shortly after, just to keep my config and the resulting SAS topology consistent.

When I said the plot thickens, here's why:

While messing around with connections to establish a functional workaround that would still make use of all 8 links on the adapter, I connected ports 0-3 to the first SFF-8087 (0-3) on the 12-port backplane, while adapter ports 4-7 were still connected to the expander. The expander remains connected to the second (4-7) and third (8-11) SFF-8087 on the backplane. I figured that, while I was at it, I should test a few backplane locations just to make sure nothing screwy was going on, given this is a non-standard connection setup. On the backplane I went one by one with my old Samsung 160GB SATA drive. Slot 0, no problem. Slot 3, no problem. Slot 6, no problem. Slot 9, no problem. Slot 1... KERNEL PANIC, NMI!!! WTF, this wasn't even going through the expander? This whole time I have been focused on PHY link aggregation being the root cause, but why would a single 1:1 connection take it down? Now I am very confused. Could this be an offset in the NVRAM data not handling the PHY/link config? Why port 1, and not 0, or 7? Surely the config is a contiguous block of bytes in NVRAM? Am I fighting a hardware issue with this specific card? That doesn't make much sense; I've seen reports of the same behavior from other users after crossflashing. Did LSI lay out the config in a different order than 0-7? If there is an offset, how come I can disable these ports successfully with lsiutil (the disablement is being written to NVRAM somehow, as it persists across power-off and initialization of the card)?

Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 13

SATA Maximum Queue Depth:  [0 to 255, default is 32]
SAS Max Queue Depth, Narrow:  [0 to 65535, default is 0]
SAS Max Queue Depth, Wide:  [0 to 65535, default is 0]
Device Missing Report Delay:  [0 to 2047, default is 0]
Device Missing I/O Delay:  [0 to 255, default is 0]

PhyNum  Link      MinRate  MaxRate  Initiator  Target    Port
   0    Disabled    1.5      6.0    Enabled    Disabled  Auto
   1    Disabled    1.5      6.0    Enabled    Disabled  Auto
   2    Enabled     1.5      6.0    Enabled    Disabled  Auto
   3    Enabled     1.5      6.0    Enabled    Disabled  Auto
   4    Enabled     1.5      6.0    Enabled    Disabled  Auto
   5    Enabled     1.5      6.0    Enabled    Disabled  Auto
   6    Enabled     1.5      6.0    Enabled    Disabled  Auto
   7    Enabled     1.5      6.0    Enabled    Disabled  Auto
Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 16

SAS2308's links are off, off, 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G

 B___T     SASAddress     PhyNum  Handle  Parent  Type
        5001e67a22f7c000           0001           SAS Initiator
        5001e67a22f7c001           0002           SAS Initiator
        5001e67a22f7c002           0003           SAS Initiator
        5001e67a22f7c003           0004           SAS Initiator
        5001e67a22f7c004           0005           SAS Initiator
        5001e67a22f7c005           0006           SAS Initiator
        5001e67a22f7c006           0007           SAS Initiator
        5001e67a22f7c007           0008           SAS Initiator
        5001e67957b5bfff     2     0009    0001   Edge Expander
 0   0  5001e67957b5bfe1     1     000a    0009   SATA Target
 0   1  5001e67957b5bffd    24     000b    0009   SAS Initiator and Target

Type      NumPhys    PhyNum  Handle     PhyNum  Handle  Port  Speed
Adapter      8          2     0001  -->   18     0009     0    6.0
                        3     0001  -->   19     0009     0    6.0
                        4     0001  -->   20     0009     0    6.0
                        5     0001  -->   21     0009     0    6.0
                        6     0001  -->   22     0009     0    6.0
                        7     0001  -->   23     0009     0    6.0

Expander    26          1     0009  -->    0     000a     0    3.0
                       18     0009  -->    2     0001     0    6.0
                       19     0009  -->    3     0001     0    6.0
                       20     0009  -->    4     0001     0    6.0
                       21     0009  -->    5     0001     0    6.0
                       22     0009  -->    6     0001     0    6.0
                       23     0009  -->    7     0001     0    6.0
                       24     0009  -->    0     000b     0    6.0

Enclosure Handle   Slots       SASAddress       B___T (SEP)
           0001      8      5001e67a22f7c000
           0002     25      5001e67957b5bfff    0   1

So many questions, so little sleep that I am going to get before work tomorrow.

oddballracing commented 10 months ago

Scratch that pairs idea. I just set phy 0 (handle 0001) online and now I have 7 links:

Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 16

SAS2308's links are 6.0 G, off, 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G

 B___T     SASAddress     PhyNum  Handle  Parent  Type
        5001e67a22f7c000           0001           SAS Initiator
        5001e67a22f7c001           0002           SAS Initiator
        5001e67a22f7c002           0003           SAS Initiator
        5001e67a22f7c003           0004           SAS Initiator
        5001e67a22f7c004           0005           SAS Initiator
        5001e67a22f7c005           0006           SAS Initiator
        5001e67a22f7c006           0007           SAS Initiator
        5001e67a22f7c007           0008           SAS Initiator
        5001e67957b5bfff     0     0009    0001   Edge Expander
 0   0  5001e67957b5bfe1     1     000a    0009   SATA Target
 0   1  5001e67957b5bffd    24     000b    0009   SAS Initiator and Target

Type      NumPhys    PhyNum  Handle     PhyNum  Handle  Port  Speed
Adapter      8          0     0001  -->   16     0009     0    6.0
                        2     0001  -->   18     0009     0    6.0
                        3     0001  -->   19     0009     0    6.0
                        4     0001  -->   20     0009     0    6.0
                        5     0001  -->   21     0009     0    6.0
                        6     0001  -->   22     0009     0    6.0
                        7     0001  -->   23     0009     0    6.0

Expander    26          1     0009  -->    0     000a     0    3.0
                       16     0009  -->    0     0001     0    6.0
                       18     0009  -->    2     0001     0    6.0
                       19     0009  -->    3     0001     0    6.0
                       20     0009  -->    4     0001     0    6.0
                       21     0009  -->    5     0001     0    6.0
                       22     0009  -->    6     0001     0    6.0
                       23     0009  -->    7     0001     0    6.0
                       24     0009  -->    0     000b     0    6.0

Enclosure Handle   Slots       SASAddress       B___T (SEP)
           0001      8      5001e67a22f7c000
           0002     25      5001e67957b5bfff    0   1
oddballracing commented 10 months ago

OK, I updated the firmware, BIOS and UEFI BSD to P20, and I don't understand what I'm looking at here:

It hasn't crashed, but then again, I haven't rebooted since enabling PHY 1 (handle 0002); I did reboot after updating the firmware/BIOS/UEFI.

Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 13

SATA Maximum Queue Depth:  [0 to 255, default is 32]
SAS Max Queue Depth, Narrow:  [0 to 65535, default is 0]
SAS Max Queue Depth, Wide:  [0 to 65535, default is 0]
Device Missing Report Delay:  [0 to 2047, default is 0]
Device Missing I/O Delay:  [0 to 255, default is 0]

PhyNum  Link      MinRate  MaxRate  Initiator  Target    Port
   0    Enabled     1.5      6.0    Enabled    Disabled  Auto
   1    Enabled     1.5      6.0    Enabled    Disabled  Auto
   2    Enabled     1.5      6.0    Enabled    Disabled  Auto
   3    Enabled     1.5      6.0    Enabled    Disabled  Auto
   4    Enabled     1.5      6.0    Enabled    Disabled  Auto
   5    Enabled     1.5      6.0    Enabled    Disabled  Auto
   6    Enabled     1.5      6.0    Enabled    Disabled  Auto
   7    Enabled     1.5      6.0    Enabled    Disabled  Auto

Select a Phy:  [0-7, 8=AllPhys, RETURN to quit]
Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 16

SAS2308's links are 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G, 6.0 G

 B___T     SASAddress     PhyNum  Handle  Parent  Type
        5001e67a22f7c000           0001           SAS Initiator
        5001e67a22f7c001           0002           SAS Initiator
        5001e67a22f7c002           0003           SAS Initiator
        5001e67a22f7c003           0004           SAS Initiator
        5001e67a22f7c004           0005           SAS Initiator
        5001e67a22f7c005           0006           SAS Initiator
        5001e67a22f7c006           0007           SAS Initiator
        5001e67a22f7c007           0008           SAS Initiator
        5001e67957b5bfff     0     0009    0001   Edge Expander
 0   0  5001e67957b5bfe0     0     000a    0009   SATA Target
 0   1  5001e67957b5bffd    24     000b    0009   SAS Initiator and Target

Type      NumPhys    PhyNum  Handle     PhyNum  Handle  Port  Speed
Adapter      8          0     0001  -->   16     0009     0    6.0
                        1     0001  -->   17     0009     0    6.0
                        2     0001  -->   18     0009     0    6.0
                        3     0001  -->   19     0009     0    6.0
                        4     0001  -->   20     0009     0    6.0
                        5     0001  -->   21     0009     0    6.0
                        6     0001  -->   22     0009     0    6.0
                        7     0001  -->   23     0009     0    6.0

Expander    26          0     0009  -->    0     000a     0    3.0
                       16     0009  -->    0     0001     0    6.0
                       17     0009  -->    1     0001     0    6.0
                       18     0009  -->    2     0001     0    6.0
                       19     0009  -->    3     0001     0    6.0
                       20     0009  -->    4     0001     0    6.0
                       21     0009  -->    5     0001     0    6.0
                       22     0009  -->    6     0001     0    6.0
                       23     0009  -->    7     0001     0    6.0
                       24     0009  -->    0     000b     0    6.0

Enclosure Handle   Slots       SASAddress       B___T (SEP)
           0001      8      5001e67a22f7c000
           0002     25      5001e67957b5bfff    0   1
sas2flash -c 0 -list
LSI Corporation SAS2 Flash Utility
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved

        Adapter Selected is a LSI SAS: SAS2308_2(D1)

        Controller Number              : 0
        Controller                     : SAS2308_2(D1)
        PCI Address                    : 00:06:00:00
        SAS Address                    : 5001e67-a-22f7-c000
        NVDATA Version (Default)       : 14.01.00.06
        NVDATA Version (Persistent)    : 14.01.00.06
        Firmware Product ID            : 0x2214 (IT)
        Firmware Version               : 20.00.07.00
        NVDATA Vendor                  : LSI
        NVDATA Product ID              : SAS9207-8i
        BIOS Version                   : 07.39.02.00
        UEFI BSD Version               : 07.27.01.01
        FCODE Version                  : N/A
        Board Name                     : SAS9207-8i
        Board Assembly                 : N/A
        Board Tracer Number            : N/A

        Finished Processing Commands Successfully.
        Exiting SAS2Flash.

gonna see what happens when I try a reboot.

oddballracing commented 10 months ago

Just a quick update. Attempting a shutdown with 8 links connected causes the same NMI kernel panic (as I had expected); this appears to be caused by the PCIe link dropping at the CPU root port. I'm analyzing some PCI(e) config space and comparing a few different things between the 2208 and 2308 modules to see if anything stands out. I'm also taking a good look at the PERC SBRs that Fohdeesha has in the ISO to see if there is something I have missed. I'm probably gonna spend a good bit of time going through these things, but will update when I find something. 7/8 is not bad as far as bandwidth goes (42Gb/s vs 48Gb/s), but there has to be a reason that the link drops when the second link is brought up in any capacity. I'm not sure what's so special about that port, so I need to investigate a few more things before I proceed. While I do consider it progress in figuring out WHY the expander issue was/is happening, I want to get to the root of this to really explain why it happens at all, and if/how this card can be truly crossflashed and made stable in that scenario. I'm wishing I had an H710P in an R720 right now so I could do some comparisons between the two implementations.
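The 42 vs 48 Gb/s figures are simply the number of links times the 6.0 Gb/s per-link rate reported by lsiutil. A small Python sketch of that arithmetic follows; the function names are made up for illustration. The payload figure is background that is not from this thread: it assumes the 8b/10b line coding used by 6 Gb/s SAS, which carries 8 payload bits per 10 line bits.

```python
SAS2_LINE_RATE_GBPS = 6.0  # per-link rate as reported by lsiutil in this thread


def raw_bandwidth_gbps(links: int, rate_gbps: float = SAS2_LINE_RATE_GBPS) -> float:
    """Aggregate line rate of a wide port made of `links` phys."""
    if links < 0:
        raise ValueError("links must be non-negative")
    return links * rate_gbps


def payload_bandwidth_gbps(links: int, rate_gbps: float = SAS2_LINE_RATE_GBPS) -> float:
    """Line rate scaled by 8/10 (assumption: 8b/10b encoding overhead)."""
    return raw_bandwidth_gbps(links, rate_gbps) * 8 / 10


for links in (7, 8):
    print(links, "links:", raw_bandwidth_gbps(links), "Gb/s raw,",
          round(payload_bandwidth_gbps(links), 1), "Gb/s before protocol overhead")
```

With 7 links the raw figure is 42 Gb/s and with 8 it is 48 Gb/s, so running with phy 1 disabled costs one eighth of the aggregate rate.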

oddballracing commented 10 months ago

On a physical hardware note (in case anyone else has the same curiosity with these "proprietary" connectors as I do), having dealt with many SuperMicro blade boards and these Intel servers:

This "SIOM" connector for the SAS modules appears to be an "Archer 0.8" - 0.80mm pitch mezzanine connector. The pinout is published in both the RMS25(JB/KB)0(80/40) and RMS25(CB/PB)0(80/40) hardware user guides (still available on Intel's web site, at least for now).

The part numbers that I believe these connectors to be are the 80-contact variants of the Archer 0.8, currently under the brand name Harwin. My conclusion is based upon the dimensions I have observed, and the physical appearance of these connectors.

Female (SAS Module): M58-2800842R Male (Server Board): M58-3800842R

This may interest anyone attempting to build a standardized PCIe adapter board for these modules, something I am not entirely opposed to doing myself at this point. The links for the user guides are below, but I have attached the 3 pages showing the connector and pinout (they are the same between the two documents, although Intel appears to have included the dimensions of the 40-pin variant, not the 80-pin one that is actually installed).

RMS25xB080_Pinout.pdf

The pinout looks pretty normal for PCIe signals up until #50 (8 lanes of the typical GND and TX/RX pairs), but then it gets a little interesting. I need to compare the signals with a standard x8 card-edge layout; it looks like some proprietary signals for the BMC are in there as well. Pins 52 and 54 confuse me, though: "rSASm REFCLK (+/-)". Is this thing using an external clock source for the SAS signaling?

RMS25JB080 (2308)

https://www.intel.com/content/dam/support/us/en/documents/server-products/raid-products/G42520-003_RMS25KB080_RMS25KB040_RMS25JB080_RMS25JB040_HWUG.pdf

RMS25CB080 (2208)

https://www.intel.com/content/dam/support/us/en/documents/motherboards/server/sb/g37519003_rms25pb080_rms25pb040_rms25cb080_rms25cb.pdf

oddballracing commented 9 months ago

I'm still working on this, I just haven't made any significant discoveries since finding the port that appears to be causing the failure. I'm currently looking into various differences between the 1068E, 1708, 2008, 2108, 2208 and 2308 cards I happen to have. I'm working on getting a better grasp of all of the LSI tools available for both HostRAID and MegaRAID controllers, and analyzing different firmware files for both, plus flash dumps where possible.

oddballracing commented 9 months ago

...Waiting on replacement SOIC-8 test clips, as I wore out the one I was using for reading the SBR EEPROMs on the various generations and revisions of LSI cards at my disposal, trying to further 'demystify' the SBR section and any other data stored on these EEPROMs. The structures/layouts I've seen so far differ between the 2008/2108 cards and the 2208/2308-based Intel modules (and the Dell SBRs that Fohdeesha has in the ISO). Hopefully I can shed some light on what differs as soon as the new test clips arrive.

Adefx commented 9 months ago

I love reading about your efforts to break through this Intel nonsense! It's like a low-level hardware odyssey. Fair winds, my friend!
