geerlingguy / raspberry-pi-pcie-devices

Raspberry Pi PCI Express device compatibility database
http://pipci.jeffgeerling.com
GNU General Public License v3.0

Test LSI SAS9207 (SAS2308 IT mode FW) SAS-2 Controller #140

Status: Open. Doridian opened this issue 3 years ago.

Doridian commented 3 years ago
root@raspberrypi:/home/doridian# lspci -v
00:00.0 PCI bridge: Broadcom Limited Device 2711 (rev 20) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 00000000-00000fff
        Memory behind bridge: c0000000-c01fffff
        Capabilities: [48] Power Management version 3
        Capabilities: [ac] Express Root Port (Slot-), MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [180] Vendor Specific Information: ID=0000 Rev=0 Len=028 <?>
        Capabilities: [240] L1 PM Substates

01:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)
        Subsystem: LSI Logic / Symbios Logic 9207-8e SAS2.1 HBA
        Flags: bus master, fast devsel, latency 0, IRQ 67
        I/O ports at <unassigned> [disabled]
        Memory at 600140000 (64-bit, non-prefetchable) [size=64K]
        Memory at 600100000 (64-bit, non-prefetchable) [size=256K]
        [virtual] Expansion ROM at 600000000 [disabled] [size=1M]
        Capabilities: [50] Power Management version 3
        Capabilities: [68] Express Endpoint, MSI 00
        Capabilities: [d0] Vital Product Data
        Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [c0] MSI-X: Enable+ Count=16 Masked-
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [1e0] #19
        Capabilities: [1c0] Power Budgeting <?>
        Capabilities: [190] #16
        Capabilities: [148] Alternative Routing-ID Interpretation (ARI)
        Kernel driver in use: mpt3sas
        Kernel modules: mpt3sas

root@raspberrypi:/home/doridian# dmesg | grep mpt
[    0.000000]   Normal   empty
[    0.000000] rcu: Preemptible hierarchical RCU implementation.
[    5.230063] mpt3sas version 35.100.00.00 loaded
[    5.232068] mpt3sas 0000:01:00.0: enabling device (0000 -> 0002)
[    5.232134] mpt2sas_cm0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (3886104 kB)
[    5.298234] mpt2sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
[    5.298309] mpt2sas_cm0: MSI-X vectors supported: 16
[    5.298340] mpt2sas_cm0:  0 4
[    5.301956] mpt2sas_cm0: High IOPs queues : disabled
[    5.301980] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 68
[    5.301992] mpt2sas0-msix1: PCI-MSI-X enabled: IRQ 69
[    5.302004] mpt2sas0-msix2: PCI-MSI-X enabled: IRQ 70
[    5.302014] mpt2sas0-msix3: PCI-MSI-X enabled: IRQ 71
[    5.302031] mpt2sas_cm0: iomem(0x0000000600140000), mapped(0x(____ptrval____)), size(65536)
[    5.302044] mpt2sas_cm0: ioport(0x0000000000000000), size(0)
[    5.364559] mpt2sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
[    5.397725] mpt2sas_cm0: scatter gather: sge_in_main_msg(1), sge_per_chain(9), sge_per_io(128), chains_per_io(15)
[    5.421848] mpt2sas_cm0: request pool(0x(____ptrval____)) - dma(0x41f400000): depth(10368), frame_size(128), pool_size(1296 kB)
[    7.496061] mpt2sas_cm0: sense pool(0x(____ptrval____))- dma(0x41fe00000): depth(10107),element_size(96), pool_size(947 kB)
[    7.522164] mpt2sas_cm0: config page(0x(____ptrval____)) - dma(0x449004000): size(512)
[    7.522188] mpt2sas_cm0: Allocated physical memory: size(7454 kB)
[    7.522201] mpt2sas_cm0: Current Controller Queue Depth(10104),Max Controller Queue Depth(10240)
[    7.522213] mpt2sas_cm0: Scatter Gather Elements per IO(128)
[   22.749654] mpt2sas_cm0: config_request: manufacturing(0), action(0), form(0x00000000), smid(10236)
[   22.749670] mpt2sas_cm0: _config_request: command timeout
[   22.749682] mpt2sas_cm0: Command Timeout
[   22.749841] mpt2sas_cm0: sending diag reset !!
[   23.727134] mpt2sas_cm0: diag reset: SUCCESS
[   23.787889] mpt2sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
[   38.877713] mpt2sas_cm0: config_request: manufacturing(0), action(0), form(0x00000000), smid(10236)
[   38.877731] mpt2sas_cm0: _config_request: command timeout
[   38.877743] mpt2sas_cm0: Command Timeout
[   38.877895] mpt2sas_cm0: _config_request: attempting retry (1)
[   53.981670] mpt2sas_cm0: config_request: manufacturing(0), action(0), form(0x00000000), smid(10236)
[   53.981688] mpt2sas_cm0: _config_request: command timeout
[   53.981700] mpt2sas_cm0: Command Timeout
[   53.981852] mpt2sas_cm0: _config_request: attempting retry (2)
[   69.085653] mpt2sas_cm0: config_request: manufacturing(0), action(0), form(0x00000000), smid(10236)
[   69.085669] mpt2sas_cm0: _config_request: command timeout
[   69.085681] mpt2sas_cm0: Command Timeout
[   84.189986] mpt2sas_cm0: config_request: manufacturing(11), action(0), form(0x00000000), smid(10236)
[   84.190002] mpt2sas_cm0: _config_request: command timeout
[   84.190014] mpt2sas_cm0: Command Timeout
[   84.190163] mpt2sas_cm0: _config_request: attempting retry (1)
[   99.294124] mpt2sas_cm0: config_request: manufacturing(11), action(0), form(0x00000000), smid(10236)
[   99.294140] mpt2sas_cm0: _config_request: command timeout
[   99.294151] mpt2sas_cm0: Command Timeout
[   99.294301] mpt2sas_cm0: _config_request: attempting retry (2)
[  114.398178] mpt2sas_cm0: config_request: manufacturing(11), action(0), form(0x00000000), smid(10236)
[  114.398195] mpt2sas_cm0: _config_request: command timeout
[  114.398207] mpt2sas_cm0: Command Timeout
[  114.398359] mpt2sas_cm0: overriding NVDATA EEDPTagMode setting


Driver loads, no immediate kernel panics, heartbeat LED is blinking. However, those timeouts without any devices attached don't make me hopeful. Let's try plugging in some devices next...

Doridian commented 3 years ago

No dice at all. I get the following repeatedly, and the controller does not detect any drives or work at all.

[   22.749865] mpt2sas_cm0: sending diag reset !!
[   23.727989] mpt2sas_cm0: diag reset: SUCCESS
[   23.787230] mpt2sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
[   38.877862] mpt2sas_cm0: config_request: manufacturing(0), action(0), form(0x00000000), smid(10236)
[   38.877884] mpt2sas_cm0: _config_request: command timeout
[   38.877897] mpt2sas_cm0: Command Timeout
[   38.877907] mf:

[   38.877918] 04000000
[   38.877929] 00000000
[   38.877939] 00000000
[   38.877949] 00000000
[   38.877959] 00000000
[   38.877969] 09000000
[   38.877979] 00000000
[   38.877989] d3000000
[   38.877998]

[   38.878007] ffffffff
[   38.878017] ffffffff
[   38.878027] 00000000

[   38.878053] mpt2sas_cm0: _config_request: attempting retry (1)

xdays commented 3 years ago

@Doridian did you make any progress?

disdos10 commented 2 years ago

I can confirm this error is caused by the Raspberry Pi's PCIe controller, which only supports 32-bit MMIO reads and writes. I compiled a custom kernel with the patch suggested here: https://github.com/raspberrypi/linux/issues/4158#issuecomment-783469773, and the error was gone.

paulwratt commented 2 years ago

sweet, good find

For anyone getting these posts by mail and following the various graphics card threads: this will affect them too (writeq, and writes of 4 to 256+ bytes):

diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index fd172c41df90..3731da41c680 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -42,7 +42,9 @@ static __always_inline void __raw_writel(u32 val, volatile void __iomem *addr)
 #define __raw_writeq __raw_writeq
 static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
 {
-       asm volatile("str %x0, [%1]" : : "rZ" (val), "r" (addr));
+       //asm volatile("str %x0, [%1]" : : "rZ" (val), "r" (addr));
+       asm volatile("str %w0, [%1]" : : "rZ" ((u32)val), "r" (addr));
+       asm volatile("str %w0, [%1]" : : "rZ" ((u32)(val>>32)), "r" (addr + 4));
 }

 #define __raw_readb __raw_readb

disdos10 commented 2 years ago

Sadly, this patch does not lock around the two halves of the writeq, so it fails very hard under I/O. The results are data corruption and, eventually, a kernel Oops. So I took a different approach and enabled the driver's own 32-bit fallback code instead:

--- a/drivers/scsi/mpt3sas/mpt3sas_base.c   2022-02-26 13:15:10.867598263 +0100
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c   2022-02-26 09:31:09.297463161 +0100
@@ -3798,7 +3798,7 @@
  * care of 32 bit environment where its not quarenteed to send the entire word
  * in one transfer.
  */
-#if defined(writeq) && defined(CONFIG_64BIT)
+#if defined(writeq) && defined(CONFIG_64BIT) && !defined(CONFIG_ARCH_BCM2835)
 static inline void
 _base_writeq(__u64 b, volatile void __iomem *addr, spinlock_t *writeq_lock)
 {

Everything looks promising so far.