RemixVSL / iomemory-vsl4

Updated Fusion-io iomemory VSL4 Linux (version 4.3.7) driver for recent kernels.

fioinf ioMemory SX300-6400 0000:03:00.0: Voltage 'aux' spurious intr. #43

Closed hlepesant closed 2 years ago

hlepesant commented 2 years ago

Bug description

The Fusion-io card is used as the data directory for a PostgreSQL database. The issue occurs when trying to restore the database with pgBackRest.

How to reproduce

The card is mounted as an XFS volume on /srv/postgresql. The restore command is:

pgbackrest --stanza=mystanza --buffer-size=16m --process-max=16 restore

Kernel log :

May  5 15:44:34 -- kernel: [  851.258090] fioinf ioMemory SX300-6400 0000:03:00.0: Voltage 'aux' spurious intr.

The same card works fine on the same hardware under Debian 8 with the original drivers, but with firmware v8.9.8.

Possible solution

Unknown

Environment information

Information about the system the module is used on

  1. Linux kernel compiled against : 5.10.0-13-amd64
  2. The C compiler version used : Debian 10.2.1-6
  3. distribution, and version : Debian GNU/Linux 11 bullseye
  4. Tag or Branch of iomemory-vsl : vsl4 / v5.12.1
  5. FIO device used, if applicable
    • fio-status
      
      Found 1 VSL driver package:
         4.3.7 build 1205 Driver: loaded

      Found 1 ioMemory device in this system

      Adapter: ioMono (driver 4.3.7)
         ioMemory SX300-6400, Product Number:MM86C, SN:--
         PCIe Power limit threshold: 24.75W
         Connected ioMemory modules:
           fct0: 03:00.0, Product Number:MM86C, SN:--

      fct0 Attached
         ioMemory Adapter Controller, Product Number:MM86C, SN:--
         PCI:03:00.0, Slot Number:6
         Firmware v8.9.9, rev 20200113 Public
         6400.00 GBytes device size
         Internal temperature: 36.91 degC, max 42.33 degC
         Reserve space status: Healthy; Reserves: 100.00%, warn at 10.00%
         Contained Virtual Partitions:
           fioa: ID:0, UUID:--

      fioa State: Online, Type: block device, Device: /dev/fioa
         ID:0, UUID:--
         6400.00 GBytes device size


   * lspci -b -nn

00:00.0 Host bridge [0600]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DMI2 [8086:2f00] (rev 02)
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 1 [8086:2f02] (rev 02)
00:02.0 PCI bridge [0604]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 2 [8086:2f04] (rev 02)
00:02.2 PCI bridge [0604]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 2 [8086:2f06] (rev 02)
00:03.0 PCI bridge [0604]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 3 [8086:2f08] (rev 02)
00:03.2 PCI bridge [0604]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 3 [8086:2f0a] (rev 02)
00:05.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Address Map, VTd_Misc, System Management [8086:2f28] (rev 02)
00:05.1 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Hot Plug [8086:2f29] (rev 02)
00:05.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 RAS, Control Status and Global Errors [8086:2f2a] (rev 02)
00:05.4 PIC [0800]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 I/O APIC [8086:2f2c] (rev 02)
00:11.0 Unassigned class [ff00]: Intel Corporation C610/X99 series chipset SPSR [8086:8d7c] (rev 05)
00:11.4 SATA controller [0106]: Intel Corporation C610/X99 series chipset sSATA Controller [AHCI mode] [8086:8d62] (rev 05)
00:16.0 Communication controller [0780]: Intel Corporation C610/X99 series chipset MEI Controller #1 [8086:8d3a] (rev 05)
00:16.1 Communication controller [0780]: Intel Corporation C610/X99 series chipset MEI Controller #2 [8086:8d3b] (rev 05)
00:1a.0 USB controller [0c03]: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #2 [8086:8d2d] (rev 05)
00:1c.0 PCI bridge [0604]: Intel Corporation C610/X99 series chipset PCI Express Root Port #1 [8086:8d10] (rev d5)
00:1c.7 PCI bridge [0604]: Intel Corporation C610/X99 series chipset PCI Express Root Port #8 [8086:8d1e] (rev d5)
00:1d.0 USB controller [0c03]: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #1 [8086:8d26] (rev 05)
00:1f.0 ISA bridge [0601]: Intel Corporation C610/X99 series chipset LPC Controller [8086:8d44] (rev 05)
00:1f.2 SATA controller [0106]: Intel Corporation C610/X99 series chipset 6-Port SATA Controller [AHCI mode] [8086:8d02] (rev 05)
01:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme II BCM57800 1/10 Gigabit Ethernet [14e4:168a] (rev 10)
01:00.1 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme II BCM57800 1/10 Gigabit Ethernet [14e4:168a] (rev 10)
01:00.2 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme II BCM57800 1/10 Gigabit Ethernet [14e4:168a] (rev 10)
01:00.3 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme II BCM57800 1/10 Gigabit Ethernet [14e4:168a] (rev 10)
02:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] [1000:005d] (rev 02)
03:00.0 Mass storage controller [0180]: SanDisk ioMemory FHHL [1aed:3001]
07:00.0 PCI bridge [0604]: Renesas Technology Corp. SH7758 PCIe Switch [PS] [1912:001d]
08:00.0 PCI bridge [0604]: Renesas Technology Corp. SH7758 PCIe Switch [PS] [1912:001d]
09:00.0 PCI bridge [0604]: Renesas Technology Corp. SH7758 PCIe-PCI Bridge [PPB] [1912:001a]
0a:00.0 VGA compatible controller [0300]: Matrox Electronics Systems Ltd. G200eR2 [102b:0534] (rev 01)
7f:08.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 QPI Link 0 [8086:2f80] (rev 02)
7f:08.2 Performance counters [1101]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 QPI Link 0 [8086:2f32] (rev 02)
7f:08.3 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 QPI Link 0 [8086:2f83] (rev 02)
7f:09.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 QPI Link 1 [8086:2f90] (rev 02)
7f:09.2 Performance counters [1101]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 QPI Link 1 [8086:2f33] (rev 02)
7f:09.3 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 QPI Link 1 [8086:2f93] (rev 02)
7f:0b.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 R3 QPI Link 0 & 1 Monitoring [8086:2f81] (rev 02)
7f:0b.1 Performance counters [1101]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 R3 QPI Link 0 & 1 Monitoring [8086:2f36] (rev 02)
7f:0b.2 Performance counters [1101]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 R3 QPI Link 0 & 1 Monitoring [8086:2f37] (rev 02)
7f:0c.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers [8086:2fe0] (rev 02)
7f:0c.1 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers [8086:2fe1] (rev 02)
7f:0c.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers [8086:2fe2] (rev 02)
7f:0c.3 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers [8086:2fe3] (rev 02)
7f:0c.4 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers [8086:2fe4] (rev 02)
7f:0c.5 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers [8086:2fe5] (rev 02)
7f:0c.6 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers [8086:2fe6] (rev 02)
7f:0c.7 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers [8086:2fe7] (rev 02)
7f:0f.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Buffered Ring Agent [8086:2ff8] (rev 02)
7f:0f.1 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Buffered Ring Agent [8086:2ff9] (rev 02)
7f:0f.4 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 System Address Decoder & Broadcast Registers [8086:2ffc] (rev 02)
7f:0f.5 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 System Address Decoder & Broadcast Registers [8086:2ffd] (rev 02)
7f:0f.6 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 System Address Decoder & Broadcast Registers [8086:2ffe] (rev 02)
7f:10.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCIe Ring Interface [8086:2f1d] (rev 02)
7f:10.1 Performance counters [1101]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCIe Ring Interface [8086:2f34] (rev 02)
7f:10.5 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers [8086:2f1e] (rev 02)
7f:10.6 Performance counters [1101]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers [8086:2f7d] (rev 02)
7f:10.7 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers [8086:2f1f] (rev 02)
7f:12.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Home Agent 0 [8086:2fa0] (rev 02)
7f:12.1 Performance counters [1101]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Home Agent 0 [8086:2f30] (rev 02)
7f:12.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Home Agent 0 Debug [8086:2f70] (rev 02)
7f:13.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Target Address, Thermal & RAS Registers [8086... (rev 02)
7f:13.1 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Target Address, Thermal & RAS Registers [8086... (rev 02)
7f:13.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder [8086:2faa] (rev 02)
7f:13.3 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder [8086:2fab] (rev 02)
7f:13.4 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder [8086:2fac] (rev 02)
7f:13.5 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder [8086:2fad] (rev 02)
7f:13.6 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Channel 0/1 Broadcast [8086:2fae] (rev 02)
7f:13.7 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Global Broadcast [8086:2faf] (rev 02)
7f:14.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 0 Thermal Control [8086:2fb0] (rev 02)
7f:14.1 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 1 Thermal Control [8086:2fb1] (rev 02)
7f:14.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 0 ERROR Registers [8086:2fb2] (rev 02)
7f:14.3 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 1 ERROR Registers [8086:2fb3] (rev 02)
7f:14.4 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 [8086:2fbc] (rev 02)
7f:14.5 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 [8086:2fbd] (rev 02)
7f:14.6 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 [8086:2fbe] (rev 02)
7f:14.7 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 [8086:2fbf] (rev 02)
7f:15.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 2 Thermal Control [8086:2fb4] (rev 02)
7f:15.1 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 3 Thermal Control [8086:2fb5] (rev 02)
7f:15.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 2 ERROR Registers [8086:2fb6] (rev 02)
7f:15.3 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 3 ERROR Registers [8086:2fb7] (rev 02)
7f:16.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 1 Target Address, Thermal & RAS Registers [8086... (rev 02)
7f:16.6 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Channel 2/3 Broadcast [8086:2f6e] (rev 02)
7f:16.7 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Global Broadcast [8086:2f6f] (rev 02)
7f:17.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 1 Channel 0 Thermal Control [8086:2fd0] (rev 02)
7f:17.4 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 [8086:2fb8] (rev 02)
7f:17.5 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 [8086:2fb9] (rev 02)
7f:17.6 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 [8086:2fba] (rev 02)
7f:17.7 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 [8086:2fbb] (rev 02)
7f:1e.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit [8086:2f98] (rev 02)
7f:1e.1 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit [8086:2f99] (rev 02)
7f:1e.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit [8086:2f9a] (rev 02)
7f:1e.3 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit [8086:2fc0] (rev 02)
7f:1e.4 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit [8086:2f9c] (rev 02)
7f:1f.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 VCU [8086:2f88] (rev 02)
7f:1f.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 VCU [8086:2f8a] (rev 02)
80:01.0 PCI bridge [0604]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 1 [8086:2f02] (rev 02)
80:02.0 PCI bridge [0604]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 2 [8086:2f04] (rev 02)
80:03.0 PCI bridge [0604]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 3 [8086:2f08] (rev 02)
80:03.2 PCI bridge [0604]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 3 [8086:2f0a] (rev 02)
80:05.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Address Map, VTd_Misc, System Management [8086:2f28] (rev 02)
80:05.1 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Hot Plug [8086:2f29] (rev 02)
80:05.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 RAS, Control Status and Global Errors [8086:2f2a] (rev 02)
80:05.4 PIC [0800]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 I/O APIC [8086:2f2c] (rev 02)
81:00.0 Ethernet controller [0200]: Intel Corporation Ethernet 10G 2P X520 Adapter [8086:154d] (rev 01)
81:00.1 Ethernet controller [0200]: Intel Corporation Ethernet 10G 2P X520 Adapter [8086:154d] (rev 01)
ff:08.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 QPI Link 0 [8086:2f80] (rev 02)
ff:08.2 Performance counters [1101]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 QPI Link 0 [8086:2f32] (rev 02)
ff:08.3 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 QPI Link 0 [8086:2f83] (rev 02)
ff:09.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 QPI Link 1 [8086:2f90] (rev 02)
ff:09.2 Performance counters [1101]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 QPI Link 1 [8086:2f33] (rev 02)
ff:09.3 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 QPI Link 1 [8086:2f93] (rev 02)
ff:0b.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 R3 QPI Link 0 & 1 Monitoring [8086:2f81] (rev 02)
ff:0b.1 Performance counters [1101]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 R3 QPI Link 0 & 1 Monitoring [8086:2f36] (rev 02)
ff:0b.2 Performance counters [1101]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 R3 QPI Link 0 & 1 Monitoring [8086:2f37] (rev 02)
ff:0c.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers [8086:2fe0] (rev 02)
ff:0c.1 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers [8086:2fe1] (rev 02)
ff:0c.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers [8086:2fe2] (rev 02)
ff:0c.3 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers [8086:2fe3] (rev 02)
ff:0c.4 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers [8086:2fe4] (rev 02)
ff:0c.5 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers [8086:2fe5] (rev 02)
ff:0c.6 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers [8086:2fe6] (rev 02)
ff:0c.7 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers [8086:2fe7] (rev 02)
ff:0f.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Buffered Ring Agent [8086:2ff8] (rev 02)
ff:0f.1 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Buffered Ring Agent [8086:2ff9] (rev 02)
ff:0f.4 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 System Address Decoder & Broadcast Registers [8086:2ffc] (rev 02)
ff:0f.5 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 System Address Decoder & Broadcast Registers [8086:2ffd] (rev 02)
ff:0f.6 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 System Address Decoder & Broadcast Registers [8086:2ffe] (rev 02)
ff:10.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCIe Ring Interface [8086:2f1d] (rev 02)
ff:10.1 Performance counters [1101]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCIe Ring Interface [8086:2f34] (rev 02)
ff:10.5 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers [8086:2f1e] (rev 02)
ff:10.6 Performance counters [1101]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers [8086:2f7d] (rev 02)
ff:10.7 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers [8086:2f1f] (rev 02)
ff:12.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Home Agent 0 [8086:2fa0] (rev 02)
ff:12.1 Performance counters [1101]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Home Agent 0 [8086:2f30] (rev 02)
ff:12.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Home Agent 0 Debug [8086:2f70] (rev 02)
ff:13.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Target Address, Thermal & RAS Registers [8086... (rev 02)
ff:13.1 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Target Address, Thermal & RAS Registers [8086... (rev 02)
ff:13.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder [8086:2faa] (rev 02)
ff:13.3 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder [8086:2fab] (rev 02)
ff:13.4 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder [8086:2fac] (rev 02)
ff:13.5 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder [8086:2fad] (rev 02)
ff:13.6 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Channel 0/1 Broadcast [8086:2fae] (rev 02)
ff:13.7 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Global Broadcast [8086:2faf] (rev 02)
ff:14.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 0 Thermal Control [8086:2fb0] (rev 02)
ff:14.1 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 1 Thermal Control [8086:2fb1] (rev 02)
ff:14.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 0 ERROR Registers [8086:2fb2] (rev 02)
ff:14.3 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 1 ERROR Registers [8086:2fb3] (rev 02)
ff:14.4 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 [8086:2fbc] (rev 02)
ff:14.5 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 [8086:2fbd] (rev 02)
ff:14.6 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 [8086:2fbe] (rev 02)
ff:14.7 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 [8086:2fbf] (rev 02)
ff:15.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 2 Thermal Control [8086:2fb4] (rev 02)
ff:15.1 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 3 Thermal Control [8086:2fb5] (rev 02)
ff:15.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 2 ERROR Registers [8086:2fb6] (rev 02)
ff:15.3 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 3 ERROR Registers [8086:2fb7] (rev 02)
ff:16.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 1 Target Address, Thermal & RAS Registers [8086... (rev 02)
ff:16.6 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Channel 2/3 Broadcast [8086:2f6e] (rev 02)
ff:16.7 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Global Broadcast [8086:2f6f] (rev 02)
ff:17.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 1 Channel 0 Thermal Control [8086:2fd0] (rev 02)
ff:17.4 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 [8086:2fb8] (rev 02)
ff:17.5 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 [8086:2fb9] (rev 02)
ff:17.6 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 [8086:2fba] (rev 02)
ff:17.7 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 [8086:2fbb] (rev 02)
ff:1e.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit [8086:2f98] (rev 02)
ff:1e.1 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit [8086:2f99] (rev 02)
ff:1e.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit [8086:2f9a] (rev 02)
ff:1e.3 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit [8086:2fc0] (rev 02)
ff:1e.4 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit [8086:2f9c] (rev 02)
ff:1f.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 VCU [8086:2f88] (rev 02)
ff:1f.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 VCU [8086:2f8a] (rev 02)

snuf commented 2 years ago

@hlepesant can you add the full output of fio-status -a? And is there more dmesg output besides that single line?

The dmesg message you posted indicates that the driver detected a voltage irregularity but, on further investigation, discarded it because it seemed invalid. That alone should not shut down the device. However, if there are other issues, the driver will shut down the device as a precaution.

On a side note, please follow the main branch, and only revert to a targeted branch if there is an issue with main.

hlepesant commented 2 years ago

Hi, here is the output of fio-status -a:

fio-status -a

Found 1 VSL driver package:
   4.3.7 build 1205 Driver: loaded

Found 1 ioMemory device in this system

Adapter: ioMono (driver 4.3.7)
   ioMemory SX300-6400, Product Number:MM86C, SN:--
   ioMemory Adapter Controller, PN:1PXNH
   Product UUID:--
   PCIe Bus voltage: avg 12.18V
   PCIe Bus current: avg 0.70A
   PCIe Bus power: avg 8.50W
   PCIe Power limit threshold: 24.75W
   PCIe slot available power: 25.00W
   PCIe negotiated link: 8 lanes at 5.0 Gt/sec each, 4000.00 MBytes/sec total
   Connected ioMemory modules:
     fct0: 03:00.0, Product Number:MM86C, SN:--

fct0 Attached
   ioMemory Adapter Controller, Product Number:MM86C, SN:--
   ioMemory Adapter Controller, PN:1PXNH
   Microcode Versions: App:0.0.30.0
   Powerloss protection: protected
   Last Power Monitor Incident: 22325 sec
   PCI:03:00.0, Slot Number:6
   Vendor:1aed, Device:3001, Sub vendor:1028, Sub device:1fa1
   Firmware v8.9.9, rev 20200113 Public
   6400.00 GBytes device size
   Format: v501, 1562500000 sectors of 4096 bytes
   PCIe slot available power: 25.00W
   PCIe negotiated link: 8 lanes at 5.0 Gt/sec each, 4000.00 MBytes/sec total
   Internal temperature: 35.44 degC, max 43.31 degC
   Internal voltage: avg 1.01V, max 1.01V
   Aux voltage: avg 1.79V, max 1.84V
   Reserve space status: Healthy; Reserves: 100.00%, warn at 10.00%
   Active media: 100.00%
   Rated PBW: 22.00 PB, -63.96% remaining
   Lifetime data volumes:
      Physical bytes written: 36,071,030,143,776,528
      Physical bytes read : 42,418,426,616,748,000
   RAM usage:
      Current: 1,141,486,720 bytes
      Peak : 1,147,818,240 bytes
   Contained Virtual Partitions:
     fioa: ID:0, UUID:02eba185-3409-4290-bab9-d92238864df5

fioa State: Online, Type: block device, Device: /dev/fioa
   ID:0, UUID:02eba185-3409-4290-bab9-d92238864df5
   6400.00 GBytes device size
   Format: 1562500000 sectors of 4096 bytes
   Sectors In Use: 1562500000
   Max Physical Sectors Allowed: 1562500000
   Min Physical Sectors Reserved: 1562500000
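As an aside on the "Rated PBW: 22.00 PB, -63.96% remaining" line above: the negative figure appears to follow directly from the two numbers reported next to it. The sketch below reproduces it under the assumption that 1 PB means 10**15 bytes and that "remaining" is simply one minus written over rated; this is a reading of the numbers, not documented fio-status behavior, and the function name is made up for illustration.

```python
# Values copied from the fio-status -a output in this comment.
RATED_PBW_BYTES = 22.00e15  # assumption: 1 PB = 10**15 bytes
physical_bytes_written = 36_071_030_143_776_528


def remaining_endurance_percent(written: int, rated: float) -> float:
    """Return the share of rated write endurance still unused, in percent.

    Hypothetical helper: assumes remaining = 1 - written / rated.
    """
    return (1.0 - written / rated) * 100.0


remaining = remaining_endurance_percent(physical_bytes_written, RATED_PBW_BYTES)
print(f"{remaining:.2f}% remaining")  # prints "-63.96% remaining"
```

Under those assumptions the card has written roughly 1.64 times its rated endurance, which matches the value fio-status prints.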

The dmesg output is:

dmesg |grep fioinf

[ 1.505589] fioinf No Queue strategy is set.
[ 1.505590] fioinf
[ 1.505590] fioinf Copyright (c) 2006-2020 Western Digital Corporation or its affiliates.
[ 1.505591] fioinf For Terms and Conditions see the License file included
[ 1.505591] fioinf with this driver package.
[ 1.505592] fioinf
[ 1.505593] fioinf ioDrive driver 5.10.0-13-def3093-4.3.7.1205 loading...
[ 1.505593] fioinf VSL configuration hash: f7771666cf9469344233dd611c787828ea46645e
[ 1.505993] fioinf ioDrive 0000:03:00.0: mapping controller on BAR 5
[ 1.654838] fioinf ioDrive 0000:03:00.0: Configuring controller.
[ 1.655027] fioinf ioDrive 0000:03:00.0: using MSI-X interrupts
[ 1.655446] fioinf ioDrive 0000:03:00.0: Starting controller
[ 2.250026] fioinf ioDrive 0000:03:00.0: PMP Address: 1 1 1
[ 5.434031] fioinf ioDrive 0000:03:00.0: SMP fpga Microcode BOOT version 0.8.0 1
[ 5.434034] fioinf ioDrive 0000:03:00.0: SMP controller Microcode APP version 0.30.0 0
[ 7.362051] fioinf ioDrive 0000:03:00.0: Product UUID is f96cde24-4f9a-51bb-af54-8e8b392ab802
[ 7.362057] fioinf ioDrive 0000:03:00.0: Required PCIE bandwidth 4.000 GBytes per sec
[ 7.362058] fioinf ioDrive 0000:03:00.0: Board serial number is 1441G0981
[ 7.362060] fioinf ioDrive 0000:03:00.0: Adapter serial number is 1441G0981
[ 7.362067] fioinf ioDrive 0000:03:00.0: Default capacity 6400.000 GBytes
[ 7.362068] fioinf ioDrive 0000:03:00.0: Default sector size 4096 bytes
[ 7.362071] fioinf ioDrive 0000:03:00.0: Rated endurance 22.00 PBytes
[ 7.362072] fioinf ioDrive 0000:03:00.0: 85C temp range hardware found
[ 7.362075] fioinf ioDrive 0000:03:00.0: Maximum capacity 6400.000 GBytes
[ 7.362101] fioinf ioDrive 0000:03:00.0: PCIe Slot reported power limit: 25000mWatts
[ 7.362103] fioinf ioDrive 0000:03:00.0: PCIe Adapter power limit: 25000mWatts
[ 7.362104] fioinf ioDrive 0000:03:00.0: PCIe Adapter power Throttle point: 24750mWatts
[ 7.362110] fioinf ioDrive 0000:03:00.0: Firmware Archive Information:
[ 7.362112] fioinf ioDrive 0000:03:00.0: Name = ioMemory Firmware 8.9.9.20200113
[ 7.362113] fioinf ioDrive 0000:03:00.0: Description = ioMemory Firmware 8.9.9.20200113 Archive
[ 7.362115] fioinf ioDrive 0000:03:00.0: Version = 8.9.9.0
[ 7.362118] fioinf ioDrive 0000:03:00.0: Date = 01/13/2020 Rev 20200113
[ 7.362121] fioinf ioDrive 0000:03:00.0: Firmware version 8.9.9 118194 (0x802409 0x1cdb2)
[ 7.362123] fioinf ioDrive 0000:03:00.0: Platform version 40
[ 7.362125] fioinf ioDrive 0000:03:00.0: Firmware VCS version 118194 [0x1cdb2]
[ 7.362131] fioinf ioDrive 0000:03:00.0: Firmware VCS uid 0xea8892fee031ff5085ff47b5d3b4717e7dd4b603
[ 7.375052] fioinf ioDrive 0000:03:00.0: Powercut flush: Supported and Enabled.
[ 7.566855] fioinf ioDrive 0000:03:00.0: Loading microcode image 1.0.9 rev 100764 with flags 0x00010100
[ 7.566897] fioinf ioDrive 0000:03:00.0: Microcode loaded (0).
[ 7.644601] fioinf ioDrive 0000:03:00.0: Multiple queues enabled
[ 7.676454] fioinf ioDrive 0000:03:00.0: PCIe power monitor enabled (master). Limit set to 24.750 watts.
[ 7.676456] fioinf ioDrive 0000:03:00.0: Thermal monitoring: Enabled
[ 7.676458] fioinf ioDrive 0000:03:00.0: Hardware temperature alarm set for 85C.
[ 7.843543] fioinf ioDrive 0000:03:00.0: Starting device ioMemory SX300-6400 0000:03:00.0
[ 10.140722] fioinf ioMemory SX300-6400 0000:03:00.0: probed fct0
[ 10.605126] fioinf ioMemory SX300-6400 0000:03:00.0: sector_size=4096
[ 10.607587] fioinf ioMemory SX300-6400 0000:03:00.0: Setting channel range data to [2 .. 2047]
[ 11.556844] fioinf ioMemory SX300-6400 0000:03:00.0: Found metadata in EBs 1362-1889, loading...
[ 13.253538] fioinf ioMemory SX300-6400 0000:03:00.0: Device has no Virtual Partitions. Creating compatibility Virtual Partition fioa.
[ 13.253564] fioinf ioMemory SX300-6400 0000:03:00.0: Created device of size 6400000000000 bytes with 1562500000 sectors of 4096 bytes (1562500000 mapped).
[ 13.262079] fioinf ioMemory SX300-6400 0000:03:00.0: Creating block device fioa: major: 254 minor: 0 sector size: 4096...
[ 13.262590] fioinf ioMemory SX300-6400 0000:03:00.0: Exposing Virtual Partition fioa of size 6400000000000 bytes with 1562500000 sectors of 4096 bytes (1562500000 mapped).
[ 13.262591] fioinf ioMemory SX300-6400 0000:03:00.0: Attach succeeded.
[ 24.449722] fioinf ioMemory SX300-6400 0000:03:00.0: Voltage 'aux' spurious intr.

I don't know if the voltage message is the root cause of the problem. But when it occurs, the server freezes and needs a hard reboot.

regards, Hugues

hlepesant commented 2 years ago

Hi,

I've reinstalled with Debian 9 (kernel 4.9.0-18-amd64) and installed iomemory-vsl4 (v4.20.1). The module is running. I still see some lines like this in the logs: fioinf ioMemory SX300-6400 0000:03:00.0: Voltage 'aux' spurious intr. But the system does not freeze. I'm restoring a PostgreSQL backup onto the ioMemory SX300-6400 as I write these lines. Hope it'll end nicely. 🤞

snuf commented 2 years ago

Thanks for the update!

I'm curious whether you have any fioerr messages in the logs, or anything on the console, when the box freezes (perhaps on a remote loghost)? I've had issues in the past where the last messages never made it into the logs because the kernel failed to write or flush them, yet it was still able to send them over the wire or to the console (not fio-related, just in general).

I don't think it's the module per se. That said, it seems the sysmon part of the drive is complaining about something through the driver, which could stem from other issues that we're not seeing at the moment.
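For anyone collecting logs as suggested above, here is a minimal Python sketch that pulls the driver's lines out of saved kernel log text and separates the spurious-interrupt notices from anything tagged fioerr. It assumes fioerr lines use the same "[ timestamp] tag message" layout as the fioinf lines posted in this thread; the function name and the third sample line are made up for illustration.

```python
import re

# Matches kernel log lines tagged fioinf or fioerr that carry a
# dmesg-style "[ seconds.micros]" timestamp before the tag.
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<tag>fioinf|fioerr)\s+(?P<msg>.*)"
)


def extract_fio_messages(log_text):
    """Yield (timestamp_seconds, tag, message) for each fioinf/fioerr line."""
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            yield float(match["ts"]), match["tag"], match["msg"].strip()


# First two lines are taken from this thread; the last one is invented.
sample = """\
May  5 15:44:34 -- kernel: [  851.258090] fioinf ioMemory SX300-6400 0000:03:00.0: Voltage 'aux' spurious intr.
[   13.262591] fioinf ioMemory SX300-6400 0000:03:00.0: Attach succeeded.
[   14.000000] unrelated kernel message
"""

messages = list(extract_fio_messages(sample))
spurious = [m for m in messages if "spurious intr." in m[2]]
errors = [m for m in messages if m[1] == "fioerr"]
print(len(messages), len(spurious), len(errors))  # prints "2 1 0"
```

Running the same filter over logs captured on a remote loghost or serial console would show whether any fioerr lines appear just before a freeze.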

bplein commented 2 years ago

The aux voltage message could very well be pointing to very short voltage drops either from the motherboard or potentially on the card. They are likely a symptom and not a root cause. The root cause would be either a hiccup in power on the motherboard or possibly a failing component on the card that is dropping voltage.
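To watch for the kind of drift described above, the voltage lines from fio-status -a can be extracted and compared across runs. The following Python sketch parses the "Internal voltage" and "Aux voltage" lines in the format shown earlier in this thread; the function name is hypothetical. Note that the output posted here reports only avg and max, so a very short drop would probably not be visible in a single reading.

```python
import re

# Matches lines such as "Aux voltage: avg 1.79V, max 1.84V".
VOLTAGE_RE = re.compile(
    r"(?P<rail>Internal|Aux) voltage: "
    r"avg (?P<avg>\d+\.\d+)V, max (?P<max>\d+\.\d+)V"
)


def parse_voltages(status_text):
    """Map rail name to (avg_volts, max_volts) from fio-status -a text."""
    return {
        m["rail"].lower(): (float(m["avg"]), float(m["max"]))
        for m in VOLTAGE_RE.finditer(status_text)
    }


# Sample text copied from the fio-status -a output in this thread.
sample = "Internal voltage: avg 1.01V, max 1.01V Aux voltage: avg 1.79V, max 1.84V"
readings = parse_voltages(sample)
print(readings)  # {'internal': (1.01, 1.01), 'aux': (1.79, 1.84)}
```

Logging these readings periodically during a heavy restore would give a rough picture of whether the aux rail moves under load.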

hlepesant commented 2 years ago

Hi, I definitely do not think the voltage message is linked to the system freezing. I think it has more to do with the heavy I/O needed by the backup recovery. The switch to Debian 9, as explained above, works fine.

I'll close this issue. Regards