chaos4ever / chaos

The chaos Operating System
https://chaos4ever.github.io/

[servers/pci] Investigate PCI probing code crash on 00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21) #134

Open perlun opened 5 years ago

perlun commented 5 years ago

I noticed recently that with a very modern Dell Vostro laptop we have at work, chaos crashes completely on startup, which is kind of interesting. I debugged this a bit and concluded that it's actually the pci server that crashes when setting up the SMBus device.

I disabled this device in eab58d65e1f1f899494f6541ca87273403a2d54c, since "not detected" is much better than "crashing the machine". Someone with a strong love for the PCI hardware (...) would be very welcome to dig in and do a proper fix for this. I can volunteer to test any fix you make on this hardware (I have only seen the problem on one single PC, ever.)

Steps taken to find this

While debugging this, I first tried the patch below, which makes the scanning and setup skip functions 4 to 7 of every PCI device.

(We could do what MINIX3 did (it was written after chaos had its peak years) and borrow the PCI scanning code from NetBSD instead of trying to write it on our own. The MINIX3 implementation, which is based on the NetBSD code, can be found here: https://github.com/Stichting-MINIX-Research-Foundation/minix/blame/master/sys/dev/pci/pci_subr.c)

diff --git a/servers/system/pci/pci.c b/servers/system/pci/pci.c
index 53c1de8..7fe5210 100644
--- a/servers/system/pci/pci.c
+++ b/servers/system/pci/pci.c
@@ -528,7 +528,7 @@ static pci_device_type *pci_scan_slot(pci_device_type *input_device)
     bool is_multi = FALSE;
     uint8_t header_type;

-    for (function = 0; function < 8; function++, input_device->device_function++)
+    for (function = 0; function < 4 /*8*/; function++, input_device->device_function++)
     {
         if (function != 0 && !is_multi)
         {

This is just a thought, but maybe it's wrong to assume that all PCI hosts support 8 functions per device, and this is what causes the problem? There could be a flag we can read that determines how many functions should be scanned per device; by not honoring that flag, we would be using the hardware incorrectly, which it doesn't like, and it crashes in our face. Just a thought, but maybe worth investigating.
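For reference, the one flag I know of in this area is bit 7 of the header type register (config offset 0x0E) of function 0, which marks a device as multi-function; a vendor ID of 0xFFFF means no function is present at that address. Below is a minimal sketch of a scan loop that honors both, run against a fake in-memory slot instead of real config space. The `fake_function_type` struct and `count_functions` are made up for illustration and are not part of the pci server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_HEADER_MULTI_FLAG 0x80   /* Bit 7 of the header type register. */
#define PCI_VENDOR_NONE       0xFFFF /* Read back when no function responds. */
#define PCI_MAX_FUNCTIONS     8

/* Hypothetical stand-in for what a config-space read would return. */
typedef struct
{
    uint16_t vendor_id;
    uint8_t header_type;
} fake_function_type;

/* Count the functions present in one slot, scanning functions 1..7 only
   when function 0 advertises itself as a multi-function device. */
static int count_functions(const fake_function_type slot[PCI_MAX_FUNCTIONS])
{
    int found = 0;
    bool is_multi = false;

    assert(slot != NULL);

    for (int function = 0; function < PCI_MAX_FUNCTIONS; function++)
    {
        if (function != 0 && !is_multi)
        {
            break; /* Single-function device: nothing more to probe. */
        }

        if (slot[function].vendor_id == PCI_VENDOR_NONE)
        {
            if (function == 0)
            {
                break; /* Empty slot. */
            }

            continue; /* Gaps are allowed, e.g. 00:1f.1 is absent here. */
        }

        if (function == 0)
        {
            is_multi = (slot[function].header_type & PCI_HEADER_MULTI_FLAG) != 0;
        }

        found++;
    }

    return found;
}
```

With function 0 flagged as multi-function and functions 0, 2, 3 and 4 populated (the layout of device 1f in the lspci output), this counts 4. The existing loop already tracks `is_multi`, so this may well not be where the crash comes from; the sketch only shows what the flag does and does not tell us.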

Finding the failing device

I continued the investigation and, interestingly enough, it seems to be an SMBus device that doesn't like the way we probe its PCI slot:

diff --git a/servers/system/pci/pci.c b/servers/system/pci/pci.c
index 53c1de8..f0cfbc9 100644
--- a/servers/system/pci/pci.c
+++ b/servers/system/pci/pci.c
@@ -535,6 +535,12 @@ static pci_device_type *pci_scan_slot(pci_device_type *input_device)
             continue;
         }

+        // Some specific device 4 causing issues...?
+        if (function == 4)
+        {
+            continue;
+        }
+
         header_type = pci_read_config_uint8_t(input_device, PCI_HEADER_TYPE);
         input_device->header_type = header_type & 0x7F;
         device = pci_scan_device(input_device);

The code above skips function 4 on every device, which excludes this device/function from the scanning:

00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21)
    Subsystem: Dell Sunrise Point-LP SMBus
    Flags: medium devsel, IRQ 16
    Memory at df232000 (64-bit, non-prefetchable) [size=256]
    I/O ports at f040 [size=32]
    Kernel driver in use: i801_smbus
    Kernel modules: i2c_i801
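Since the `function == 4` check skips function 4 on every slot, a narrower workaround could match the full bus/device/function address instead. This is a rough sketch only, assuming the device number sits in bits 7:3 and the function number in bits 2:0 of a combined byte (as in the PCI configuration address); `PCI_DEVFN`, `pci_skip_entry_type` and `pci_should_skip` are hypothetical names, not existing chaos code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pack device (0-31) and function (0-7) into one byte:
   device in bits 7:3, function in bits 2:0. */
#define PCI_DEVFN(device, function) \
    ((uint8_t) ((((device) & 0x1F) << 3) | ((function) & 0x07)))

/* Hypothetical skip-list entry: one bus/device/function address. */
typedef struct
{
    uint8_t bus;
    uint8_t devfn;
} pci_skip_entry_type;

/* Return true if the given address is on the skip list. */
static bool pci_should_skip(uint8_t bus, uint8_t devfn)
{
    /* 00:1f.4 is the SMBus controller on the affected laptop. */
    static const pci_skip_entry_type skip_list[] =
    {
        { 0x00, PCI_DEVFN(0x1F, 4) },
    };

    for (size_t i = 0; i < sizeof(skip_list) / sizeof(skip_list[0]); i++)
    {
        if (skip_list[i].bus == bus && skip_list[i].devfn == devfn)
        {
            return true;
        }
    }

    return false;
}
```

A bus/device/function address is specific to this machine's layout, so matching on the vendor and device ID would probably be more portable. That needs the numeric IDs, which the lspci output here doesn't show.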

Does this SMBus device need to be probed in some special way or what's the deal here?

More details about the PCI subsystem on this machine

For reference, here is the full output of lspci:

$ lspci -v
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 08)
    Subsystem: Dell Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
    Flags: bus master, fast devsel, latency 0
    Capabilities: <access denied>

00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 07) (prog-if 00 [VGA controller])
    Subsystem: Dell UHD Graphics 620
    Flags: bus master, fast devsel, latency 0, IRQ 128
    Memory at de000000 (64-bit, non-prefetchable) [size=16M]
    Memory at c0000000 (64-bit, prefetchable) [size=256M]
    I/O ports at f000 [size=64]
    [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
    Capabilities: <access denied>
    Kernel driver in use: i915
    Kernel modules: i915

00:04.0 Signal processing controller: Intel Corporation Skylake Processor Thermal Subsystem (rev 08)
    Subsystem: Dell Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem
    Flags: fast devsel, IRQ 16
    Memory at df220000 (64-bit, non-prefetchable) [size=32K]
    Capabilities: <access denied>
    Kernel driver in use: proc_thermal
    Kernel modules: processor_thermal_device

00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21) (prog-if 30 [XHCI])
    Subsystem: Dell Sunrise Point-LP USB 3.0 xHCI Controller
    Flags: bus master, medium devsel, latency 0, IRQ 124
    Memory at df210000 (64-bit, non-prefetchable) [size=64K]
    Capabilities: <access denied>
    Kernel driver in use: xhci_hcd
    Kernel modules: xhci_pci

00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
    Subsystem: Dell Sunrise Point-LP Thermal subsystem
    Flags: fast devsel, IRQ 18
    Memory at df237000 (64-bit, non-prefetchable) [size=4K]
    Capabilities: <access denied>
    Kernel driver in use: intel_pch_thermal
    Kernel modules: intel_pch_thermal

00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 (rev 21)
    Subsystem: Dell Sunrise Point-LP Serial IO I2C Controller
    Flags: bus master, fast devsel, latency 0, IRQ 16
    Memory at df236000 (64-bit, non-prefetchable) [size=4K]
    Capabilities: <access denied>
    Kernel driver in use: intel-lpss
    Kernel modules: intel_lpss_pci

00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
    Subsystem: Dell Sunrise Point-LP CSME HECI
    Flags: bus master, fast devsel, latency 0, IRQ 127
    Memory at df235000 (64-bit, non-prefetchable) [size=4K]
    Capabilities: <access denied>
    Kernel driver in use: mei_me
    Kernel modules: mei_me

00:17.0 RAID bus controller: Intel Corporation 82801 Mobile SATA Controller [RAID mode] (rev 21)
    Subsystem: Dell 82801 Mobile SATA Controller [RAID mode]
    Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 125
    Memory at df230000 (32-bit, non-prefetchable) [size=8K]
    Memory at df234000 (32-bit, non-prefetchable) [size=256]
    I/O ports at f090 [size=8]
    I/O ports at f080 [size=4]
    I/O ports at f060 [size=32]
    Memory at df233000 (32-bit, non-prefetchable) [size=2K]
    Capabilities: <access denied>
    Kernel driver in use: ahci
    Kernel modules: ahci

00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 122
    Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
    I/O behind bridge: 0000e000-0000efff
    Memory behind bridge: df100000-df1fffff
    Capabilities: <access denied>
    Kernel driver in use: pcieport

00:1c.5 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #6 (rev f1) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 123
    Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
    Memory behind bridge: df000000-df0fffff
    Capabilities: <access denied>
    Kernel driver in use: pcieport

00:1f.0 ISA bridge: Intel Corporation Intel(R) 100 Series Chipset Family LPC Controller/eSPI Controller - 9D4E (rev 21)
    Subsystem: Dell Intel(R) 100 Series Chipset Family LPC Controller/eSPI Controller - 9D4E
    Flags: bus master, fast devsel, latency 0

00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
    Subsystem: Dell Sunrise Point-LP PMC
    Flags: fast devsel
    Memory at df22c000 (32-bit, non-prefetchable) [disabled] [size=16K]

00:1f.3 Audio device: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
    Subsystem: Dell Sunrise Point-LP HD Audio
    Flags: bus master, fast devsel, latency 32, IRQ 130
    Memory at df228000 (64-bit, non-prefetchable) [size=16K]
    Memory at df200000 (64-bit, non-prefetchable) [size=64K]
    Capabilities: <access denied>
    Kernel driver in use: snd_hda_intel
    Kernel modules: snd_hda_intel, snd_soc_skl

00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21)
    Subsystem: Dell Sunrise Point-LP SMBus
    Flags: medium devsel, IRQ 16
    Memory at df232000 (64-bit, non-prefetchable) [size=256]
    I/O ports at f040 [size=32]
    Kernel driver in use: i801_smbus
    Kernel modules: i2c_i801

01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
    Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
    Flags: bus master, fast devsel, latency 0, IRQ 16
    I/O ports at e000 [size=256]
    Memory at df104000 (64-bit, non-prefetchable) [size=4K]
    Memory at df100000 (64-bit, non-prefetchable) [size=16K]
    Capabilities: <access denied>
    Kernel driver in use: r8169
    Kernel modules: r8169

02:00.0 Network controller: Intel Corporation Wireless 3165 (rev 79)
    Subsystem: Intel Corporation Wireless 3165
    Flags: bus master, fast devsel, latency 0, IRQ 129
    Memory at df000000 (64-bit, non-prefetchable) [size=8K]
    Capabilities: <access denied>
    Kernel driver in use: iwlwifi
    Kernel modules: iwlwifi