dell / thunderbolt-nvm-linux

Thunderbolt NVM updates for Linux

Thunderbolt firmware fails after Thunderbolt NVM (33) XPS 9370 #17

Closed steevee closed 5 years ago

steevee commented 5 years ago

The Thunderbolt controller has disappeared after the upgrade, and the firmware may be corrupted(?). Can you advise how to manually reinstall the driver?

OS: Ubuntu 18.04 (and 18.10)
Kernel: 4.15 (and 4.18)
Product Name: XPS 13 9370
BIOS Version: 1.5.1
Thunderbolt BIOS security level: none

Steps: Thunderbolt NVM (33) was installed via the Ubuntu Software installer (after it prompted for the update) with a secondary display active (not sleeping). After the install succeeded, the update prompt reappeared several times (similar to #8, but consecutively). After a reboot, the Thunderbolt ports could not be woken (they still take power, but nothing else).

dmesg

Thunderbolt firmware fails to start.

...
[    7.775524] thunderbolt 0000:05:00.0: enabling device (0000 -> 0002)
[    7.776832] intel-lpss 0000:00:15.0: enabling device (0000 -> 0002)
[    7.777102] i801_smbus 0000:00:1f.4: SPD Write Disable is set
[    7.777167] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt
[    7.777419] thunderbolt 0000:05:00.0: NHI initialized, starting thunderbolt
[    7.777422] thunderbolt 0000:05:00.0: allocating TX ring 0 of size 10
[    7.777441] thunderbolt 0000:05:00.0: allocating RX ring 0 of size 10
[    7.777456] thunderbolt 0000:05:00.0: control channel created
[    7.777456] thunderbolt 0000:05:00.0: control channel starting...
[    7.777458] thunderbolt 0000:05:00.0: starting TX ring 0
[    7.777473] thunderbolt 0000:05:00.0: enabling interrupt at register 0x38200 bit 0 (0x0 -> 0x1)
[    7.777474] thunderbolt 0000:05:00.0: starting RX ring 0
[    7.777489] thunderbolt 0000:05:00.0: enabling interrupt at register 0x38200 bit 12 (0x1 -> 0x1001)
[    7.777502] thunderbolt 0000:05:00.0: starting ICM firmware
[    7.777503] thunderbolt 0000:05:00.0: could not start ICM firmware # NOTE: error
[    7.777530] thunderbolt 0000:05:00.0: stopping RX ring 0
[    7.777541] thunderbolt 0000:05:00.0: disabling interrupt at register 0x38200 bit 12 (0x1001 -> 0x1)
[    7.777624] thunderbolt 0000:05:00.0: stopping TX ring 0
[    7.777635] thunderbolt 0000:05:00.0: disabling interrupt at register 0x38200 bit 0 (0x1 -> 0x0)
[    7.777640] thunderbolt 0000:05:00.0: control channel stopped
[    7.777650] thunderbolt 0000:05:00.0: freeing RX ring 0
[    7.777665] thunderbolt 0000:05:00.0: freeing TX ring 0
[    7.777676] thunderbolt 0000:05:00.0: shutdown
...
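The "enabling/disabling interrupt at register 0x38200" lines in this log are plain bitmask updates. As a minimal sketch (the helper name is hypothetical, and it assumes the logged new value is simply the old value with one bit set or cleared), those values can be checked like this:

```python
import re

# Assumed line shape, copied from the dmesg output above, e.g.
# "enabling interrupt at register 0x38200 bit 12 (0x1 -> 0x1001)"
INTERRUPT_LINE = re.compile(
    r"(enabling|disabling) interrupt at register (0x[0-9a-f]+) "
    r"bit (\d+) \((0x[0-9a-f]+) -> (0x[0-9a-f]+)\)"
)


def interrupt_update_is_consistent(line: str) -> bool:
    """Check that the logged new value equals the old value with one bit set/cleared."""
    m = INTERRUPT_LINE.search(line)
    if m is None:
        raise ValueError(f"not an interrupt-mask line: {line!r}")
    action, _register, bit, old, new = m.groups()
    old_val, new_val = int(old, 16), int(new, 16)
    mask = 1 << int(bit)  # e.g. bit 12 -> 0x1000
    expected = old_val | mask if action == "enabling" else old_val & ~mask
    return expected == new_val
```

Applied to the four interrupt lines above, it should return True each time: bit 0 goes 0x0 -> 0x1 -> 0x0 alongside TX ring 0, and bit 12 goes 0x1 -> 0x1001 -> 0x1 alongside RX ring 0. The ring setup and teardown look orderly; the failing step is "could not start ICM firmware".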

lspci -nn

The Thunderbolt device appears to exist (whether or not devices are connected):

...
05:00.0 System peripheral [0880]: Intel Corporation DSL6540 Thunderbolt 3 NHI [Alpine Ridge 4C 2015] [8086:1577] (rev 02)
...

lsusb

Does not list any devices connected to the Thunderbolt ports (nor do any of those devices function).

lsmod | grep thunderbolt

Thunderbolt kernel modules present:

intel_wmi_thunderbolt    16384  0
thunderbolt           118784  0
wmi                    24576  5 intel_wmi_thunderbolt,dell_wmi,wmi_bmof,dell_smbios,dell_wmi_descriptor

fwupdmgr get-devices --show-all-devices

No longer lists the XPS 9370 Thunderbolt Controller as a device (whether or not devices are connected):

Intel AMT [unprovisioned]
  DeviceId:             e2623122c99d58220498aacbfcfdb1baebbae3c5
  ParentDeviceId:       8a21cacfb0a8d2b30c5ee9290eb71db021619f8b
  Guid:                 2800f812-b7b4-2d4b-aca8-46e0ff65814c
  Summary:              Hardware and firmware technology for remote out-of-band management
  Plugin:               amt
  Flags:                internal|registered
  Vendor:               Intel Corporation
  Version:              11.8.55
  VersionBootloader:    11.8.55
  Icon:                 computer
  Created:              2018-11-28

XPS 13 9370 System Firmware
  DeviceId:             8a21cacfb0a8d2b30c5ee9290eb71db021619f8b
  Guid:                 7ceaf7a8-0611-4480-9e30-64d8de420c7c
  Guid:                 43ea5588-d9a4-5031-8ad3-308045302d6b
  Guid:                 230c8b18-8d9b-53ec-838b-6cfc0383493a
  Plugin:               uefi
  Flags:                internal|updatable|require-ac|supported|registered|needs-reboot
  Version:              0.1.5.1
  VersionLowest:        0.1.5.1
  Icon:                 computer
  Created:              2018-11-28

KXG50ZNV512G NVMe TOSHIBA 512GB
  DeviceId:             f954c7acdf5fab61aeaca1cd71d29ea5ade6992f
  Guid:                 4d0aed03-a30c-52c6-99e7-a8977797c3d9
  Guid:                 ad9fe8f7-cdc4-52c9-9fea-31b6f4988ffa
  Serial:               Z7BS10IYTY7T
  Summary:              NVM Express Solid State Drive
  Plugin:               nvme
  Flags:                internal|updatable|require-ac|registered|needs-reboot
  VendorId:             NVME:0x1179
  Version:              AADA4102
  Icon:                 drive-harddisk
  Created:              2018-11-28

Unifying Receiver
  DeviceId:             66a1e4c324ce716554ac74b67bf0a6e13ea6f583
  Guid:                 279ed287-3607-549e-bacc-f873bb9838c4
  Guid:                 21e75d9a-5ce6-5da2-b7ab-910c7f3f6836
  Guid:                 9d131a0c-a606-580f-8eda-80587250b8d6
  Summary:              A miniaturised USB wireless receiver
  Plugin:               unifying
  Flags:                supported|registered
  Vendor:               Logitech
  VendorId:             USB:0x046D
  Version:              RQR12.07_B0029
  VersionBootloader:    BOT01.02_B0014
  Icon:                 preferences-desktop-keyboard
  Created:              2018-11-28
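To check a dump like this for the controller without eyeballing it, here is a minimal sketch that parses the plain-text layout shown above (device name flush left, indented "Key: value" lines). The function names are hypothetical, the layout may differ between fwupd versions, and it assumes fwupd's Thunderbolt plugin reports itself as "thunderbolt":

```python
from typing import Dict, List

Devices = Dict[str, Dict[str, List[str]]]


def parse_get_devices(text: str) -> Devices:
    """Parse `fwupdmgr get-devices` text into {device name: {key: [values]}}."""
    devices: Devices = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank line between devices
        if not raw[0].isspace():
            current = raw.strip()  # device name is flush left
            devices[current] = {}
        elif current is not None and ":" in raw:
            key, value = raw.strip().split(":", 1)
            # Keys such as "Guid" repeat, so keep every value.
            devices[current].setdefault(key, []).append(value.strip())
    return devices


def thunderbolt_listed(devices: Devices) -> bool:
    """True if any device name or plugin mentions Thunderbolt."""
    for name, fields in devices.items():
        if "thunderbolt" in name.lower() or "thunderbolt" in fields.get("Plugin", []):
            return True
    return False
```

For the output above, `thunderbolt_listed(parse_get_devices(text))` would be False, matching the observation that the controller is no longer listed.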

fwupdmgr install

Cannot reinstall the manually downloaded CAB file:

$ fwupdmgr install ~/Downloads/925aaf439fc1b66aea20fb3868534e3853902c9d-ThunderboltFirmwareUpdateLinux_4.33.18.004.cab -v
(fwupdmgr:15249): Fwupd-DEBUG: Emitting ::status-changed() [decompressing]
Decompressing…         [-                                      ](fwupdmgr:15249): Fwupd-DEBUG: Emitting ::status-changed() [idle]
Decompressing…         [***************************************]
No supported devices found

Having some serious downtime trying to resolve. Any help much appreciated!

superm1 commented 5 years ago

Can you check your fwupd journal log to see if it showed anything about a failure while flashing? Or any historical kernel logs that showed a failure while flashing?

Presumably that dmesg you shared was from a fresh "cold" boot attempt right?

It's a bit surprising to me that anything could go wrong here. Thunderbolt uses an "A/B" type layout: the controller boots from "A", flashes "B", verifies "B", and lastly changes a pointer to "B" for the next controller boot, then reboots the controller.
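The A/B sequence described above can be sketched as a toy Python model. This illustrates the general dual-bank idea only, under the assumption that a failed verification leaves the boot pointer untouched; it is not Intel's actual NVM layout or update logic, and all names are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class DualBankFlash:
    """Toy A/B firmware layout: illustrative only, not Intel's NVM format."""
    banks: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"A": "NVM 23", "B": None})
    active: str = "A"  # the "pointer" consulted at controller boot

    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def running(self) -> Optional[str]:
        return self.banks[self.active]

    def update(self, image: str, verify: Callable[[str], bool]) -> bool:
        target = self.inactive()
        self.banks[target] = image  # 1. write the new image to the inactive bank
        if not verify(image):       # 2. verify it (e.g. signature check)
            return False            #    pointer untouched: next boot uses the old image
        self.active = target        # 3. flip the pointer for the next boot
        return True                 # 4. a controller reset would follow here
```

The point of the design is that a bad write or failed verification never touches the bank the controller boots from, which is why a controller that no longer starts its firmware at all is unexpected.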

Can you see if /sys/bus/thunderbolt/ populates with anything? I'm guessing the answer is no, from your dmesg above.

So my next thought is to try to manually force it awake with: echo "1" | sudo tee /sys/bus/wmi/devices/86CCFD48-205E-4A77-9C48-2021CBEDE341/force_power

If that wakes it try to flash again.
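For scripting the same step, a minimal Python equivalent of that echo/tee command is sketched below. The sysfs path is copied from the command above; the function name is hypothetical, writing the real file requires root, and it writes "1\n" or "0\n" to mirror what echo sends:

```python
from pathlib import Path

# Path copied from the force_power command above (intel_wmi_thunderbolt).
FORCE_POWER = Path(
    "/sys/bus/wmi/devices/86CCFD48-205E-4A77-9C48-2021CBEDE341/force_power")


def set_force_power(on: bool, attr: Path = FORCE_POWER) -> None:
    """Same effect as `echo 1 | sudo tee .../force_power` (or `echo 0`).

    Writing the real sysfs file needs root; `attr` can point elsewhere for testing.
    """
    attr.write_text("1\n" if on else "0\n")  # echo appends a newline too
```

After `set_force_power(True)`, re-running lspci and `fwupdmgr get-devices` shows whether the controller came back.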

If that doesn't work, then we probably need to loop in some others in this area for ideas.

steevee commented 5 years ago

@superm1 thanks for the quick response!

Regarding the fwupd journal, I can't see any errors specific to flashing; the only possibly noteworthy thing is this entry (around the time of the update):

Nov 23 09:42:35 steve-XPS-13-9370 dbus[1005]: [system] Activating via systemd: service name='org.freedesktop.fwupd' unit='fwupd.service'
Nov 23 09:42:36 steve-XPS-13-9370 fwupd[2712]: (fwupd:2712): Fu-WARNING **: ignoring add-delay as device usb:00:01:03 already pending
Nov 23 09:42:36 steve-XPS-13-9370 fwupd[2712]: (fwupd:2712): Fu-WARNING **: ignoring add-delay as device usb:00:01:02 already pending
Nov 23 09:42:36 steve-XPS-13-9370 fwupd[2712]: (fwupd:2712): Fu-WARNING **: ignoring add-delay as device usb:00:01:01 already pending
Nov 23 09:42:36 steve-XPS-13-9370 fwupd[2712]: (fwupd:2712): Fu-WARNING **: disabling plugin because: failed to coldplug raspberrypi: Raspberry PI firmware updating not supported, no /boot/start.elf
Nov 23 09:42:36 steve-XPS-13-9370 fwupd[2712]: (fwupd:2712): Fu-WARNING **: ignoring add-delay as device ro__sys_devices_pci0000_00_0000_00_14_0_usb2_2_1_2_1_1_0 already pending
Nov 23 09:42:36 steve-XPS-13-9370 fwupd[2712]: (fwupd:2712): Fu-WARNING **: ignoring add-delay as device ro__sys_devices_pci0000_00_0000_00_02_0 already pending
Nov 23 09:42:36 steve-XPS-13-9370 dbus[1005]: [system] Successfully activated service 'org.freedesktop.fwupd'

Yes, dmesg was on a cold boot.

/sys/bus/thunderbolt/ contains the following (albeit with nothing under devices or drivers):

$ sudo tree -RaF /sys/bus/thunderbolt/.
/sys/bus/thunderbolt/.
├── devices/
├── drivers/
├── drivers_autoprobe
├── drivers_probe
└── uevent

I'm afraid still no devices appear with fwupdmgr show-devices after running echo "1" | sudo tee /sys/bus/wmi/devices/86CCFD48-205E-4A77-9C48-2021CBEDE341/force_power (let me know if I should be validating that through another means, though!)

As a symptom of the above, the Thunderbolt GUI settings appear like so: [screenshot]

Any further ideas?

superm1 commented 5 years ago

@gicmo @YehezkelShB @westeri any thoughts on this?

Not really sure what else can be done to recover here.

YehezkelShB commented 5 years ago

Sounds like a FW update that went very wrong. Are we sure this is the right FW for the controller? Do the bridges, at least, appear in the lspci output, or is the entry you copied above the only one that mentions Thunderbolt there?

YehezkelShB commented 5 years ago

Also, it would be interesting to see the actual logs from the update, from both fwupd and dmesg, if they are still available.

superm1 commented 5 years ago

Yes, I'm sure it's the right firmware.

I just double-checked and compared the Windows and Linux images.

~/xps-9370-fw-check $ tree
.
├── linux
│   ├── 0x07E6.metainfo.xml
│   ├── 0x07E6_secure.bin
│   ├── 0x07E6_secure.bin.asc
│   ├── 925aaf439fc1b66aea20fb3868534e3853902c9d-ThunderboltFirmwareUpdateLinux_4.33.18.004.cab
│   └── README.txt
└── windows
    ├── 0x07E6_secure.bin
    ├── FwUpdateApi.dll
    ├── FwUpdateCmd.exe
    └── Intel_TBT3_FW_UPDATE_NVM33_CV9CP_A02_4.33.18.004.exe

2 directories, 9 files
~/xps-9370-fw-check $ sha1sum linux/0x07E6_secure.bin
3b051befc56b79cb1288e12990c4476023eb1b25  linux/0x07E6_secure.bin
~/xps-9370-fw-check $ sha1sum windows/0x07E6_secure.bin
3b051befc56b79cb1288e12990c4476023eb1b25  windows/0x07E6_secure.bin
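The same comparison can be done from Python with the standard hashlib module. A minimal sketch (function names are hypothetical) that streams each file and compares the hex digests that sha1sum prints:

```python
import hashlib
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-1 of a file, read in chunks (the digest `sha1sum` prints)."""
    digest = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_image(a: Path, b: Path) -> bool:
    """True if both files have identical SHA-1 digests."""
    return sha1_of(a) == sha1_of(b)
```

Given the two identical digests above, `same_image(Path("linux/0x07E6_secure.bin"), Path("windows/0x07E6_secure.bin"))` should be True.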

Furthermore, fwupd runs various checks to match the firmware to the system, and the controller checks the signature on it.

westeri commented 5 years ago

Can you share full dmesg?

westeri commented 5 years ago

Also, please power down the system completely, wait a bit, then power it on and see if that changes anything.

YehezkelShB commented 5 years ago

@westeri If what you want is G3, it may take more than that on a laptop.

@superm1 Is there any tool or BIOS control for disconnecting all power, or something like that, without disconnecting the battery?

westeri commented 5 years ago

Sometimes just "cold boot" is enough, though :)

superm1 commented 5 years ago

No, not really. The EC will always be running unless you fully drain the battery or disconnect power. It does go into reset, though, when you power cycle from an S5 cold boot. I would say waiting a minute or two after a cold shutdown is likely a close enough approximation of what you would get with G3, since power should have been removed from AR (Alpine Ridge) with the EC bringing the rails down.

YehezkelShB commented 5 years ago

It's not just that; it's also about reloading the PD FW (the firmware of the Type-C port), which may have changed. Will the EC bring the ports down too? They probably have to stay up to detect whether a power supply is connected, don't they?

superm1 commented 5 years ago

That part I'm not 100% sure about; I don't know the exact sequencing the EC does with the PD controllers on a cold boot.

If you want to force them to reset, the only thing I know for sure that will cause that (other than forcefully removing power) is a host system firmware update.

westeri commented 5 years ago

This:

[    7.777502] thunderbolt 0000:05:00.0: starting ICM firmware

indicates that the ICM firmware is not running for some reason, so I was hoping a cold boot would maybe help here.

Another thing to try is changing the Thunderbolt configuration in the BIOS, say setting the security level to "user", and seeing if that helps.

YehezkelShB commented 5 years ago

Actually, the PD controller is maybe less important here. At most, I'd expect to see device connection/detection issues, not issues with ICM. So a shutdown and wait of a couple of minutes should be enough for this case.

steevee commented 5 years ago

@YehezkelShB lspci -nn | grep -i thunderbolt gives me:

03:00.0 PCI bridge [0604]: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] [8086:1578] (rev 02)
04:00.0 PCI bridge [0604]: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] [8086:1578] (rev 02)
04:01.0 PCI bridge [0604]: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] [8086:1578] (rev 02)
04:02.0 PCI bridge [0604]: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] [8086:1578] (rev 02)
04:04.0 PCI bridge [0604]: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] [8086:1578] (rev 02)
05:00.0 System peripheral [0880]: Intel Corporation DSL6540 Thunderbolt 3 NHI [Alpine Ridge 4C 2015] [8086:1577] (rev 02)

Digging through /var/log/kern.log.1, I was able to extract the dmesg from what looks like the reboot after the initial fwupd update. Things look a little weird from 19.732321 onwards, but I'm unsure how to interpret this.

@westeri @YehezkelShB I've been disconnected and shut down for ~5 mins, but dmesg still contained [ 7.391336] thunderbolt 0000:05:00.0: starting ICM firmware (and the subsequent could not start ICM firmware)

Currently trying to drain the battery (4 hours to go!) and will try a cold boot when reconnected to mains.

Thanks for your help everyone!

YehezkelShB commented 5 years ago

So the bridges are there. OK, makes more sense.

From the dmesg you posted, here are a few lines that @westeri may want to look at. It's a different error from the one you posted at the start of the thread.

Nov 23 13:59:57 steve-XPS-13-9370 kernel: [   40.416124] thunderbolt 0000:05:00.0: failed to send driver ready to ICM
Nov 23 13:59:57 steve-XPS-13-9370 kernel: [   40.416128] thunderbolt 0000:05:00.0: stopping RX ring 0
Nov 23 13:59:57 steve-XPS-13-9370 kernel: [   40.416133] thunderbolt 0000:05:00.0: disabling interrupt at register 0x38200 bit 12 (0x1001 -> 0x1)
Nov 23 13:59:57 steve-XPS-13-9370 kernel: [   40.416173] thunderbolt 0000:05:00.0: stopping TX ring 0
Nov 23 13:59:57 steve-XPS-13-9370 kernel: [   40.416179] thunderbolt 0000:05:00.0: disabling interrupt at register 0x38200 bit 0 (0x1 -> 0x0)
Nov 23 13:59:57 steve-XPS-13-9370 kernel: [   40.416222] thunderbolt 0000:05:00.0: control channel stopped
Nov 23 13:59:57 steve-XPS-13-9370 kernel: [   40.416233] thunderbolt 0000:05:00.0: freeing RX ring 0
Nov 23 13:59:57 steve-XPS-13-9370 kernel: [   40.416238] thunderbolt 0000:05:00.0: freeing TX ring 0
Nov 23 13:59:57 steve-XPS-13-9370 kernel: [   40.416246] thunderbolt 0000:05:00.0: shutdown
Nov 23 13:59:57 steve-XPS-13-9370 kernel: [   40.416437] thunderbolt: probe of 0000:05:00.0 failed with error -110

Still, I can't find the part of the log covering the FW update. Maybe it's in an earlier backup of the log?

westeri commented 5 years ago

Yeah, this one is different now

[   40.416124] thunderbolt 0000:05:00.0: failed to send driver ready to ICM

but the reason is the same: firmware is not running properly :-/
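The "probe of 0000:05:00.0 failed with error -110" line uses the kernel convention of returning a negative errno value. A minimal sketch (hypothetical helper) that decodes such codes with Python's errno and os modules; errno numbering is platform specific, so the result only matches the kernel's meaning when run on Linux:

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Turn a kernel-style negative return code (e.g. -110) into 'NAME: message'."""
    num = abs(code)
    name = errno.errorcode.get(num, "UNKNOWN")
    return f"{name}: {os.strerror(num)}"
```

On a Linux machine, `describe_kernel_error(-110)` gives "ETIMEDOUT: Connection timed out", which is consistent with the driver giving up while waiting for the ICM to answer the "driver ready" request.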

superm1 commented 5 years ago

@westeri is there any way to force it to use the other bank while it's in this state, perhaps?

YehezkelShB commented 5 years ago

I don't think so. The switch happens only in the authentication step of the FW update. The only other way I know about is by using an external flash writer (like Dediprog), which is probably relevant only for in-lab cases, not for a controller in the field.

steevee commented 5 years ago

Just to clarify, the failed to send driver ready to ICM message only appeared that once; I had assumed that was during the attempted FW update.

Since then I only see could not start ICM firmware every time I reboot.

If there's something specific that identifies the FW update in the kernel logs, what should I be looking for?

36 mins of battery left, if the cold reboot is still of interest! :smile:
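For pulling the relevant lines out of /var/log/kern.log*, a minimal sketch is below. It only matches the failure messages quoted in this thread (the function name is hypothetical); it does not know which messages the driver prints during an NVM write, so it will not pinpoint the update itself:

```python
import re
from typing import Iterable, List

# Only the failure messages quoted in this thread; the driver may print others.
ICM_FAILURES = (
    "could not start ICM firmware",
    "failed to send driver ready to ICM",
)
PROBE_FAILED = re.compile(r"thunderbolt: probe of \S+ failed with error -\d+")


def thunderbolt_failures(lines: Iterable[str]) -> List[str]:
    """Return kernel log lines showing the ICM firmware not responding."""
    hits = []
    for line in lines:
        if "thunderbolt" not in line:
            continue  # skip unrelated kernel messages
        if any(msg in line for msg in ICM_FAILURES) or PROBE_FAILED.search(line):
            hits.append(line.rstrip("\n"))
    return hits
```

Usage, assuming read access to the log (usually root or the adm group): `with open("/var/log/kern.log.1", errors="replace") as f: print(*thunderbolt_failures(f), sep="\n")`.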

superm1 commented 5 years ago

@steevee did you also try to change security levels in BIOS setup to see if it poked the controller as recommended above?

steevee commented 5 years ago

@superm1 yes I did, sorry, I forgot to update. Setting it to User Authorization didn't have an effect. I still receive the could not start ICM firmware message, and no Thunderbolt devices are listed by fwupdmgr get-devices.

steevee commented 5 years ago

I drained the battery too. I waited 5 minutes, plugged into mains, and rebooted again, but I still get could not start ICM firmware in dmesg on each boot. I tried a few variations with different security levels and with a DisplayPort display plugged in/out as well. Same result each time.

Possibly interesting (or not): I now see fewer devices listed with sudo fwupdmgr get-devices --show-all-devices (note: the Unifying Receiver is missing compared to the initial dumps):

Intel AMT [unprovisioned]
  DeviceId:             e2623122c99d58220498aacbfcfdb1baebbae3c5
  ParentDeviceId:       8a21cacfb0a8d2b30c5ee9290eb71db021619f8b
  Guid:                 2800f812-b7b4-2d4b-aca8-46e0ff65814c
  Summary:              Hardware and firmware technology for remote out-of-band management
  Plugin:               amt
  Flags:                internal|registered
  Vendor:               Intel Corporation
  Version:              11.8.55
  VersionBootloader:    11.8.55
  Icon:                 computer
  Created:              2018-11-29

XPS 13 9370 System Firmware
  DeviceId:             8a21cacfb0a8d2b30c5ee9290eb71db021619f8b
  Guid:                 7ceaf7a8-0611-4480-9e30-64d8de420c7c
  Guid:                 43ea5588-d9a4-5031-8ad3-308045302d6b
  Guid:                 230c8b18-8d9b-53ec-838b-6cfc0383493a
  Plugin:               uefi
  Flags:                internal|updatable|require-ac|supported|registered|needs-reboot
  Version:              0.1.5.1
  VersionLowest:        0.1.5.1
  Icon:                 computer
  Created:              2018-11-29

KXG50ZNV512G NVMe TOSHIBA 512GB
  DeviceId:             f954c7acdf5fab61aeaca1cd71d29ea5ade6992f
  Guid:                 4d0aed03-a30c-52c6-99e7-a8977797c3d9
  Guid:                 ad9fe8f7-cdc4-52c9-9fea-31b6f4988ffa
  Serial:               Z7BS10IYTY7T
  Summary:              NVM Express Solid State Drive
  Plugin:               nvme
  Flags:                internal|updatable|require-ac|registered|needs-reboot
  VendorId:             NVME:0x1179
  Version:              AADA4102
  Icon:                 drive-harddisk
  Created:              2018-11-29

Any further leads? Again, if you have something to help me identify FW updates in the logs, let me know (I've seen nothing referencing fwupdate or firmware apart from the daemon starting).

Thanks

westeri commented 5 years ago

That's really weird. Did you try to reset the BIOS settings back to default or were they reset when you drained the battery already?

superm1 commented 5 years ago

They shouldn't have been reset when the battery was drained, so yes, that's worth a shot.

Did you buy with Ubuntu? If so reset to "factory defaults" rather than "BIOS defaults".

Otherwise keep in mind when you reset that you'll potentially need to do the following if it fails to boot after the reset.

steevee commented 5 years ago

@westeri no I didn't reset the BIOS. TBH I didn't notice they were reset.

@superm1 yes I did buy with Ubuntu (16.04). Will give factory defaults a try (steps post a boot failure noted!)

steevee commented 5 years ago

The factory reset worked fine, and I have since tried changing the Thunderbolt security levels, but still to no avail (the same could not start ICM firmware on every boot).

Interestingly, after the factory reset the Unifying Receiver is still missing, and the following device, which I haven't seen before, is listed:

XPS 13 9370 TPM 2.0
  DeviceId:             370e10407b1f04ade798a9f1d3e1fa57c67750c3
  Guid:                 cb8da68d-cd80-5f5b-8fef-038383adbb83
  Guid:                 6a704926-cdc4-53a0-84e7-227b5f1140dc
  Summary:              Platform TPM device
  Plugin:               uefi
  Flags:                internal|updatable|require-ac|registered|needs-reboot
  Vendor:               Dell Inc.
  Version:              7.2.0.1
  Icon:                 computer
  Created:              2018-11-29

We might be entering red-herring territory though.

superm1 commented 5 years ago

Yes, this is a red herring. It's just that your TPM is now showing up in fwupd (it's odd that it didn't before, but unrelated).

westeri commented 5 years ago

Can you try to turn off Thunderbolt from the BIOS, save, reboot and re-enable it again?

steevee commented 5 years ago

@westeri just tried, also with/without disabling Always Allow Dell Docks, and cold boots (waiting 4-5mins between each BIOS save). No difference.

Oddly, still received could not start ICM firmware message when Thunderbolt was completely disabled. Why would it be attempting to start it when it's disabled? (FYI - all Thunderbolt BIOS settings were disabled, e.g. preboot, boot, etc)

westeri commented 5 years ago

Is the Thunderbolt controller always present? I mean, when you just boot up the system, wait some time (so that fwupd powers down the controller) and run lspci a couple of times, do you still see the Alpine Ridge controller?

steevee commented 5 years ago

lspci has always shown the Alpine Ridge controller. What are you thinking?

Failing a software-based approach to flashing the FW, is this sounding like an issue for manufacturer support?

FYI, I'd been running Ubuntu 16.04 on it since February and everything had worked a treat, so I'm pretty sure it's not a hardware failure.

westeri commented 5 years ago

My understanding is that the 9370 uses the so-called "BIOS assisted" enumeration mode, where the Thunderbolt controller is only present when a device is connected or it is force powered. For some reason, in your system it is always powered, which makes me think this could be BIOS related. Mario probably knows better, though. One thing that comes to mind is that maybe a BIOS update helps here?
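Under the BIOS-assisted model described above, one would expect the NHI (8086:1577) to drop out of lspci when nothing is attached and force power is off; on this machine it never does. A minimal sketch (hypothetical names) for checking that from captured `lspci -nn` output; the two IDs are taken from the lspci output posted in this thread, and plain `lspci` without -nn omits the IDs this relies on:

```python
import re
from typing import Dict

# Alpine Ridge 4C vendor:device IDs, as printed by `lspci -nn` in this thread.
ALPINE_RIDGE = {"8086:1577": "NHI", "8086:1578": "bridge"}
PCI_ID = re.compile(r"\[([0-9a-f]{4}:[0-9a-f]{4})\]")


def thunderbolt_functions(lspci_nn: str) -> Dict[str, str]:
    """Map PCI slot -> 'NHI' or 'bridge' for Alpine Ridge functions."""
    found = {}
    for line in lspci_nn.splitlines():
        if not line.strip():
            continue
        slot = line.split()[0]  # e.g. "05:00.0"
        for pci_id in PCI_ID.findall(line):  # class codes like [0604] don't match
            if pci_id in ALPINE_RIDGE:
                found[slot] = ALPINE_RIDGE[pci_id]
    return found
```

Capturing the output a few times with force power set to 0 and nothing plugged in, then checking whether "NHI" is still among the returned values, is one way to test westeri's hypothesis.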

westeri commented 5 years ago

Or the NVM 33 image is in RTD3 mode for some reason, but I think Mario already checked that.

westeri commented 5 years ago

I have a couple of 9370's at the office so I can try to flash the same NVM 33 image tomorrow and see if I can reproduce the issue.

superm1 commented 5 years ago

There have been several hundred reports at LVFS of successful updates to NVM33 (including mine), so I don't expect it will be easily reproducible.

I did double check the NVM image and it is indeed configured for BIOS assist mode. That system doesn't support native or RTD3 in any case.

At this point I do think it's worth trying to flash BIOS again. I don't really understand how the controller is staying powered on, but something seems out of sync.

There is a 1.6.3 BIOS on support.dell.com (not yet published on LVFS; that's in process and should be available soon). You can save it to your EFI system partition or a FAT32 USB key and flash it from the F12 menu at BIOS POST.

If it doesn't help with the new BIOS, please try the following:

  1. Try to use force power to "turn off" the controller (echo 0 rather than 1).
  2. Reset BIOS defaults again, but this time select "BIOS defaults", not "factory defaults". You'll probably need to change to AHCI mode and possibly the boot option entry this time around. You might also need to adjust the legacy option ROM and secure boot settings if those don't match what is expected.

steevee commented 5 years ago

Successfully installed 1.6.3 BIOS via USB (FYI - USB3.1 ports work fine), otherwise no luck.

could not start ICM firmware still present in dmesg.

Forcing TB controller off (or on) makes no difference to the fwupdmgr get-devices --show-all-devices or lspci results (the latter here):

~ echo "1" | sudo tee /sys/bus/wmi/devices/86CCFD48-205E-4A77-9C48-2021CBEDE341/force_power
1
~ sudo lspci | grep -i thunderbolt
03:00.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] (rev 02)
04:00.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] (rev 02)
04:01.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] (rev 02)
04:02.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] (rev 02)
04:04.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] (rev 02)
05:00.0 System peripheral: Intel Corporation DSL6540 Thunderbolt 3 NHI [Alpine Ridge 4C 2015] (rev 02)
~ echo "0" | sudo tee /sys/bus/wmi/devices/86CCFD48-205E-4A77-9C48-2021CBEDE341/force_power
0
~ sudo lspci | grep -i thunderbolt
03:00.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] (rev 02)
04:00.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] (rev 02)
04:01.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] (rev 02)
04:02.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] (rev 02)
04:04.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] (rev 02)
05:00.0 System peripheral: Intel Corporation DSL6540 Thunderbolt 3 NHI [Alpine Ridge 4C 2015] (rev 02)

(I've tried with display/power plugged-in and out - but please confirm I'm testing correctly!)

I reset the new BIOS to BIOS defaults (didn't have to change AHCI, just the boot sequence). Same dmesg error, lspci output (regardless of forcing on/off), and fwupdmgr get-devices --show-all-devices

westeri commented 5 years ago

I just tested updating the NVM from 23 -> 33 on a 9370 on my end, with no issues.

It looks like in your system the force power GPIO or whatever is always powered and I have no idea why it might be :(

BTW, does USB3.1 work if you connect it to the Thunderbolt ports? I would expect not.

steevee commented 5 years ago

@westeri thanks for trying. If it's of use, I think fwupdate was applying both a BIOS and Thunderbolt update at the time it went pear-shaped.

I have a USB3.0 dongle (headphones/keyboard/mouse) plugged into USB3.1 via the Dell Type-C converter. The devices only work when plugged into the left-hand USB3.1 port, not the right-hand Thunderbolt ports (confirmed with lsusb too).

I don't know if it's at all relevant, but Ubuntu hangs if I plug in a power supply when the system has already booted. I'm just reconciling the kernel logs to the sequence of events I see (don't want to further muddy the waters!)

superm1 commented 5 years ago

Running a BIOS update and a TBT update at the same time shouldn't actually affect anything. The Thunderbolt update happens "in OS" with the Thunderbolt kernel driver; the BIOS update is scheduled for the next boot. So after the Thunderbolt update was applied, the Thunderbolt controller should have reset itself and come right back.

Do you happen to have a Thunderbolt Type-C device you can plug in? I know you've mentioned monitors and a Type-C converter, but these all operate in MFDP mode. I'm wondering whether getting a Thunderbolt link going between the host and something else would kick the controller out of this state so you can re-flash it.

If you don't have a Thunderbolt Type-C device, do you perhaps have access to a second machine with Thunderbolt? You could connect a Type-C cable between the two (as you would for Thunderbolt networking). If the other machine is running Windows or macOS, it will automatically try to start up Thunderbolt networking. If it's running Linux, modprobe thunderbolt-net and it should try to start up the network.

If that gets it alive, try to flash it again with fwupd (use --allow-reinstall with the same CAB file in ~/.cache)

steevee commented 5 years ago

@superm1 I got my hands on an XPS 13 9360 (Ubuntu 17.10). When I connect the 9360 via Thunderbolt to the 9370, the latter is unresponsive, with no change in lspci, and the former shows these errors in dmesg:

[  204.183956] ACPI Error: [SPRT] Namespace lookup failure, AE_ALREADY_EXISTS (20170831/dswload2-346)
[  204.183993] No Local Variables are initialized for Method [_E42]
[  204.183999] No Arguments are initialized for method [_E42]
[  204.184005] ACPI Exception: AE_ALREADY_EXISTS, During name lookup/catalog (20170831/psobject-252)
[  204.184020] ACPI Error: Method parse/execution failed \_GPE._E42, AE_ALREADY_EXISTS (20170831/psparse-550)
[  204.184042] ACPI Error: Method parse/execution failed \_GPE._E42, AE_ALREADY_EXISTS (20170831/psparse-550)
[  204.184071] ACPI Exception: AE_ALREADY_EXISTS, while evaluating GPE method [_E42] (20170831/evgpe-646)

Subsequently, sudo modprobe thunderbolt-net (or thunderbolt_net) returns nothing, nothing appears in the logs, and the Thunderbolt status on the 9370 does not change.

Out of curiosity, looking in my .cache I find all sorts of *.cab file directory structures...

.cache/./gnome-software/fwupd/1bffbc1fe8995c3442d471e5c3ba198cc65e4160-Signed_1152921504627775454.cab
.cache/./gnome-software/3.20/firmware/3b897e5f3af32d664647074a3b3444a12435fd19-Signed_1152921504627516555.cab
.cache/./gnome-software/3.20/firmware/a3b9d8b67f3c047d54860a35c4f67b8691323886-Signed_1152921504627700266.cab
.cache/./gnome-software/3.20/firmware/93c0e040c4738920a9f8a22d30381b028e900cbb-Signed_1152921504627455480.cab
.cache/./gnome-software/3.20/firmware/1bffbc1fe8995c3442d471e5c3ba198cc65e4160-Signed_1152921504627775454.cab
.cache/./gnome-software/3.20/firmware/8607c2c39dcca1e54e29343203180a14f2c3ea00-Initial_1152921504627618312.cab
.cache/./gnome-software/3.20/firmware/938fec082652c603a1cdafde7cd25d76baadc70d-Logitech-Unifying-RQR12.07_B0029.cab
.cache/./fwupdmgr/938fec082652c603a1cdafde7cd25d76baadc70d-Logitech-Unifying-RQR12.07_B0029.cab

Are the different directory structures expected, or is this just from me switching around fwupd versions and moving from apt to snap?

superm1 commented 5 years ago

Could you by chance upgrade the kernel on the 17.10 machine to a newer one? 4.15 or later is where thunderbolt-net landed. The useful test here would be with thunderbolt-net loaded, I think.

Oh, you moved to snap? Then it is in ~/snap IIRC.

Anyway, you can pull it again from https://fwupd.org/downloads/925aaf439fc1b66aea20fb3868534e3853902c9d-ThunderboltFirmwareUpdateLinux_4.33.18.004.cab if you get Thunderbolt alive again on the 9370.

steevee commented 5 years ago

@superm1 I've actually got a XPS 13 9350 running 17.10 (not a 9360 - sorry my confusion!), and it's already on 4.15.18 - thunderbolt_net is present in lsmod.

sudo modprobe thunderbolt-net (or thunderbolt_net) doesn't appear to have any effect (and nothing appears in dmesg -w on either machine)

I may have done something stupid on my 9350 too, but will open another issue to track that.

superm1 commented 5 years ago

Hopefully that other system is still in good shape. Really bad coincidence if you hit some problems there too.

@westeri are there any minimum NVM requirements for Thunderbolt networking to be operational? He was upgrading from 12 to 16; I wasn't sure whether there actually is a minimum.

Assuming that this was a valid test scenario with thunderbolt networking I'm afraid that it is time to contact support for a repair. Hopefully they can also "capture" your motherboard for analysis on what actually went wrong here. When contacting support feel free to refer them to this thread and they can reach out to me internally if there is any other question about the necessity of repair given the circumstances.

westeri commented 5 years ago

For Alpine Ridge, I don't think there is any minimum NVM.

steevee commented 5 years ago

Having now got the 9350 onto TBT 16.0, I tried modprobe thunderbolt-net again (on both systems), but still: TBT is not listed by fwupdmgr get-devices, connected devices are not recognised and do not respond, the device is always listed in lspci regardless of the force_power setting, and the could not start ICM firmware error message is still raised by the kernel on boot...

:unamused:

Will raise with Dell support.

Many thanks for all your help here!

andrejpodzimek commented 5 years ago

Phew. Just updated my XPS 9370 and … Thunderbolt was gone, with an ominous timeout message saying that it "never came back" after the update. Then I spotted this thread (oh horror). :fearful: I was pretty much convinced that something similar had just happened to me.

But then I rebooted the machine and just let the system update run. (It stored the BIOS updates somewhere on the UEFI partition and added a boot entry to run them on the next boot.) Those updates were successful, all green, and the next reboot took me automatically right back to my Arch Linux, where get-devices shows Thunderbolt again (with version 33). All the IDs and versions are non-zero and dmesg shows lots of Thunderbolt stuff.

Well, hopefully I haven't killed the Thunderbolt controller; time will tell. For sure there's something scary about that update. :disappointed:

andrejpodzimek commented 5 years ago

Just tried to boltctl enroll a Samsung X5 SSD. It worked and now I can see it in lsblk. So it looks like my Thunderbolt controller didn't get bricked. Probably just a lucky coincidence.

superm1 commented 5 years ago

Getting never came back is not expected. What fwupd version are you using?

And which TBT version did you jump from? And was anything plugged into Type-C before, during, or after the update, but before the reboot?