TritonDataCenter / smartos-live

For more information, please see http://smartos.org/ For any questions that aren't answered there, please join the SmartOS discussion list: https://smartos.topicbox.com/groups/smartos-discuss

xhci: coding error detected on USB 3 port after SmartOS upgrade #841

Open nahall opened 5 years ago

nahall commented 5 years ago

I recently upgraded a SmartOS server from version joyent_20190131T012237Z to joyent_20190703T233036Z.

I have an external USB hard drive plugged into a USB port on this machine, which has USB 3. I think the actual port it is plugged into is USB 2 but the log reports it is tied to a USB 3.0 root hub.

This has been working fine with SmartOS version joyent_20190131T012237Z but after I upgraded to joyent_20190703T233036Z I'm getting errors in /var/adm/messages:

WARNING: xhci: coding error detected, the driver is using ddi_dma_attr(9S) incorrectly. There is a small risk of data corruption in particular with large I/Os. The driver should be replaced with a corrected version for proper system operation. To disable this warning, add 'set rootnex:rootnex_bind_warn=0' to /etc/system(4).
xhci: [ID 197104 kern.info] NOTICE: xhci0: failed to bind DMA memory: -3
xhci: [ID 902155 kern.info] NOTICE: xhci0: xhci stop endpoint command (3)/slot (2) in wrong state: 19
xhci: [ID 617155 kern.info] NOTICE: xhci0: endpoint is in state 3
xhci: [ID 902155 kern.info] NOTICE: xhci0: xhci stop endpoint command (3)/slot (2) in wrong state: 19
xhci: [ID 617155 kern.info] NOTICE: xhci0: endpoint is in state 3

and then the hard drive started throwing a ton of errors. zpool scrub showed a few thousand errors.

I rebooted several times with the same effect.

I then downgraded back to joyent_20190131T012237Z. Scrub now completed OK and things seem to be working. The errors are not reappearing.
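One way to confirm that the messages appear on one platform image and not on the other is to tally them from the log after booting each version. The following is a minimal Python sketch, not something from the original report: the function names `tally_xhci` and `tally_file` are hypothetical, and the pattern assumes the `[ID <number> <facility>.<level>]` tag format seen in the messages quoted above.

```python
import re
from collections import Counter

# Matches the "[ID <number> <facility>.<level>]" tag that follows "xhci:"
# in the messages quoted in this report (assumed to be stable across versions).
MSG_ID = re.compile(r"xhci: \[ID (\d+) \w+\.\w+\]")

# Leading text of the rootnex DMA warning, copied from the report.
BIND_WARNING = "xhci: coding error detected"


def tally_xhci(lines):
    """Count xhci messages by message ID, plus the DMA bind warning."""
    counts = Counter()
    for line in lines:
        if BIND_WARNING in line:
            counts["bind_warning"] += 1
        # finditer handles several messages joined onto one line.
        for match in MSG_ID.finditer(line):
            counts[match.group(1)] += 1
    return counts


def tally_file(path="/var/adm/messages"):
    """Hypothetical helper: tally xhci messages in a log file."""
    with open(path, errors="replace") as handle:
        return tally_xhci(handle)
```

Calling `tally_file()` after booting each platform image and comparing the two results would show whether any of the message IDs (197104, 902155, 617155) or the bind warning come back.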

This is a backup server so I can reboot it pretty easily if you want me to test a different version, or if you need me to run a script for more debugging, just let me know.

Thanks.

rmustacc commented 5 years ago

Thanks for reporting this and sorry for the trouble. It's a bit weird that we're seeing that since there haven't been any changes that I think would lead to that in xhci, but I'll put together a D script to try and dump more information about what's going on so we can diagnose that. Might take me a little bit to get that together.

nahall commented 5 years ago

Sure thing. Here's a little more info (currently running joyent_20190131T012237Z):

prtconf -dD | grep -i xhci

    pci17aa,30a5 (pciex8086,8c31) [Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI], instance #0 (driver name: xhci)

mdb -ke '::prtusb'

INDEX DRIVER   INST NODE         GEN VID.PID   PRODUCT
1     xhci     0    pci17aa,30a5 3.0 0000.0000 No Product String
2     ehci     0    pci17aa,30a5 2.0 0000.0000 No Product String
3     ehci     1    pci17aa,30a5 2.0 0000.0000 No Product String
4     hubd     0    hub          2.0 8087.8008 No Product String
5     usb_mid  0    device       1.1 0a81.0101 USB Keyboard
6     scsa2usb 0    storage      2.1 1058.25fb easystore 25FB
7     scsa2usb 1    storage      2.0 0930.6545 DataTraveler 2.0
8     hubd     1    hub          2.0 8087.8000 No Product String

The device in question is the easystore 25FB.
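For comparing this listing across platform images, the rows can be turned into records and filtered by driver or product. This is a rough Python sketch under stated assumptions: `UsbDevice` and `parse_prtusb` are hypothetical names, and the parser assumes the seven whitespace-separated columns shown in the header (INDEX, DRIVER, INST, NODE, GEN, VID.PID, PRODUCT) with one device per line.

```python
from typing import NamedTuple


class UsbDevice(NamedTuple):
    """One row of the ::prtusb listing (field names mirror the header)."""
    index: int
    driver: str
    inst: int
    node: str
    gen: str
    vid_pid: str
    product: str


def parse_prtusb(text):
    """Parse ::prtusb output, assuming one device per line."""
    devices = []
    for line in text.splitlines():
        # The product string may contain spaces, so split at most 6 times.
        fields = line.split(None, 6)
        # Skip the header and anything that is not a full device row.
        if len(fields) < 7 or not fields[0].isdigit():
            continue
        idx, driver, inst, node, gen, vid_pid, product = fields
        devices.append(
            UsbDevice(int(idx), driver, int(inst), node, gen,
                      vid_pid, product.strip())
        )
    return devices
```

Filtering the result with `[d for d in parse_prtusb(output) if d.driver == "scsa2usb"]` would pick out the two storage devices, including the easystore 25FB.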

Thanks.

nahall commented 5 years ago

Hello, do you have any update on anything I can run to help debug this? I'm currently running joyent_20190131T012237Z and it's been very stable, but I'd like to upgrade to a newer version. Thanks.