OpenKinect / libfreenect2

Open source drivers for the Kinect for Windows v2 device
2.07k stars 750 forks source link

[Error] [usb::TransferPool] failed to submit transfer: LIBUSB_ERROR_IO Input/Output Error #516

Closed guanchar closed 8 years ago

guanchar commented 8 years ago

Hi all, we are using libfreenect2 as part of IAI_kinect2 to run RTAB-map. However, after ten minutes of normal operation, we encounter the following error (LIBUSB_DEBUG=3, from iai_kinect2 console), and the depth stream stops.

[Error] [usb::TransferPool] failed to submit transfer: LIBUSB_ERROR_IO Input/Output Error
libusb: error [submit_iso_transfer] submiturb failed error -1 errno=12

However, this bug does not occur when running only Protonect (tested for ~1 hour) and also does not occur when running IAI_kinect2 alone.

We have reproduced this on both the 4.2 and 4.3.3 kernel running Ubuntu 14.04, ROS Indigo.

dmesg output:

[ 1417.215757] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 7 ep 8 with no TDs queued?
[ 1417.271120] kinect2_bridge: page allocation failure: order:7, mode:0x40d0
[ 1417.271131] CPU: 0 PID: 8264 Comm: kinect2_bridge Not tainted 4.3.3-040303-generic #201512150130
[ 1417.271134] Hardware name: GIGABYTE M4HM87P-00/M4HM87P-00, BIOS F6 12/10/2014
[ 1417.271138]  0000000000000000 00000000bab79f6c ffff88010005ba98 ffffffff813c2b24
[ 1417.271144]  00000000000040d0 ffff88010005bb28 ffffffff8118a407 00000080ffffff80
[ 1417.271150]  00000007000040d0 00000000bab79f6c 0000000000000007 0000000000000040
[ 1417.271155] Call Trace:
[ 1417.271169]  [<ffffffff813c2b24>] dump_stack+0x44/0x60
[ 1417.271178]  [<ffffffff8118a407>] warn_alloc_failed+0xf7/0x150
[ 1417.271185]  [<ffffffff8118dfad>] __alloc_pages_nodemask+0x2ed/0xa10
[ 1417.271192]  [<ffffffff811d6201>] alloc_pages_current+0x91/0x100
[ 1417.271197]  [<ffffffff8118a5db>] alloc_kmem_pages+0x3b/0xe0
[ 1417.271203]  [<ffffffff811a967e>] kmalloc_order_trace+0x2e/0xc0
[ 1417.271212]  [<ffffffff81215b00>] ? poll_select_copy_remaining+0x140/0x140
[ 1417.271219]  [<ffffffff811e216d>] __kmalloc+0x21d/0x250
[ 1417.271230]  [<ffffffff815e8f93>] proc_do_submiturb+0x5c3/0xbc0
[ 1417.271235]  [<ffffffff815ea088>] usbdev_do_ioctl+0xaf8/0xfb0
[ 1417.271244]  [<ffffffff810e9da7>] ? hrtimer_start_range_ns+0x1d7/0x3d0
[ 1417.271249]  [<ffffffff815ea56e>] usbdev_ioctl+0xe/0x20
[ 1417.271254]  [<ffffffff81214e15>] do_vfs_ioctl+0x295/0x480
[ 1417.271259]  [<ffffffff810eeef5>] ? ktime_get_ts64+0x45/0xf0
[ 1417.271263]  [<ffffffff81215079>] SyS_ioctl+0x79/0x90
[ 1417.271268]  [<ffffffff817ebab6>] entry_SYSCALL_64_fastpath+0x16/0x75
[ 1417.271271] Mem-Info:
[ 1417.271280] active_anon:546641 inactive_anon:158169 isolated_anon:22
[ 1417.271280]  active_file:132439 inactive_file:84068 isolated_file:0
[ 1417.271280]  unevictable:8 dirty:51 writeback:0 unstable:0
[ 1417.271280]  slab_reclaimable:13107 slab_unreclaimable:11412
[ 1417.271280]  mapped:101709 shmem:118631 pagetables:11237 bounce:0
[ 1417.271280]  free:10368 free_pcp:378 free_cma:0
[ 1417.271288] Node 0 DMA free:15384kB min:28kB low:32kB high:40kB active_anon:52kB inactive_anon:156kB active_file:4kB inactive_file:8kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15896kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:36kB slab_reclaimable:44kB slab_unreclaimable:108kB kernel_stack:32kB pagetables:4kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 1417.271299] lowmem_reserve[]: 0 3409 3837 3837
[ 1417.271306] Node 0 DMA32 free:19520kB min:6936kB low:8668kB high:10404kB active_anon:2034068kB inactive_anon:478392kB active_file:490604kB inactive_file:297264kB unevictable:32kB isolated(anon):0kB isolated(file):0kB present:3573248kB managed:3493048kB mlocked:32kB dirty:152kB writeback:0kB mapped:356736kB shmem:395732kB slab_reclaimable:45404kB slab_unreclaimable:34996kB kernel_stack:6576kB pagetables:37764kB unstable:0kB bounce:0kB free_pcp:880kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 1417.271316] lowmem_reserve[]: 0 0 428 428
[ 1417.271322] Node 0 Normal free:6568kB min:868kB low:1084kB high:1300kB active_anon:152472kB inactive_anon:154128kB active_file:39148kB inactive_file:39000kB unevictable:0kB isolated(anon):88kB isolated(file):0kB present:505856kB managed:438488kB mlocked:0kB dirty:52kB writeback:0kB mapped:50100kB shmem:78756kB slab_reclaimable:6980kB slab_unreclaimable:10544kB kernel_stack:2864kB pagetables:7180kB unstable:0kB bounce:0kB free_pcp:592kB local_pcp:232kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 1417.271331] lowmem_reserve[]: 0 0 0 0
[ 1417.271337] Node 0 DMA: 10*4kB (UM) 9*8kB (UM) 4*16kB (UEM) 5*32kB (UEM) 1*64kB (M) 3*128kB (UEM) 1*256kB (E) 2*512kB (EM) 3*1024kB (UEM) 1*2048kB (E) 2*4096kB (M) = 15376kB
[ 1417.271361] Node 0 DMA32: 243*4kB (UEM) 62*8kB (UEM) 195*16kB (UEM) 120*32kB (UEM) 47*64kB (UE) 36*128kB (UEM) 16*256kB (E) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 20140kB
[ 1417.271381] Node 0 Normal: 44*4kB (UEM) 138*8kB (UE) 101*16kB (UEM) 52*32kB (UEM) 18*64kB (UEM) 5*128kB (UEM) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6608kB
[ 1417.271402] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 1417.271405] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1417.271407] 343726 total pagecache pages
[ 1417.271410] 8588 pages in swap cache
[ 1417.271413] Swap cache stats: add 3349035, delete 3340447, find 394864/665918
[ 1417.271415] Free swap  = 3716436kB
[ 1417.271417] Total swap = 4093948kB
[ 1417.271420] 1023772 pages RAM
[ 1417.271422] 0 pages HighMem/MovableOnly
[ 1417.271424] 36914 pages reserved
[ 1417.271426] 0 pages cma reserved
[ 1417.271427] 0 pages hwpoisoned
[ 1417.278786] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 7 ep 8 with no TDs queued?
[ 1417.351602] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 7 ep 8 with no TDs queued?
[ 1417.352601] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 7 ep 8 with no TDs queued?
[ 1417.353598] xhci_hcd 0000:00:14.0: WARN Event TRB for slot 7 ep 8 with no TDs queued?
[ 1440.084736] usb 2-2.1: USB disconnect, device number 4
[ 1441.900081] usb 2-2.1: new SuperSpeed USB device number 5 using xhci_hcd
[ 1441.916864] usb 2-2.1: New USB device found, idVendor=045e, idProduct=02c4
[ 1441.916866] usb 2-2.1: New USB device strings: Mfr=1, Product=2, SerialNumber=4
[ 1441.916867] usb 2-2.1: Product: Xbox NUI Sensor
[ 1441.916868] usb 2-2.1: Manufacturer: Microsoft
[ 1441.916869] usb 2-2.1: SerialNumber: 030921644947

lspci -nn output:

00:00.0 Host bridge [0600]: Intel Corporation Crystal Well DRAM Controller [8086:0d00] (rev 08)
00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:0d22] (rev 08)
00:03.0 Audio device [0403]: Intel Corporation Crystal Well HD Audio Controller [8086:0d0c] (rev 08)
00:14.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI [8086:8c31] (rev 05)
00:16.0 Communication controller [0780]: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 [8086:8c3a] (rev 04)
00:1a.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 [8086:8c2d] (rev 05)
00:1b.0 Audio device [0403]: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller [8086:8c20] (rev 05)
00:1c.0 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 [8086:8c10] (rev d5)
00:1c.2 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #3 [8086:8c14] (rev d5)
00:1c.3 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #4 [8086:8c16] (rev d5)
00:1d.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 [8086:8c26] (rev 05)
00:1f.0 ISA bridge [0601]: Intel Corporation HM87 Express LPC Controller [8086:8c4b] (rev 05)
00:1f.2 SATA controller [0106]: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] [8086:8c03] (rev 05)
00:1f.3 SMBus [0c05]: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller [8086:8c22] (rev 05)
02:00.0 Network controller [0280]: Realtek Semiconductor Co., Ltd. RTL8821AE 802.11ac PCIe Wireless Network Adapter [10ec:8821]
03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 0c)

Thanks to @xlz for helping so far, any addional assistance would be greatly appreciated!

xlz commented 8 years ago

Analysis:

[submit_iso_transfer] submiturb failed error -1 errno=12 tells us ENOMEM is passed from the kernel. And this time it is page allocation failure: order:7 inproc_do_submiturb, which requested a large 512KB memory block, when there was none left (0*512kB).

[ 1417.271361] Node 0 DMA32: 243*4kB (UEM) 62*8kB (UEM) 195*16kB (UEM) 120*32kB (UEM) 47*64kB (UE) 36*128kB (UEM) 16*256kB (E) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 20140kB
[ 1417.271381] Node 0 Normal: 44*4kB (UEM) 138*8kB (UE) 101*16kB (UEM) 52*32kB (UEM) 18*64kB (UEM) 5*128kB (UEM) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6608kB

Each iso transfer needs a contiguous kernel memory. Our transfers request 8 packets of 0x8400 bytes, which is 264KB. Allocating such large memory block can really fail at any time, which means libusb_submit_transfer() can fail at any time because of this reason, and once it fails, it is very likely to fail again. And this is unavoidable if the kernel allocates memory on each transfer. This can be mitigated to some extent by using smaller transfers, but 8 packets per transfer are already small.

The same issue is reported on linux-usb mailing list http://thread.gmane.org/gmane.linux.usb.general/132939. They proposed to introduce a zerocopy IO mechanism in usbfs to remove the need to allocate memory on each new transfer http://thread.gmane.org/gmane.linux.usb.general/133549. A patch is posted http://thread.gmane.org/gmane.linux.usb.general/134707.

xlz commented 8 years ago

For 0.1, we can document some mitigation measures, but it's too late to integrate this feature.

0.2 can probably use this zero-copy feature to implement a much more efficient transfer pool, also discussed in #447.

guanchar commented 8 years ago

I've been using cat /proc/buddyinfo to monitor free blocks.

I've attempted the following at mitigation, to little or no effect.

For reference, my system has 4 GB of RAM.

guanchar commented 8 years ago

Potential mitigation strategy: increasing min_free_kbytes seems to preserve enough higher order blocks to significantly delay memory allocation failure. My system is able to run 20+ minutes with this "fix".

echo 65536 > /proc/sys/vm/min_free_kbytes

Thanks to @xlz for the fix!