djwmarks / netmap

Automatically exported from code.google.com/p/netmap

Kernel oops when trying to open netmap with an incorrect hardware ring number #48

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I have an Intel 82599 10GbE NIC with 4 hardware queues enabled, running Debian 7 Wheezy.

cat /proc/interrupts |grep eth4
  74:  808231153          0          0          0   PCI-MSI-edge      eth4-TxRx-0
  75:        469  906003549          0          0   PCI-MSI-edge      eth4-TxRx-1
  76:        427          0  817517321          0   PCI-MSI-edge      eth4-TxRx-2
  77:        289          0          0 1341880240   PCI-MSI-edge      eth4-TxRx-3
  78:          5        102          0          0   PCI-MSI-edge      eth4

When I try to open an incorrect ring id with netmap@eth4-5, I get a bunch of kernel errors:

[79384.241993] ixgbe 0000:0a:00.0: eth4: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[79384.425352] ixgbe 0000:0d:00.0: eth6: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[80285.103007] 360.585717 [1758] netmap_interp_ringid      invalid ring id 4
[80285.301151] BUG: unable to handle kernel NULL pointer dereference at 0000000000000598
[80285.301198] IP: [<ffffffffa03be12f>] mbq_safe_dequeue+0x54/0x54 [netmap]
[80285.301233] PGD 85d3b9067 PUD 85db31067 PMD 0 
[80285.301267] Oops: 0000 [#1] SMP 
[80285.301296] CPU 0 
[80285.301302] Modules linked in: ixgbe(O) mdio netmap(O) bridge stp cpufreq_userspace cpufreq_conservative cpufreq_powersave cpufreq_stats binfmt_misc loop snd_pcm snd_page_alloc snd_timer snd iTCO_wdt coretemp crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 aes_generic cryptd soundcore hpwdt iTCO_vendor_support sb_edac hpilo psmouse serio_raw joydev pcspkr button container ioatdma acpi_power_meter edac_core evdev ext4 crc16 jbd2 mbcache dm_mod mperf 3w_9xxx 3w_xxxx raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 md_mod ahci libahci sata_nv sata_sil sata_via libata usbhid hid sg sd_mod crc_t10dif uhci_hcd aacraid scsi_mod thermal ehci_hcd usbcore usb_common igb(O) ptp pps_core dca processor thermal_sys [last unloaded: ixgbe]
[80285.309688] 
[80285.309708] Pid: 4155, comm: kipfw Tainted: G           O 3.2.0-4-amd64 #1 Debian 3.2.63-2+deb7u1 HP ProLiant DL380e Gen8
[80285.309756] RIP: 0010:[<ffffffffa03be12f>]  [<ffffffffa03be12f>] mbq_safe_dequeue+0x54/0x54 [netmap]
[80285.309800] RSP: 0018:ffff880857c63e40  EFLAGS: 00010246
[80285.309822] RAX: ffff88085af76780 RBX: 0000000000000598 RCX: 00000000c0000100
[80285.309847] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000598
[80285.309872] RBP: 0000000000000000 R08: ffff880857c62000 R09: ffff880859450000
[80285.309896] R10: ffffffff81600000 R11: ffff880859450000 R12: ffff88085c94ad80
[80285.309921] R13: ffff88085c94ad80 R14: ffff88087e6540c0 R15: ffff88085ac16bd0
[80285.309946] FS:  00007f75c07cb700(0000) GS:ffff88087ee00000(0000) knlGS:0000000000000000
[80285.309983] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[80285.310006] CR2: 0000000000000598 CR3: 000000083ee35000 CR4: 00000000000406f0
[80285.310031] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[80285.310055] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[80285.310080] Process kipfw (pid: 4155, threadinfo ffff880857c62000, task ffff88085af76780)
[80285.310117] Stack:
[80285.310135]  ffffffffa03be170 0000000000000001 ffff88085b00e800 0000000000000598
[80285.310187]  ffffffffa03c4726 ffff88085db054c0 ffff88085db054c0 ffff88085b00e800
[80285.310238]  ffffffffa03c3df1 ffffffff81036618 ffffffff8134f64c 0000000000000246
[80285.310289] Call Trace:
[80285.310311]  [<ffffffffa03be170>] ? __mbq_purge+0x1b/0x2e [netmap]
[80285.310338]  [<ffffffffa03c4726>] ? netmap_hw_krings_delete+0x23/0x36 [netmap]
[80285.310376]  [<ffffffffa03c3df1>] ? netmap_do_unregif+0x7b/0x100 [netmap]
[80285.310404]  [<ffffffff81036618>] ? should_resched+0x5/0x23
[80285.310431]  [<ffffffff8134f64c>] ? _cond_resched+0x7/0x1c
[80285.310456]  [<ffffffffa03c6786>] ? netmap_dtor_locked+0xf/0x1e [netmap]
[80285.310482]  [<ffffffffa03c67af>] ? netmap_dtor+0x1a/0x47 [netmap]
[80285.310508]  [<ffffffffa03c700d>] ? linux_nm_vi_change_mtu+0x3/0x3 [netmap]
[80285.310534]  [<ffffffffa03c701f>] ? linux_netmap_release+0x12/0x16 [netmap]
[80285.310563]  [<ffffffff810fbf45>] ? fput+0xf9/0x1a1
[80285.310586]  [<ffffffff810f9c70>] ? filp_close+0x62/0x6a
[80285.310609]  [<ffffffff810f9d06>] ? sys_close+0x8e/0xcb
[80285.310635]  [<ffffffff81355a92>] ? system_call_fastpath+0x16/0x1b
[80285.310658] Code: 45 00 75 08 48 c7 45 08 00 00 00 00 ff 4d 10 49 c7 04 24 00 00 00 00 48 8b 75 20 48 89 df e8 e2 28 f9 e0 5b 5d 4c 89 e0 41 5c c3 <48> 8b 07 48 85 c0 74 1d 48 8b 10 48 85 d2 48 89 17 75 08 48 c7 
[80285.310973] RIP  [<ffffffffa03be12f>] mbq_safe_dequeue+0x54/0x54 [netmap]
[80285.311003]  RSP <ffff880857c63e40>
[80285.311022] CR2: 0000000000000598
[80285.311427] ---[ end trace b39b220e2fbeae09 ]---
[80285.399608] ixgbe 0000:0a:00.0: eth4: detected SFP+: 3
[80286.208247] ixgbe 0000:0a:00.0: eth4: NIC Link is Up 10 Gbps, Flow Control: RX/TX

After that the server became unstable and I had to reboot it.

Please add a check on the ring number requested from user space.
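
To illustrate the kind of check I mean, here is a minimal sketch in C. It is not netmap code and not the actual fix; `check_ring_id` and its parameters are hypothetical names used only for this example. It assumes ring ids are 0-based, which matches the log above, where ring id 4 is reported as invalid on a NIC with 4 queues.

```c
#include <errno.h>

/*
 * Hypothetical helper (not part of netmap): validate a requested
 * hardware ring id against the number of rings the NIC exposes.
 * Ring ids are assumed to be 0-based, so with 4 rings the valid
 * ids are 0..3.
 *
 * Returns 0 if the id is usable, -EINVAL otherwise.
 */
int check_ring_id(unsigned int ring_id, unsigned int num_rings)
{
    if (num_rings == 0 || ring_id >= num_rings)
        return -EINVAL;
    return 0;
}
```

With 4 queues, `check_ring_id(3, 4)` returns 0, while `check_ring_id(4, 4)` and `check_ring_id(5, 4)` return `-EINVAL`. The caller would have to stop right there and must not run any teardown for rings that were never set up.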

Original issue reported on code.google.com by pavel.odintsov on 5 Mar 2015 at 9:21

GoogleCodeExporter commented 9 years ago
This is fixed in the 'next' branch. If you cannot/do not want to switch to the 
newer code, you can apply the attached patch.

Original comment by giuseppe.lettieri73 on 5 Mar 2015 at 1:54

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks!

Original comment by pavel.odintsov on 26 Mar 2015 at 8:32