failedrequest / netmap

Automatically exported from code.google.com/p/netmap

Kernel oops on starting pkt-gen with a vale switch #56

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1) cd netmap/examples/
2) ./pkt-gen -i vale-2:b -frx
3) ./pkt-gen -i vale-2:a -ftx

What is the expected output? What do you see instead?
I expect traffic to flow, but instead pkt-gen hangs and there is no way to stop it.

What version of the product are you using? On what operating system?
Latest version of netmap (master branch) on a Debian 8 system:

pharidos@uks2:~/netmap/ ☸ uname -a
Linux uks2 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt9-3~deb8u1 (2015-04-24) x86_64 GNU/Linux

pharidos@uks2:~/netmap/ ☸ git status
On branch master
Your branch is up-to-date with 'origin/master'.
...
...

Please provide any additional information below.
Kernel oops logs
================
kernel: [896630.365822] 163.399173 [1758] netmap_interp_ringid      invalid ring id 2
kernel: [896630.408994] BUG: unable to handle kernel NULL pointer dereference at           (null)
kernel: [896630.457136] IP: [<          (null)>]           (null)
kernel: [896630.492349] PGD bcd837067 PUD b36db5067 PMD 0
kernel: [896630.524890] Oops: 0010 [#5] SMP
kernel: [896630.551687] Modules linked in: virtio_net(O) netmap(O) virtio_ring virtio fuse ixgbevf vfio_iommu_type1 vfio_pci vfio vhost_net vhost macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc tun openvswitch gre vxlan libcrc32c binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc nls_utf8 nls_cp437 vfat fat x86_pkg_temp_thermal intel_powerclamp intel_rapl coretemp kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper iTCO_wdt iTCO_vendor_support cryptd lpc_ich ttm hpilo hpwdt drm_kms_helper drm i2c_algo_bit evdev shpchp tpm_tis efi_pstore processor i2c_i801 i2c_core mfd_core efivars thermal_sys pcspkr wmi acpi_power_meter tpm ipmi_si ioatdma button ipmi_msghandler autofs4 ext4 crc16 mbcache jbd2 dm_mod sd_mod crc_t10dif crct10dif_generic sg crct10dif_pclmul crct10dif_common crc32c_intel ehci_pci uhci_hcd ixgbe xhci_hcd ehci_hcd tg3 dca ptp hpsa usbcore pps_core libphy usb_common mdio scsi_mod [last unloaded: netmap]
kernel: [896631.161711] CPU: 31 PID: 12330 Comm: pkt-gen Tainted: G  R   D    O  3.16.0-4-amd64 #1 Debian 3.16.7-ckt9-3~deb8u1
kernel: [896631.224862] Hardware name: HP ProLiant DL380 Gen9, BIOS P89 11/03/2014
kernel: [896631.271055] task: ffff88077ae755f0 ti: ffff8807729b0000 task.ti: ffff8807729b0000
kernel: [896631.322005] RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
kernel: [896631.373459] RSP: 0018:ffff8807729b3e78  EFLAGS: 00010246
kernel: [896631.414889] RAX: 00000000ffffffff RBX: ffff88105f2d30c0 RCX: 0000000000000001
kernel: [896631.464121] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880a6476e600
kernel: [896631.513296] RBP: ffff880a6476e600 R08: 0000000000000000 R09: 0000000000000000
kernel: [896631.562291] R10: ffff88105c304010 R11: 0000000000000246 R12: ffff880778c96060
kernel: [896631.611252] R13: ffff88107c1cf920 R14: ffff8803503de0d8 R15: ffff880778c96060
kernel: [896631.660245] FS:  00007fa9720fd700(0000) GS:ffff88107f6c0000(0000) knlGS:0000000000000000
kernel: [896631.714064] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: [896631.756736] CR2: 0000000000000000 CR3: 0000000ad4993000 CR4: 00000000001427e0
kernel: [896631.808357] Stack:
kernel: [896631.835158]  ffffffffa06cac95 ffff88107c1cf920 ffff880800000000 ffff880a6476e600
kernel: [896631.886188]  0000000000000008 ffffffffa06cdac4 ffff88105f2d30c0 ffffffffa06cdafe
kernel: [896631.937962]  ffff88105c304000 0000000000000008 ffffffffa06cf106 ffffffff811a99ba
kernel: [896631.988872] Call Trace:
kernel: [896632.017125]  [<ffffffffa06cac95>] ? netmap_do_unregif+0xc5/0x120 [netmap]
kernel: [896632.065514]  [<ffffffffa06cdac4>] ? netmap_dtor_locked+0x14/0x30 [netmap]
kernel: [896632.113602]  [<ffffffffa06cdafe>] ? netmap_dtor+0x1e/0x100 [netmap]
kernel: [896632.159200]  [<ffffffffa06cf106>] ? linux_netmap_release+0x16/0x20 [netmap]
kernel: [896632.208352]  [<ffffffff811a99ba>] ? __fput+0xca/0x1d0
kernel: [896632.248531]  [<ffffffff81085107>] ? task_work_run+0x97/0xd0
kernel: [896632.291320]  [<ffffffff81012ea9>] ? do_notify_resume+0x69/0xa0
kernel: [896632.334812]  [<ffffffff8151110a>] ? int_signal+0x12/0x17 
kernel: [896632.376108] Code:  Bad RIP value.
kernel: [896632.409506] RIP  [<          (null)>]           (null)
kernel: [896632.450003]  RSP <ffff8807729b3e78>
kernel: [896632.482560] CR2: 0000000000000000
kernel: [896632.547232] ---[ end trace 489d0e5e54a1afc3 ]---
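
One way to read the "Oops: 0010" line above: on x86 that value is the page-fault error code, whose low five bits flag a protection violation (bit 0), a write (bit 1), a user-mode access (bit 2), a reserved-bit violation (bit 3) and an instruction fetch (bit 4). The small Python sketch below decodes it; the function name decode_pf_error is purely illustrative, and the bit layout is the standard x86 one, not something defined by netmap.

```python
def decode_pf_error(code):
    """Decode the x86 page-fault error code printed as 'Oops: XXXX' (hex)."""
    flags = [
        "protection violation" if code & 0x01 else "page not present",  # bit 0
        "write access" if code & 0x02 else "read access",               # bit 1
        "user mode" if code & 0x04 else "kernel mode",                  # bit 2
    ]
    if code & 0x08:  # bit 3: reserved bit set in a page-table entry
        flags.append("reserved bit set")
    if code & 0x10:  # bit 4: the fault happened while fetching an instruction
        flags.append("instruction fetch")
    return flags


# The oops above printed "Oops: 0010"; the value is hexadecimal.
print(decode_pf_error(0x0010))
# ['page not present', 'read access', 'kernel mode', 'instruction fetch']
```

So the kernel faulted while fetching an instruction, in kernel mode, from an unmapped page. With RIP and CR2 both 0, and the first stack word (ffffffffa06cac95) equal to the return address netmap_do_unregif+0xc5 shown in the call trace, this is consistent with netmap_do_unregif calling through a NULL function pointer while the file descriptor was being released.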

Original issue reported on code.google.com by h.prem.a...@gmail.com on 16 Jun 2015 at 10:30

GoogleCodeExporter commented 9 years ago
Please use the 'next' branch, since it contains many bug fixes.

Moreover, note that vale-2:a and vale-2:b are not valid names; you should use vale2:a and vale2:b instead. The panic you experience is in the cleanup path, after

Original comment by giuseppe.lettieri73 on 16 Jun 2015 at 11:58

GoogleCodeExporter commented 9 years ago
Thanks. The version from the 'next' branch works perfectly fine, and I also changed the VALE switch names as you suggested. No more kernel oops. I will close this issue.

However, I am seeing too many drops on the rx side with this topology:
    qemu (running pkt-gen tx) --- vale switch --- qemu (running pkt-gen rx)

pkt-gen on the tx side generates ~1.3 Mpps. With the netmap/pkt-gen version from 'master', I used to receive almost 99.9% of the packets on the rx side, but with the version from 'next' I see only ~78% of the packets on the rx side.
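
To put the two delivery ratios in absolute terms, here is a quick back-of-the-envelope calculation in Python. It assumes a steady offered load of ~1.3 Mpps as reported; the names TX_PPS and rx_and_loss are illustrative only.

```python
TX_PPS = 1.3e6  # ~1.3 Mpps generated by pkt-gen on the tx side (as reported)


def rx_and_loss(tx_pps, rx_fraction):
    """Return (received pps, lost pps) for a given fraction of packets delivered."""
    rx = tx_pps * rx_fraction
    return rx, tx_pps - rx


for branch, fraction in (("master", 0.999), ("next", 0.78)):
    rx, lost = rx_and_loss(TX_PPS, fraction)
    print(f"{branch}: rx ~{rx / 1e6:.3f} Mpps, lost ~{lost:,.0f} pps")
# master: rx ~1.299 Mpps, lost ~1,300 pps
# next: rx ~1.014 Mpps, lost ~286,000 pps
```

In other words, going from 99.9% to ~78% delivery at this rate means roughly 286,000 lost packets per second instead of about 1,300.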

Original comment by h.prem.a...@gmail.com on 17 Jun 2015 at 2:31