Closed: fklassen closed this issue 7 years ago
@duanehoward
Is this issue easily recreated? Does it fail every time? We have not seen this before.
Also, what is the "bond0" interface?
yes. it is easily recreated (every time I run). bond0 is a bond of eth0 and eth1 on the system.
Hmmm .. sounds like the bond is causing it.
I'll test with the bond disabled in a couple of hours. In the meantime, this looks similar to what we normally do for our configurations: https://help.ubuntu.com/community/UbuntuBonding
Disabling the bond seems to have resolved this. I removed the bonding module and the bonding configs, brought the interface up again, and compared qtx:eth0 against eth0:
sudo tcpreplay -i qtx:eth0 -tK --loop 3 ~/myfile.pcap
9289.59 Mbps
sudo tcpreplay -i eth0 -tK --loop 3 ~/myfile.pcap
6023.31 Mbps
It could be helpful if bonded interfaces could be supported, assuming this is the problem.
Thanks. We should be able to support bond interfaces, so I'll try to recreate it next time I work in this area.
Thanks Fred. I think I spoke too soon. For a single small-ish pcap (1.5G) this seems to work, but if I start looping with the cache it seems to throw the card into a chaotic state. See below; it looks the same as the initial report. Bonding is not loaded.
[18376.782717] WARNING: CPU: 15 PID: 0 at /build/buildd/linux-lts-trusty-3.13.0/net/sched/sch_generic.c:264 dev_watchdog+0x267/0x270()
[18376.782719] NETDEV WATCHDOG: eth3 (ixgbe): transmit queue 0 timed out
[18376.782721] Modules linked in: quick_tx(OX) 8021q mrp garp stp llc psmouse gpio_ich serio_raw joydev wmi i7core_edac mac_hid edac_core dcdbas lp ipmi_si lpc_ich acpi_power_meter parport binfmt_misc ixgbe mptsas dca hid_generic mptscsih ptp usbhid mptbase pps_core hid mdio scsi_transport_sas bnx2 [last unloaded: pf_ring]
[18376.782744] CPU: 15 PID: 0 Comm: swapper/15 Tainted: G OX 3.13.0-43-generic #72~precise1-Ubuntu
[18376.782745] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.10.2 04/27/2012
[18376.782746] 0000000000000108 ffff88080fce3d28 ffffffff81757f91 00000000000076e6
[18376.782751] ffff88080fce3d78 ffff88080fce3d68 ffffffff8106afcc ffff88080fce3dc8
[18376.782754] ffff880800f10000 ffff880800f10360 ffff880800f0cec0 0000000000000040
[18376.782757] Call Trace:
[18376.782758]
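For anyone triaging similar reports, here is a minimal sketch (not part of tcpreplay or quick_tx) that pulls the interface, driver, and queue out of `NETDEV WATCHDOG` lines in saved dmesg output. The function name `find_tx_timeouts` is hypothetical, and the pattern assumes the message format shown in the log above.

```python
import re

# Matches kernel messages such as:
#   NETDEV WATCHDOG: eth3 (ixgbe): transmit queue 0 timed out
# The format is taken from the logs in this thread; other kernels may differ.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_tx_timeouts(dmesg_text):
    """Return (interface, driver, queue) for each watchdog timeout found."""
    return [
        (m.group("iface"), m.group("driver"), int(m.group("queue")))
        for m in WATCHDOG_RE.finditer(dmesg_text)
    ]


sample = (
    "[18376.782719] NETDEV WATCHDOG: eth3 (ixgbe): transmit queue 0 timed out\n"
    "NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out\n"
)
print(find_tx_timeouts(sample))
```

Feeding it the output of `dmesg` saved to a file gives a quick list of which interfaces and drivers hit the transmit timeout.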
By the way, this is on two different machines with different Dell hardware models, so it should be reproducible. The card in both machines is: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
Thanks. It appears that we have some stability issues that need to be looked at.
Just an additional data point (again with no bonds): using the same driver but a different card (Intel 82599EB Fiber), I'm seeing the same issue. Please let me know if there's any additional data I can provide to assist.
Friendly ping?
Hi Duane. It will be a couple weeks before I will be able to test this out on ixgbe adapters. Also, the author of this code is a student and is busy with school work, so it will take a bit of time for me to get up-to-speed on his code.
Thanks for the update Fred, please let me know if you need any additional details from our side.
I'm also observing a quick_tx crash with CentOS 6.5 and the e1000e driver. Not sure if the issues are related. Below is a stack trace. @fklassen, should I open a separate issue?
[quick_tx] INFO: Device registered: /dev/net/quick_tx_lo --> lo
[quick_tx] INFO: Device registered: /dev/net/quick_tx_maint0 --> maint0
[quick_tx] INFO: Device registered: /dev/net/quick_tx_eth7 --> eth7
[quick_tx] INFO: Device registered: /dev/net/quick_tx_eth8 --> eth8
[quick_tx] INFO: Device registered: /dev/net/quick_tx_eth9 --> eth9
[quick_tx] INFO: Device registered: /dev/net/quick_tx_eth10 --> eth10
[quick_tx] INFO: Device registered: /dev/net/quick_tx_pan0 --> pan0
[quick_tx] INFO: Device registered: /dev/net/quick_tx_eth5 --> eth5
[quick_tx] INFO: Device registered: /dev/net/quick_tx_eth6 --> eth6
[quick_tx] INFO: Device registered: /dev/net/quick_tx_eth0 --> eth0
[quick_tx] INFO: Device registered: /dev/net/quick_tx_eth1 --> eth1
[quick_tx] INFO: Device registered: /dev/net/quick_tx_eth2 --> eth2
[quick_tx] INFO: Device registered: /dev/net/quick_tx_eth3 --> eth3
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26b/0x280() (Not tainted)
Hardware name: Precision WorkStation T3400
NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Modules linked in: quick_tx(U) nfnetlink_queue nfnetlink_log nfnetlink e1000e(U) e1000(U) pf_ring(U) fuse rfcomm sco bridge bnep l2cap bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc scsi_tgt 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 uinput ppdev iTCO_wdt iTCO_vendor_support microcode cassini dcdbas shpchp parport_pc parport sg lpc_ich mfd_core i2c_i801 btusb bluetooth rfkill tg3 snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc ptp pps_core x38_edac edac_core ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif ahci nouveau ttm drm_kms_helper drm i2c_algo_bit i2c_core mxm_wmi video output wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: e1000e]
Pid: 4, comm: ksoftirqd/0 Not tainted 2.6.32-431.20.3.el6.x86_64 #1
Call Trace:
It seems the e1000e issue is related to having pf_ring.ko loaded at the same time. I removed pf_ring.ko and reloaded quick_tx, and there were no errors this time.
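A rough diagnostic sketch for the situation described above: it reads text in the `/proc/modules` format and reports whether pf_ring and quick_tx are loaded together. The helper names and the sample sizes and addresses are hypothetical, and the thread does not establish pf_ring as the root cause, so treat this only as a way to record the module state when reporting a failure.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules-style text.

    Each non-empty line starts with the module name, followed by size,
    reference count, dependencies, state, and load address.
    """
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def both_loaded(proc_modules_text, watched=("pf_ring", "quick_tx")):
    """True if every module named in `watched` is currently loaded."""
    loaded = loaded_modules(proc_modules_text)
    return all(name in loaded for name in watched)


# Hypothetical sample; sizes and addresses are made up for illustration.
sample = (
    "quick_tx 20480 0 - Live 0xffffffffa0100000 (OX)\n"
    "pf_ring 90112 0 - Live 0xffffffffa0200000 (OX)\n"
    "ixgbe 229376 0 - Live 0xffffffffa0300000\n"
)
print(both_loaded(sample))
```

On a live system the input would come from reading `/proc/modules` directly, for example `both_loaded(open("/proc/modules").read())`.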
[PF_RING] Module unloaded
[quick_tx] INFO: Removing QuickTx device /dev/net/quick_tx_lo
[quick_tx] INFO: Removing QuickTx device /dev/net/quick_tx_maint0
[quick_tx] INFO: Removing QuickTx device /dev/net/quick_tx_eth7
[quick_tx] INFO: Removing QuickTx device /dev/net/quick_tx_eth8
[quick_tx] INFO: Removing QuickTx device /dev/net/quick_tx_eth9
[quick_tx] INFO: Removing QuickTx device /dev/net/quick_tx_eth10
[quick_tx] INFO: Removing QuickTx device /dev/net/quick_tx_pan0
[quick_tx] INFO: Removing QuickTx device /dev/net/quick_tx_eth5
[quick_tx] INFO: Removing QuickTx device /dev/net/quick_tx_eth6
[quick_tx] INFO: Removing QuickTx device /dev/net/quick_tx_eth0
[quick_tx] INFO: Removing QuickTx device /dev/net/quick_tx_eth1
[quick_tx] INFO: Removing QuickTx device /dev/net/quick_tx_eth2
[quick_tx] INFO: Removing QuickTx device /dev/net/quick_tx_eth3
[quick_tx] INFO: Device registered: /dev/net/quick_tx_lo --> lo
[quick_tx] INFO: Device registered: /dev/net/quick_tx_maint0 --> maint0
[quick_tx] INFO: Device registered: /dev/net/quick_tx_eth7 --> eth7
[quick_tx] INFO: Device registered: /dev/net/quick_tx_eth8 --> eth8
[quick_tx] INFO: Device registered: /dev/net/quick_tx_eth9 --> eth9
[quick_tx] INFO: Device registered: /dev/net/quick_tx_eth10 --> eth10
[quick_tx] INFO: Device registered: /dev/net/quick_tx_pan0 --> pan0
[quick_tx] INFO: Device registered: /dev/net/quick_tx_eth5 --> eth5
[quick_tx] INFO: Device registered: /dev/net/quick_tx_eth6 --> eth6
[quick_tx] INFO: Device registered: /dev/net/quick_tx_eth0 --> eth0
[quick_tx] INFO: Device registered: /dev/net/quick_tx_eth1 --> eth1
[quick_tx] INFO: Device registered: /dev/net/quick_tx_eth2 --> eth2
[quick_tx] INFO: Device registered: /dev/net/quick_tx_eth3 --> eth3
I have experienced this both with pf_ring loaded and unloaded; however, this is a pf_ring-enabled driver...
Any updates on this?
Not yet. quick_tx is not my code, and it may be a while before I can get my head around the code enough to fix this. I'll try to get to it, but for now quick_tx is on the back burner.
Dropping support for Quick TX until a maintainer becomes available #357
As reported by @duanehoward - moved from #169.
I'm also seeing this with ixgbe. When trying to use qtx with ixgbe, dmesg fills up with errors like the following: