gkyildirim closed 10 years ago
When this happens, can you grab the zone id of the zone in question and also grab the output of:
mdb -ke '::walk vnd_dev_cache | ::print vnd_dev_t; ::walk vnd_str_cache | ::print vnd_str_t'
That'll help narrow this down.
The zone id is 33329dd8-ac40-40c2-9f68-d059ec54c166 and the output is here. Note that this was done with https://us-east.manta.joyent.com/rmustacc/public/tmp/qemu.
For future reference, the string UUID is actually the zone name. However, because we only have one nic in play here, this is fine. So this is a bug, most probably a race condition that for some reason didn't manifest in our testing. Specifically, the vnd_str_t has the vns_flags set indicating that it's flow controlled. Because of this, additional data will not be sent out on the wire, though the guest will continue to send it. I'll do an audit of this code shortly and see if I can reproduce locally. Otherwise, if it's relatively easy, I may have a D script that'll help us pinpoint what's going on here, specifically to verify that mac is firing this.
This also explains why this persists across a QEMU reboot. Specifically this will persist as long as the zone exists. When you use a reboot in a guest versus a shutdown, the qemu process persists which is why we don't see this.
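To make the failure mode described above concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: `ToyStream`, `ring_slots`, `write`, and `ring_drained` are hypothetical names invented for illustration and are not the real vnd or mac code. It shows how a flow-control flag that is only cleared by a notification stays set forever if that notification is never delivered, even after the transmit ring has free descriptors again.

```python
class ToyStream:
    """Toy stand-in for a flow-controlled transmit stream (not the real vnd code)."""

    def __init__(self, ring_slots):
        self.ring_free = ring_slots    # free descriptors in the pretend tx ring
        self.flow_controlled = False   # stands in for the flow-control bit in vns_flags
        self.sent = 0                  # frames that made it onto the wire
        self.held = 0                  # frames the guest sent while we were blocked

    def write(self):
        """The guest hands us one frame; return True if it went out."""
        if not self.flow_controlled and self.ring_free == 0:
            self.flow_controlled = True      # the "flow-blocked" transition
        if self.flow_controlled:
            self.held += 1                   # guest keeps sending, nothing leaves
            return False
        self.ring_free -= 1
        self.sent += 1
        return True

    def ring_drained(self, slots, notify):
        """The ring frees descriptors; `notify` models whether the callback fires."""
        self.ring_free += slots
        if notify:
            self.flow_controlled = False     # the "flow-resumed" transition


def demo():
    s = ToyStream(ring_slots=2)
    first_three = [s.write() for _ in range(3)]   # third write hits a full ring
    s.ring_drained(slots=2, notify=False)         # ring has room, callback is lost
    stalled = s.write()                           # still blocked despite free slots
    s.ring_drained(slots=0, notify=True)          # callback finally delivered
    resumed = s.write()
    return first_three, stalled, resumed, s.held
```

In this model nothing short of discarding the `ToyStream` object clears the flag once the notification is lost, which mirrors the observation that the stall lasts as long as the zone exists.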
Thanks for your help with this.
No problem, it is easy for me to run your D script if needed. Thanks for handling this bug.
I think I have a pretty good lead. I'll test locally to confirm I can reproduce this. In the interim, can you run the following during the test? We'll be particularly interested in the output from when it locks up:
dtrace -qn 'vnd:::flow-blocked,vnd:::flow-resumed{ printf("%x %s\n", timestamp, probename); stack(); }'
This should put the nail in the coffin on the problem. Assuming it's this, I have a pretty good idea of what the fix is.
Here is the output.
dtrace -qn 'vnd:::flow-blocked,vnd:::flow-resumed{ printf("%x %s\n", timestamp, probename); stack(); }'
11515b330f1 flow-blocked
vnd`vnd_squeue_tx_drain+0x112
vnd`vnd_squeue_tx_append+0xe9
ip`squeue_enter+0x41c
gsqueue`gsqueue_enter_one+0x43
vnd`vnd_frameio_write+0x10e
vnd`vnd_ioctl+0x270
genunix`cdev_ioctl+0x39
specfs`spec_ioctl+0x60
genunix`fop_ioctl+0x55
genunix`ioctl+0x9b
unix`sys_syscall+0x17a
Hmm, okay. That's interesting and not what I expected. I've been going at this for a little bit, but I haven't been able to reproduce this locally yet. I suspect that the difference is in the nics that are being used. Can you let me know what kind of nics that you're using? Specifically which device has the nic tag for the physical nic you're using and is it a 1 GbE device or 10 GbE device?
This stack backtrace leaves me with more questions than answers. So the most helpful thing would be to produce a crash dump and make it available in this state if that's possible. To generate the crash dump, the simplest thing to do is run mdb -kwe clock/W -1. That will panic the box and when it comes back up a crash dump will be generated in /var/crash/volatile/ with a name like vmdump.0. If it's possible to make something like that available, that'll help.
I hope to publish a crash dump from the HP server today. Meanwhile, I can share the nic details.
I've tested with two different servers and two different nics. Both of them are 1GbE and working at full duplex and full speed.
The hosts that run iperf -s are two different physical machines. Specifically, they have two different 1GbE nics and two different operating systems (SmartOS and Ubuntu).
I've taken a crash dump in the stalled state. It can be found here.
@gkyildirim Thanks, I've slurped down the file.
Sorry for the delay in getting back to this. I have a brief update. By digging into this we can see that the srs here no longer considers itself out of tx descriptors. We also know from the D script that it wasn't notified that it was clear of flow control which is also problematic. I've also confirmed that we have in fact registered with that mac client. So the next open question is did we end up having a notification sent up or did something get lost along the way. I'll do some more digging in at the current state and then will hopefully have a D script that helps us answer these questions.
@gkyildirim can you run the following D script while you run the test again?
fbt::mac_tx_ring_update:entry
{
	printf("%x update via %s on %s\n", timestamp, probefunc,
	    stringof(((mac_impl_t *)args[0])->mi_name));
	stack();
	self->update = 1;
}

fbt::mac_tx_srs_wakeup:entry
/self->update/
{
	printf("%x waking up an srs %p\n", timestamp, arg0);
	printf("ring_count: %d, st_arg2: %p, state: %x\n",
	    args[0]->srs_tx_ring_count,
	    args[0]->srs_tx.st_arg2,
	    args[0]->srs_state);
	stack();
}

fbt::mac_tx_invoke_callbacks:entry
{
	printf("%x callbacks invoked for client %s\n", timestamp,
	    stringof(((mac_client_impl_t *)args[0])->mci_name));
	stack();
}

fbt::vnd_mac_flow_control:entry
{
	printf("%x vnd flow control fired for str %p!\n", timestamp,
	    (uintptr_t)arg1);
	stack();
}
Pastebin seems to be overloaded right now, so I'll paste the results below.
CPU ID FUNCTION:NAME
12 31598 mac_tx_ring_update:entry 121534c9fbd update via mac_tx_ring_update on bnx1
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 121534ccfdc waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 121534ce54e waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 121534cec16 waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31598 mac_tx_ring_update:entry 12156311f37 update via mac_tx_ring_update on bnx1
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 12156313d8b waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 12156314c0b waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 121563151d8 waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31598 mac_tx_ring_update:entry 121564ed40d update via mac_tx_ring_update on bnx1
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 121564ee4a3 waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 121564eec0d waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 121564ef0da waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31598 mac_tx_ring_update:entry 121615fa599 update via mac_tx_ring_update on bnx1
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 121615fcff3 waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 121615fe2b3 waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 121615fea16 waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31598 mac_tx_ring_update:entry 12166661190 update via mac_tx_ring_update on bnx1
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 12166663af0 waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 12166664f8f waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 1216666570b waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31598 mac_tx_ring_update:entry 121666f1e7d update via mac_tx_ring_update on bnx1
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 121666f2bc8 waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 121666f31de waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 121666f362b waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31598 mac_tx_ring_update:entry 121667f9a39 update via mac_tx_ring_update on bnx1
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 121667fa7fa waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 121667faeba waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 121667fb34d waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31598 mac_tx_ring_update:entry 12166975d1d update via mac_tx_ring_update on bnx1
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 12166976c07 waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 12166977270 waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 1216697772c waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31598 mac_tx_ring_update:entry 12166c79304 update via mac_tx_ring_update on bnx1
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 12166c7a3ab waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 12166c7ab62 waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 12166c7b050 waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31598 mac_tx_ring_update:entry 12166facae6 update via mac_tx_ring_update on bnx1
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 12166fae4df waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 12166faefe8 waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 12166faf503 waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31598 mac_tx_ring_update:entry 1216707eb7f update via mac_tx_ring_update on bnx1
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 1216707f9b8 waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 1216707ffd5 waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 1216708048b waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31598 mac_tx_ring_update:entry 121711bc6de update via mac_tx_ring_update on bnx1
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 121711bf120 waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 121711c03db waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 121711c0aee waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31598 mac_tx_ring_update:entry 12171221791 update via mac_tx_ring_update on bnx1
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 121712224a9 waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 12171222aea waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 12171222f9a waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31598 mac_tx_ring_update:entry 1217d2dcaf6 update via mac_tx_ring_update on bnx1
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 1217d2df43b waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 1217d2e072f waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 1217d2e0e72 waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31598 mac_tx_ring_update:entry 1217d384a58 update via mac_tx_ring_update on bnx1
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 1217d385872 waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 1217d385e94 waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 1217d3862e2 waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31598 mac_tx_ring_update:entry 1217d4216bd update via mac_tx_ring_update on bnx1
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 1217d422392 waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 1217d4229a6 waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 1217d422e2c waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31598 mac_tx_ring_update:entry 1217d651a65 update via mac_tx_ring_update on bnx1
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 1217d652aeb waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 1217d653216 waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 1217d6536df waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31598 mac_tx_ring_update:entry 1217fd78e4c update via mac_tx_ring_update on bnx1
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 1217fd7b882 waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 1217fd7cc62 waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
12 31586 mac_tx_srs_wakeup:entry 1217fd7d367 waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
mac`mac_tx_ring_update+0x15
mac`mac_tx_update+0x13
bnx`bnx_txpkts_intr+0x22
bnx`bnx_intr_xmit+0x38
bnx`bnx_intr_1lvl+0x14d
unix`av_dispatch_autovect+0x95
unix`dispatch_hardint+0x36
unix`switch_sp_and_call+0x13
Hmm. So the interesting thing here is that we never called into mac_tx_invoke_callbacks() which would be the path that causes vnd to be notified that it should send traffic again. There are a few options here as to how to fix this. We could basically work around this and allow vnd to then go ahead and check whether or not it is still flow controlled after some amount of time. However, I think the more proper fix is going to be to ensure that we actually fire these notification callbacks.
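For anyone repeating this check on a long capture, the "which probes never fired" question can be answered mechanically. The following Python sketch is an assumption-laden helper, not part of any illumos tooling: `tally_probes`, `never_fired`, and `SAMPLE` are hypothetical names, and the parser assumes the default DTrace output layout shown above (CPU, probe ID, `function:entry`, then the printf text).

```python
import re
from collections import Counter

# Matches the default dtrace header line for a fired probe, for example:
#   12 31598 mac_tx_ring_update:entry 121534c9fbd update via ...
PROBE_LINE = re.compile(r"^\s*\d+\s+\d+\s+(\w+):entry\b")

# The four functions instrumented by the D script in this thread.
TRACED = (
    "mac_tx_ring_update",
    "mac_tx_srs_wakeup",
    "mac_tx_invoke_callbacks",
    "vnd_mac_flow_control",
)


def tally_probes(text):
    """Count how many times each fbt entry probe appears in captured output."""
    counts = Counter()
    for line in text.splitlines():
        match = PROBE_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def never_fired(counts, traced=TRACED):
    """Return the traced functions that do not appear in the capture at all."""
    return [name for name in traced if counts[name] == 0]


# A short excerpt in the same shape as the output pasted in this thread.
SAMPLE = """\
CPU ID FUNCTION:NAME
12 31598 mac_tx_ring_update:entry 121534c9fbd update via mac_tx_ring_update on bnx1
mac`mac_tx_update+0x13
12 31586 mac_tx_srs_wakeup:entry 121534ccfdc waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000
mac`i_mac_tx_srs_notify+0x60
"""

if __name__ == "__main__":
    counts = tally_probes(SAMPLE)
    print(dict(counts))
    print("never fired:", never_fired(counts))
```

Stack frames and the `ring_count:` lines do not start with two numeric columns, so they are skipped; only the probe header lines are counted.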
I'll need to do some digging to see if I can figure out why this specific case is designed not to fire registered callbacks, and then make it fire them. I'll try and put together a new platform image that can be used to verify that we're doing the right thing here.
@gkyildirim I wanted to give you an update on where this is currently.
I think that I have a pretty good understanding of what's happening and why it hasn't notified us yet. I'm going to put together a test SmartOS image that should address this and I'll update the ticket when it's available. Would you be able to test that image? It'd be greatly appreciated. Thanks for all your help and patience with root causing this one so far.
@rmustacc thanks for all the good work of Bardiche. I am truly more than glad to help.
Please let me know the path for the image when available.
I've made the following different versions available:
@rmustacc thanks for the fix. It is working nicely.
VM constantly utilizes the GbE link smoothly (watching with vndstat).
Okay, great. Thanks for helping test that. I'm going to do some last sanity checks on that and then get that pushed before Thursday's release. Thanks for your help with this.
This should be resolved with https://github.com/joyent/illumos-joyent/commit/00aeef62bb42a504b38a88a68a0bbc2b2e607cf0.
Tested smartOS versions:
Tested joyent linux images:
If you run iperf -c <Dest_IP> -P 10, the network stalls on both Linux VMs. This was tested on two different server hardware platforms (HP, Dell). I believe it is reproducible. Rebooting inside CentOS/Ubuntu does not help; you have to stop and start the zone. So it seems to be a QEMU problem, not a Linux one. There is no log to share; it is all quiet.
BTW, SmartOS version 20131204T101631Z was also tested without failure.