TritonDataCenter / illumos-kvm-cmd

qemu-kvm for illumos-kvm

network stalls if you push hard enough #18

Closed gkyildirim closed 10 years ago

gkyildirim commented 10 years ago

Tested smartOS versions:

If you run iperf -c <Dest_IP> -P 10, the network stalls on both Linux VMs. This was tested on two different servers (HP and Dell). I believe it is reproducible.

Rebooting from inside CentOS/Ubuntu does not help; you have to stop and start the zone. So it seems to be a QEMU problem, not a Linux one. There is no log to share; everything is quiet.

BTW, SmartOS version 20131204T101631Z was also tested, without failure.

rmustacc commented 10 years ago

When this happens, can you grab the zone id of the zone in question and also grab the output of:

mdb -ke '::walk vnd_dev_cache | ::print vnd_dev_t; ::walk vnd_str_cache | ::print vnd_str_t'

That'll help narrow this down.

gkyildirim commented 10 years ago

The zone id is 33329dd8-ac40-40c2-9f68-d059ec54c166 and the output is here. Note that this was done with https://us-east.manta.joyent.com/rmustacc/public/tmp/qemu.

rmustacc commented 10 years ago

For future reference, the UUID string is actually the zone name. However, because we only have one nic in play here, this is fine. So this is a bug, most probably a race condition that for some reason didn't manifest in our testing. Specifically, the vnd_str_t has the flag set in vns_flags indicating that it's flow controlled. Because of this, additional data will not be sent out on the wire, though the guest will continue to send it. I'll do an audit of this code shortly and see if I can reproduce locally. Otherwise, if it's relatively easy, I may have a D script that'll help us pinpoint what's going on here, specifically to verify that mac is firing this.

This also explains why this persists across a QEMU reboot. Specifically, it will persist as long as the zone exists. When you reboot the guest rather than shut it down, the qemu process persists, which is why a reboot doesn't clear it.
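
To make the failure mode concrete, here is a toy model in Python. It is not the vnd code: `ToyStream`, `send`, `tx_complete`, `resume`, and `demo` are hypothetical names, and the only idea taken from the discussion is that a flow-controlled flag stops frames from going out while the guest keeps handing them over. The `notify=False` path models a resume notification that never arrives.

```python
from collections import deque


class ToyStream:
    """Hypothetical stand-in for a flow-controlled stream (not vnd_str_t)."""

    def __init__(self, ring_slots):
        self.ring_free = ring_slots    # free transmit descriptors
        self.flow_controlled = False   # plays the role of the flow-control flag
        self.queue = deque()           # frames accepted from the guest
        self.sent = []                 # frames that reached the wire

    def send(self, frame):
        """The guest writes a frame; it is always accepted, but maybe not sent."""
        self.queue.append(frame)
        self._drain()

    def _drain(self):
        while self.queue and not self.flow_controlled:
            if self.ring_free == 0:
                # Out of descriptors: block until resume() is called.
                self.flow_controlled = True
                return
            self.ring_free -= 1
            self.sent.append(self.queue.popleft())

    def tx_complete(self, n, notify=True):
        """Hardware frees n descriptors; notify=False models a lost wakeup."""
        self.ring_free += n
        if notify:
            self.resume()

    def resume(self):
        self.flow_controlled = False
        self._drain()


def demo(notify):
    stream = ToyStream(ring_slots=2)
    for frame in range(4):
        stream.send(frame)
    stream.tx_complete(4, notify=notify)
    stream.send(4)
    return stream.sent, stream.flow_controlled
```

With `demo(notify=True)` all five frames go out and the flag ends up clear. With `demo(notify=False)` only the first two frames are sent and the flag stays set even though descriptors are free again, and nothing short of rebuilding the stream object clears it, which is the kind of stuck state described above.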

Thanks for your help with this.

gkyildirim commented 10 years ago

No problem, it is easy for me to run your D script if needed. Thanks for handling this bug.

rmustacc commented 10 years ago

I think I have a pretty good lead. I'll test locally to confirm I can reproduce this. In the interim, can you run the following during the test? We're particularly interested in the output from when it locks up:

dtrace -qn 'vnd:::flow-blocked,vnd:::flow-resumed{ printf("%x %s\n", timestamp, probename); stack(); }'

This should put the nail in the coffin on the problem. Assuming it's this, I have a pretty good idea of what the fix is.
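
If the one-liner produces a lot of output, a small script can tally it offline. The sketch below is in Python and assumes only the `printf("%x %s\n", ...)` format used above, that is, a hex timestamp followed by the probe name on its own line; the function name `summarize_flow_events` is made up for illustration. Note that the one-liner does not print which stream an event belongs to, so with more than one vnd stream on the box the "last event" is only a hint.

```python
import re

# Matches lines such as "11515b330f1 flow-blocked"; stack frames do not match.
EVENT_RE = re.compile(r"^\s*([0-9a-f]+)\s+(flow-blocked|flow-resumed)\s*$")


def summarize_flow_events(text):
    """Return (counts per probe name, name of the latest event or None)."""
    events = []
    for line in text.splitlines():
        match = EVENT_RE.match(line)
        if match:
            events.append((int(match.group(1), 16), match.group(2)))

    # Output from different CPUs can arrive out of order, so sort by timestamp.
    events.sort()

    counts = {"flow-blocked": 0, "flow-resumed": 0}
    for _, name in events:
        counts[name] += 1

    last = events[-1][1] if events else None
    return counts, last
```

A run that ends on `flow-blocked` with fewer resumes than blocks would be consistent with a stream that was flow controlled and never released.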

gkyildirim commented 10 years ago

Here is the output.

dtrace -qn 'vnd:::flow-blocked,vnd:::flow-resumed{ printf("%x %s\n", timestamp, probename); stack(); }'
11515b330f1 flow-blocked

              vnd`vnd_squeue_tx_drain+0x112
              vnd`vnd_squeue_tx_append+0xe9
              ip`squeue_enter+0x41c
              gsqueue`gsqueue_enter_one+0x43
              vnd`vnd_frameio_write+0x10e
              vnd`vnd_ioctl+0x270
              genunix`cdev_ioctl+0x39
              specfs`spec_ioctl+0x60
              genunix`fop_ioctl+0x55
              genunix`ioctl+0x9b
              unix`sys_syscall+0x17a

rmustacc commented 10 years ago

Hmm, okay. That's interesting and not what I expected. I've been going at this for a little bit, but I haven't been able to reproduce this locally yet. I suspect that the difference is in the nics that are being used. Can you let me know what kind of nics you're using? Specifically, which device has the nic tag for the physical nic you're using, and is it a 1 GbE or a 10 GbE device?

This stack backtrace leaves me with more questions than answers. So the most helpful thing would be to produce a crash dump in this state and make it available, if that's possible. To generate the crash dump, the simplest thing to do is run mdb -kwe clock/W -1. That will panic the box, and when it comes back up a crash dump will be generated in /var/crash/volatile/ with a name like vmdump.0. If it's possible to make something like that available, that'll help.

gkyildirim commented 10 years ago

I hope to publish a crash dump from the HP server today. Meanwhile, I can share the nic details.

I've tested with two different servers and two different nics. Both are 1 GbE, running at full duplex and full speed.

The machines that run iperf -s are two different physical hosts. They have different 1 GbE nics and different operating systems (SmartOS and Ubuntu).

gkyildirim commented 10 years ago

I've taken a crash dump in the stalled state. It can be found here.

rmustacc commented 10 years ago

@gkyildirim Thanks, I've slurped down the file.

rmustacc commented 10 years ago

Sorry for the delay in getting back to this. I have a brief update. By digging into the dump, we can see that the srs here no longer considers itself out of tx descriptors. We also know from the D script that vnd was never notified that it was clear of flow control, which is also problematic. I've also confirmed that we have in fact registered with that mac client. So the next open question is whether a notification was sent up at all, or whether something got lost along the way. I'll do some more digging into the current state and then will hopefully have a D script that helps us answer these questions.

rmustacc commented 10 years ago

@gkyildirim can you run the following D script while you run the test again?

fbt::mac_tx_ring_update:entry
{
        printf("%x update via %s on %s\n", timestamp, probefunc,
            stringof(((mac_impl_t *)args[0])->mi_name));
        stack();
        self->update = 1;
}

fbt::mac_tx_srs_wakeup:entry
/self->update/
{
        printf("%x waking up an srs %p\n", timestamp, arg0);
        printf("ring_count: %d, st_arg2: %p, state: %x\n", 
            args[0]->srs_tx_ring_count,
            args[0]->srs_tx.st_arg2,
            args[0]->srs_state);
        stack();
}

fbt::mac_tx_invoke_callbacks:entry
{
        printf("%x callbacks invoked for client %s\n", timestamp,
            stringof(((mac_client_impl_t *)args[0])->mci_name));
        stack();
}

fbt::vnd_mac_flow_control:entry
{
        printf("%x vnd flow control fired for str %p!\n", timestamp, 
            (uintptr_t)arg1);
        stack();
}
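
The script above traces four stages of the notification path in order: the ring update, the srs wakeup, the client callbacks, and the vnd flow-control callback. To see at a glance how far notifications get, the output could be tallied with something like the following Python sketch. It relies only on the `printf` format strings in the script; `count_stages` and the stage labels are hypothetical names, and the search (rather than anchored) match is there because without `-q` dtrace prefixes each line with CPU, ID, and FUNCTION:NAME columns.

```python
import re
from collections import Counter

# One pattern per printf() in the D script; the captured group is the subject
# (link name, srs pointer, client name, or stream pointer).
PATTERNS = {
    "ring_update": re.compile(r"\bupdate via \S+ on (\S+)"),
    "srs_wakeup": re.compile(r"\bwaking up an srs (?:0x)?([0-9a-f]+)"),
    "callbacks": re.compile(r"\bcallbacks invoked for client (\S+)"),
    "vnd_flow_control": re.compile(
        r"\bvnd flow control fired for str (?:0x)?([0-9a-f]+)"
    ),
}


def count_stages(text):
    """Return (events per stage, per-stage Counter of subjects seen)."""
    stages = Counter()
    subjects = {name: Counter() for name in PATTERNS}
    for line in text.splitlines():
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                stages[name] += 1
                subjects[name][match.group(1)] += 1
                break
    return stages, subjects
```

If the earlier stages fire repeatedly while `callbacks` and `vnd_flow_control` stay at zero, that would suggest the notification is being dropped somewhere between the srs wakeup and the client callback, though the stacks still need to be read to say where.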

gkyildirim commented 10 years ago

Pastebin seems to be overloaded right now, so I'll paste the results below.

CPU     ID                    FUNCTION:NAME
 12  31598         mac_tx_ring_update:entry 121534c9fbd update via mac_tx_ring_update on bnx1

              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 121534ccfdc waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 121534ce54e waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 121534cec16 waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31598         mac_tx_ring_update:entry 12156311f37 update via mac_tx_ring_update on bnx1

              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 12156313d8b waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 12156314c0b waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 121563151d8 waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31598         mac_tx_ring_update:entry 121564ed40d update via mac_tx_ring_update on bnx1

              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 121564ee4a3 waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 121564eec0d waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 121564ef0da waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31598         mac_tx_ring_update:entry 121615fa599 update via mac_tx_ring_update on bnx1

              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 121615fcff3 waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 121615fe2b3 waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 121615fea16 waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31598         mac_tx_ring_update:entry 12166661190 update via mac_tx_ring_update on bnx1

              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 12166663af0 waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 12166664f8f waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 1216666570b waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31598         mac_tx_ring_update:entry 121666f1e7d update via mac_tx_ring_update on bnx1

              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 121666f2bc8 waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 121666f31de waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 121666f362b waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31598         mac_tx_ring_update:entry 121667f9a39 update via mac_tx_ring_update on bnx1

              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 121667fa7fa waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 121667faeba waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 121667fb34d waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31598         mac_tx_ring_update:entry 12166975d1d update via mac_tx_ring_update on bnx1

              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 12166976c07 waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 12166977270 waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 1216697772c waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31598         mac_tx_ring_update:entry 12166c79304 update via mac_tx_ring_update on bnx1

              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 12166c7a3ab waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 12166c7ab62 waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 12166c7b050 waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31598         mac_tx_ring_update:entry 12166facae6 update via mac_tx_ring_update on bnx1

              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 12166fae4df waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 12166faefe8 waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 12166faf503 waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31598         mac_tx_ring_update:entry 1216707eb7f update via mac_tx_ring_update on bnx1

              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 1216707f9b8 waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 1216707ffd5 waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 1216708048b waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31598         mac_tx_ring_update:entry 121711bc6de update via mac_tx_ring_update on bnx1

              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 121711bf120 waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 121711c03db waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 121711c0aee waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31598         mac_tx_ring_update:entry 12171221791 update via mac_tx_ring_update on bnx1

              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 121712224a9 waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 12171222aea waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 12171222f9a waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31598         mac_tx_ring_update:entry 1217d2dcaf6 update via mac_tx_ring_update on bnx1

              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 1217d2df43b waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 1217d2e072f waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 1217d2e0e72 waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31598         mac_tx_ring_update:entry 1217d384a58 update via mac_tx_ring_update on bnx1

              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 1217d385872 waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 1217d385e94 waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 1217d3862e2 waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31598         mac_tx_ring_update:entry 1217d4216bd update via mac_tx_ring_update on bnx1

              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 1217d422392 waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 1217d4229a6 waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 1217d422e2c waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31598         mac_tx_ring_update:entry 1217d651a65 update via mac_tx_ring_update on bnx1

              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 1217d652aeb waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 1217d653216 waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 1217d6536df waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31598         mac_tx_ring_update:entry 1217fd78e4c update via mac_tx_ring_update on bnx1

              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 1217fd7b882 waking up an srs ffffff0588f199c0
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 1217fd7cc62 waking up an srs ffffff0583324000
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

 12  31586          mac_tx_srs_wakeup:entry 1217fd7d367 waking up an srs ffffff0583325980
ring_count: 0, st_arg2: 0, state: 4000000

              mac`i_mac_tx_srs_notify+0x60
              mac`mac_tx_ring_update+0x15
              mac`mac_tx_update+0x13
              bnx`bnx_txpkts_intr+0x22
              bnx`bnx_intr_xmit+0x38
              bnx`bnx_intr_1lvl+0x14d
              unix`av_dispatch_autovect+0x95
              unix`dispatch_hardint+0x36
              unix`switch_sp_and_call+0x13

rmustacc commented 10 years ago

Hmm. So the interesting thing here is that we never called into mac_tx_invoke_callbacks(), which is the path that notifies vnd that it should send traffic again. There are a few options for fixing this. We could work around it by having vnd check, after some amount of time, whether it is still flow controlled. However, I think the more proper fix is to ensure that we actually fire these notification callbacks.

I'll need to do some digging to figure out why this specific case is designed not to fire the registered callbacks, and then make it fire them. I'll try to put together a new platform image that can be used to verify that we're doing the right thing here.
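
To make the two options in this comment concrete, here is a minimal user-space sketch in C. It is not the vnd or mac code: every type and function name in it (sender_t, lower_layer_t, lower_ring_drained, sender_recheck, and so on) is hypothetical and exists only for illustration. It models a sender that stops transmitting once it is flow controlled, a lower layer whose wakeup path may or may not fire the registered callback, and the periodic re-check workaround.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these are illumos names. */
typedef void (*tx_notify_fn)(void *arg);

typedef struct sender {
	bool flow_controlled;	/* set when the layer below refuses a packet */
	int sent;		/* packets accepted by the layer below */
} sender_t;

typedef struct lower_layer {
	bool ring_full;		/* true while the transmit ring has no room */
	tx_notify_fn notify;	/* registered "you may send again" callback */
	void *notify_arg;
} lower_layer_t;

/* Callback the sender registers so it can clear its flow-control flag. */
static void
sender_resume(void *arg)
{
	((sender_t *)arg)->flow_controlled = false;
}

static void
lower_register(lower_layer_t *l, tx_notify_fn fn, void *arg)
{
	assert(fn != NULL);
	l->notify = fn;
	l->notify_arg = arg;
}

/* Try to send one packet; returns true only if it was accepted. */
static bool
sender_tx(sender_t *s, const lower_layer_t *l)
{
	if (s->flow_controlled)
		return (false);
	if (l->ring_full) {
		s->flow_controlled = true;
		return (false);
	}
	s->sent++;
	return (true);
}

/*
 * The ring drains. Passing fire_callbacks = false models a wakeup path
 * that frees the ring but never notifies the registered client.
 */
static void
lower_ring_drained(lower_layer_t *l, bool fire_callbacks)
{
	l->ring_full = false;
	if (fire_callbacks && l->notify != NULL)
		l->notify(l->notify_arg);
}

/* Workaround option: re-check the lower layer on a timer. */
static void
sender_recheck(sender_t *s, const lower_layer_t *l)
{
	if (s->flow_controlled && !l->ring_full)
		s->flow_controlled = false;
}
```

In this model, calling lower_ring_drained(&l, false) leaves the sender stalled even though the ring has room, which mirrors the symptom reported here. Either a later sender_recheck() call (the workaround) or calling lower_ring_drained(&l, true) (firing the callbacks) lets it transmit again.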

rmustacc commented 10 years ago

@gkyildirim I wanted to give you an update on where this is currently.

I think that I have a pretty good understanding of what's happening and why it hasn't notified us yet. I'm going to put together a test SmartOS image that should address this, and I'll update the ticket when it's available. Would you be able to test that image? It'd be greatly appreciated. Thanks for all your help and patience with root-causing this one so far.

gkyildirim commented 10 years ago

@rmustacc thanks for all the good work on Bardiche. I am truly more than glad to help.

Please let me know the path for the image when available.

rmustacc commented 10 years ago

I've made the following different versions available:

usb iso raw platform

gkyildirim commented 10 years ago

@rmustacc thanks for the fix. It is working nicely.

The VM now utilizes the GbE link constantly and smoothly (watching with vndstat).

rmustacc commented 10 years ago

Okay, great. Thanks for helping test that. I'm going to do some last sanity checks on that and then get that pushed before Thursday's release. Thanks for your help with this.

rmustacc commented 10 years ago

This corresponds to OS-2920.

rmustacc commented 10 years ago

This should be resolved with https://github.com/joyent/illumos-joyent/commit/00aeef62bb42a504b38a88a68a0bbc2b2e607cf0.