TritonDataCenter / illumos-joyent

Community developed and maintained version of the OS/Net consolidation
http://www.illumos.org/projects/illumos-gate
266 stars 111 forks source link

i40e driver: sporadic issue #180

Closed youzhongyang closed 5 years ago

youzhongyang commented 6 years ago

Recently we evaluated a new supermicro unit which comes with X710 cards:

            pci8086,7 (pciex8086,1572) [Intel Corporation Ethernet Controller X710 for 10GbE SFP+], instance #0 (driver name: i40e)
            pci8086,0 (pciex8086,1572) [Intel Corporation Ethernet Controller X710 for 10GbE SFP+], instance #1 (driver name: i40e)
            pci8086,7 (pciex8086,1572) [Intel Corporation Ethernet Controller X710 for 10GbE SFP+], instance #2 (driver name: i40e)
            pci8086,0 (pciex8086,1572) [Intel Corporation Ethernet Controller X710 for 10GbE SFP+], instance #3 (driver name: i40e)

There's sporadic issue when running the following perl script which connects to a mysql db and then disconnects repeatedly:

#!/usr/bin/perl -w

use DBI;
use POSIX qw(strftime);
use Time::HiRes qw(usleep);

my $dbh;

while (1) {
   eval {
       $dbh = DBI->connect('DBI:mysql:dbname;host=dbhost', 'user', 'password', { RaiseError => 1 });
       $dbh->disconnect();
   } or print strftime("%H:%M:%S", localtime) . " => " . $@;
   usleep(500000);
}

We first thought the issue might be on the network switch side, but clearly the network switch (Juniper) report command indicated there's problem on the host side:

> show lacp statistics interfaces
Aggregated interface: ae5
    LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      et-0/0/50            6528201     6525933            0            0
      et-0/0/51             261738      261627            0            0

Aggregated interface: ae10
    LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      xe-0/0/1             6561283     6526648            0            0

Aggregated interface: ae11
    LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      xe-0/0/2             6561066     6526597            0            0

Aggregated interface: ae12
    LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      xe-0/0/3             6561214     6526649            0            0

Aggregated interface: ae13
    LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      xe-0/0/4             6561388     6526649            0            0

Aggregated interface: ae14
    LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      xe-0/0/5             6561804     6527083            0            0

Aggregated interface: ae16
    LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      xe-0/0/0             6555587     6524326            0            0

Aggregated interface: ae17
    LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      xe-0/0/8             6560536     6526656            0            0

Aggregated interface: ae18
    LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      xe-0/0/9             6560478     6526657            0            0

Aggregated interface: ae19
    LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      xe-0/0/10            6549335     6526670            0            0

Aggregated interface: ae20
    LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      xe-0/0/11            6560332     6526543            0            0

Aggregated interface: ae21
    LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      xe-0/0/12            6039869     6008958            0            0

Aggregated interface: ae22
    LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      xe-0/0/13            6560535     6526668            0            0

Aggregated interface: ae15
    LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      xe-0/0/6               76177       86909            0            0
      xe-0/0/7               76177       86903            0            0

I then loaded an old SmartOS image which was built around Oct 26, 2017. Everything works flawlessly!

I haven't gotten a chance to figure out which commit to i40e driver might have caused the issue.

Thanks for your attention.

rmustacc commented 6 years ago

As someone not familiar with Juniper's switch interface, it's not clear what information it is that you're trying to show and describe here. While I'm sure there are issues here, it's not clear how you reach that conclusion from the above data.

There have been a few changes to the i40e driver since that time. The most significant one was the addition of support for LSO in 9e30beee2f0c127bf41868db46257124206e28d6.

youzhongyang commented 6 years ago

@rmustacc In all other LACP interfaces, there are more LACP hello PDUs received than transmitted. However in ae15 (the one configured for the host), there is less received than transmitted by the switch.

ikozhukhov commented 6 years ago

well, your info is very small. we are using Arista switches. we are using MLAG in 2 switches.

rmustacc commented 6 years ago

On 9/8/18 4:55 , youzhongyang wrote:

@rmustacc In all other LACP interfaces, there are more LACP hello PDUs received than transmitted. However in ae15 (the one configured for the host), there is less received than transmitted by the switch.

OK, thanks. Based on that there are a few possibilities to look at:

  1. We're not receiving packets because queues are backed up on i40e
  2. We're receiving them; however, they're getting dropped elsewhere due to things like the mac high watermark (mac_rxsdrops)
  3. We're receiving them, processing them, and then turning around to send them, but failing to send it for some reason.

I believe most of these cases should be covered by the i40e and mac kstats. I might recommend gathering all of them and seeing what's increasing as you reproduce the problem.

Does that make sense?

youzhongyang commented 6 years ago

@rmustacc Thanks. It makes sense.

I will continue to figure out what has caused the sporadic issue, but I just noticed that i40e only has one rx group, which means only one SRS for each interface, so it's guaranteed that there will be rxsdrops when under load.

> ::walk mac_client_impl_cache | ::printf "%p %-16s %4d %20p\n" mac_client_impl_t . mci_name mci_flent->fe_rx_srs_cnt mci_flent->fe_rx_ring_group
ffffc1944fa018f8 z4_net0             1     ffffc1944e7b22c0
ffffc1944fa03d68 z3_net0             1     ffffc1944e7b22c0
ffffc1944fa061d8 z1_net0             1     ffffc1944e7b22c0
ffffc1944fa08648 z2_net0             1     ffffc1944e7b22c0
ffffc1944fa0aab8 admin1              1     ffffc1944e7b22c0
ffffc193e0990008 admin0              1     ffffc1944e7b22c0
ffffc193e0992478 aggr0               1     ffffc1944e7b22c0
ffffc193e09948e8 aggr0-i40e3         0     ffffc193e13fa780
ffffc193e0996d58 aggr0-i40e2         0     ffffc193e07c6dc0
ffffc193e09991c8 aggr0-i40e1         0     ffffc193e13fadc0
ffffc193e099b638 aggr0-i40e0         0     ffffc193e0497e40
ffffc193e099daa8 i40e0               0                    0
ffffc193e09a0010 ixgbe3              9     ffffc1940320e080
ffffc193e09a2480 ixgbe1              9     ffffc19403307b40
ffffc193e09a48f0 ixgbe0              9     ffffc193dfa39dc0
ffffc193e09a6d60 ixgbe2              9     ffffc193dfa3a300
ffffc193e09a91d0 i40e3               0                    0
ffffc193e09ab640 i40e1               0                    0
ffffc193e09adab0 i40e2               0                    0
ffffc1944f9fd018 z6_net0             1     ffffc1944e7b22c0
ffffc1944f9ff488 z5_net0             1     ffffc1944e7b22c0
rzezeski commented 6 years ago

@youzhongyang We have a patch that brings MAC group support to i40e. It's ready to go except I haven't had time to perform tests on an X722. I'm hoping to return to it once I get other work out of the way.

https://cr.joyent.us/#/c/3901/

youzhongyang commented 6 years ago

@rzezeski It's great that multi-group support to i40e is under way. I tried your patch, but unfortunately, i40e_config_def_vsi() -> i40e_aq_update_vsi_params() failed with error code -53 (I40E_ERR_ADMIN_QUEUE_ERROR). I added code to print out hw->aq.asq_last_status, it is 14 (I40E_AQ_RC_EINVAL).

I've tried to build image using all the files under i40e in your git tree, still no luck. It would be appreciated if you happen to have an answer for this problem. In the meantime, I am trying to read the patch and understand it.

By the way, the cards we are using are "Intel Corporation Ethernet Controller X710 for 10GbE SFP+".

rzezeski commented 6 years ago

@youzhongyang My testing was done on X710-DA2. Do you remember what CR rev you built? I made a blunder where I uploaded an old patch for patch set 5. Though, we have seen this issue on an X722 that we are trying to get to the bottom of.

youzhongyang commented 6 years ago

@rzezeski I tested patch 4, and 6, same results. I printed out some data after i40e_config_def_vsi() -> i40e_aq_get_vsi_params(), not sure if it helps:

    err = i40e_aq_get_vsi_params(hw, &context, NULL);
    if (err != I40E_SUCCESS) {
        i40e_error(i40e, "get VSI params failed with %d", err);
        return (B_FALSE);
    }

+   i40e_log(i40e, "def_vsi_seid %d seid %d pf_num %d vsi_number %d vsis_allocated %d "
+      "vsis_unallocated %d valid_sections %x mapping_flags %x queue_mapping[0] %x "
+      "tc_mapping[0] %x port_vlan_flags %x", I40E_DEF_VSI_SEID(i40e), context.seid,
+      context.pf_num, context.vsi_number, context.vsis_allocated, context.vsis_unallocated,
+      context.info.valid_sections, context.info.mapping_flags, context.info.queue_mapping[0],
+      context.info.tc_mapping[0], context.info.port_vlan_flags);
+

Sep 20 19:39:16 batfs1399 i40e: [ID 517869 kern.info] NOTICE: i40e0: def_vsi_seid 390 seid 390 pf_num 0 vsi_number 6 vsis_allocated 1 vsis_unallocated 305 valid_sections 3ff mapping_flags 0 queue_mapping[0] 0 tc_mapping[0] 0 port_vlan_flags 1b
rmustacc commented 6 years ago

In this case, I believe the setting of valid_sections to 3ff is incorrect.

youzhongyang commented 6 years ago

@rmustacc Good catch. So in i40e_set_shared_vsi_props(), if we don't OR valid_sections, it may work. I will test this and report back.

rmustacc commented 6 years ago

@youzhongyang I suspect that @rzezeski has something for that, but there's another issue we have with the X722 that we're tracking down at the moment.

youzhongyang commented 6 years ago

Yes, now it works.

youzhongyang commented 6 years ago

I'd like to come back to the original issue. Yesterday I reverted "OS-5225 Want Fortville TSO support", built an image, ran the reproduction script for 8 hours, no single failure. So most likely OS-5225 is responsible.

rmustacc commented 6 years ago

On 9/25/18 5:40 , youzhongyang wrote:

I'd like to come back to the original issue. Yesterday I reverted "OS-5225 Want Fortville TSO support", built an image, ran the reproduction script for 8 hours, no single failure. So most likely OS-5225 is responsible.

To help narrow things down, are you seeing DMA bind errors? If you look at i40e_ring_tx() there generally are either error messages logged to the system log or kstats logged when we hit any of those cases. That would help us understand whether we're seeing something weird here or something is being mangled by the device due to something that we're doing.

youzhongyang commented 6 years ago

@rmustacc I don't see bind errors, but I saw the following:

i40e:0:trqpair_tx_0:tx_err_nodescs      2218
i40e:1:trqpair_tx_0:tx_err_nodescs      2161
i40e:2:trqpair_tx_0:tx_err_nodescs      2145
i40e:3:trqpair_tx_0:tx_err_nodescs      2224

Does it ring a bell?

I'd also like to mention that when first time I reported the issue here, what I observed was sporadic mysql connection error, but now (not sure why, probably due to new/latest synced code) once the issue happens, the host is completely cut off from the network, I had to access it through IPMI console.

rzezeski commented 5 years ago

It sounds like this might be OS-7344.

You could run something like the following in mdb -k to determine if your i40e Tx ring is frozen.

> ::walk mac_impl_cache | ::printf "0x%p %s\n" mac_impl_t . mi_name ! grep i40e
0xfffffeb1808d4288 i40e1
0xfffffeb1808d7848 i40e0
> 

> 0xfffffeb1808d7848::print mac_impl_t mi_driver | ::print i40e_t i40e_trqpairs | ::array i40e_trqpair_t 7 | ::printf "%u %u %u %u (%u)\n" i40e_trqpair_t itrq_txstat.itxs_packets.value.ui64 itrq_txstat.itxs_descriptors.value.ui64 itrq_txstat.itxs_recycled.value.ui64 itrq_txstat.itxs_err_nodescs.value.ui64 itrq_desc_free
3411681 3425654 3424637 83264 (7)
2073951 2073961 2073961 0 (1024)
10900034 10911847 10911847 0 (1024)
1082387 1085000 1083983 186 (7)
1948462 1951178 1950161 1519 (7)
2201291 2204202 2203185 591 (7)
11181905 11192905 11192905 0 (1024)

You'll notice the second to last column will probably be non-zero indicating failed Tx attempts and you'll notice the number in parens is stuck at some value below 1024.

youzhongyang commented 5 years ago

@rzezeski Yes, I confirmed that. Unfortunately when it happened, the only way I could run mdb is through IPMI console, so it's too much for me to re-type what I saw on the console.

By the way, do you have a possible fix for me to test? I attempted but none worked.

rzezeski commented 5 years ago

You can try https://cr.joyent.us/#/c/5038/. Keep in mind I haven't had a chance to test this yet, but given how small the change is I'm pretty confident it should work for you.

youzhongyang commented 5 years ago

I tested the patch, it didn't work. It's interesting that for each i40e instance, it's the first trq pair that got stuck.

# cat /var/tmp/i40e-txs
::cat /var/tmp/i40e.lst | ::print i40e_t i40e_trqpairs | ::array i40e_trqpair_t 8 | ::printf "%p %u %u %u %u (%u)\n" i40e_trqpair_t itrq_i40e itrq_txstat.itxs_packets.value.ui64 itrq_txstat.itxs_descriptors.value.ui64 itrq_txstat.itxs_recycled.value.ui64 itrq_txstat.itxs_err_nodescs.value.ui64 itrq_desc_free

# cat /var/tmp/i40e-out
fffffdf2cbaca000 104232 216636 215619 397 (7)
fffffdf2cbaca000 42894 87483 87483 0 (1024)
fffffdf2cbaca000 141050 141849 141849 0 (1024)
fffffdf2cbaca000 3967 4012 4012 0 (1024)
fffffdf2cbaca000 450516 892656 892656 0 (1024)
fffffdf2cbaca000 36776 73528 73528 0 (1024)
fffffdf2cbaca000 561834 978397 978397 0 (1024)
fffffdf2cbaca000 23206 36405 36405 0 (1024)
fffffdf2cacf0000 392290 407564 406547 399 (7)
fffffdf2cacf0000 164811 164905 164905 0 (1024)
fffffdf2cacf0000 272699 273234 273234 0 (1024)
fffffdf2cacf0000 597873 597887 597887 0 (1024)
fffffdf2cacf0000 118189 118195 118195 0 (1024)
fffffdf2cacf0000 223511 223513 223513 0 (1024)
fffffdf2cacf0000 1245891 2438747 2438747 0 (1024)
fffffdf2cacf0000 576390 723426 723426 0 (1024)
fffffdf2caaac000 1812102 2016665 2015648 439 (7)
fffffdf2caaac000 3195 3219 3219 0 (1024)
fffffdf2caaac000 23419 23613 23613 0 (1024)
fffffdf2caaac000 3051 3060 3060 0 (1024)
fffffdf2caaac000 4617 4624 4624 0 (1024)
fffffdf2caaac000 2931 2942 2942 0 (1024)
fffffdf2caaac000 346877 346883 346883 0 (1024)
fffffdf2caaac000 410654 831558 831558 0 (1024)
fffffdf2cacf8000 9223 9373 8356 399 (7)
fffffdf2cacf8000 257134 257194 257194 0 (1024)
fffffdf2cacf8000 3922 3943 3943 0 (1024)
fffffdf2cacf8000 3772 3804 3804 0 (1024)
fffffdf2cacf8000 25669 50478 50478 0 (1024)
fffffdf2cacf8000 3786 3817 3817 0 (1024)
fffffdf2cacf8000 60624 135190 135190 0 (1024)
fffffdf2cacf8000 510273 510323 510323 0 (1024)
rzezeski commented 5 years ago

That's very interesting. If you could force a dump (like reboot -d or NMI) that would be great. And either somehow provide the dump to me or I can give you some instructions for debugging when I have time next week. Alternatively, you could look at what I did in OS-7344 and see if you can print out the tcb_bind_info[0] of the head data descriptor. I'm curious if dbi_len is still 0x4000.

youzhongyang commented 5 years ago

No, I can't find dbi_len 0x4000.

Here is mdb output after reproducing using official joyent_20181108T015750Z image:

> ::walk mac_impl_cache | ::printf "%s %p\n" mac_impl_t mi_name mi_driver ! grep i40e | awk '{print $2}' > /var/tmp/i40e.lst
> ::cat /var/tmp/i40e.lst | ::print i40e_t i40e_trqpairs | ::array i40e_trqpair_t 7 | ::printf "%p %p %u %u %u %u (%u)\n" i40e_trqpair_t itrq_i40e . itrq_txstat.itxs_packets.value.ui64 itrq_txstat.itxs_descriptors.value.ui64 itrq_txstat.itxs_recycled.value.ui64 itrq_txstat.itxs_err_nodescs.value.ui64 itrq_desc_free
fffffdf2c725f000 fffffdf2c9de3000 11294984 27403837 27403837 0 (1024)
fffffdf2c725f000 fffffdf2c9de3710 2049 3319 3319 0 (1024)
fffffdf2c725f000 fffffdf2c9de3e20 11412329 27242548 27242548 0 (1024)
fffffdf2c725f000 fffffdf2c9de4530 2169 6163 6163 0 (1024)
fffffdf2c725f000 fffffdf2c9de4c40 11078067 26625114 26625114 0 (1024)
fffffdf2c725f000 fffffdf2c9de5350 5400894 14652345 14652345 0 (1024)
fffffdf2c725f000 fffffdf2c9de5a60 2619625 7876225 7876225 0 (1024)
fffffdf2ca03a000 fffffdf2cb76b000 10703867 25490739 25490739 0 (1024)
fffffdf2ca03a000 fffffdf2cb76b710 8893035 21023792 21023792 0 (1024)
fffffdf2ca03a000 fffffdf2cb76be20 8805076 22182978 22182978 0 (1024)
fffffdf2ca03a000 fffffdf2cb76c530 5980941 14026528 14026528 0 (1024)
fffffdf2ca03a000 fffffdf2cb76cc40 528 533 533 0 (1024)
fffffdf2ca03a000 fffffdf2cb76d350 832 840 840 0 (1024)
fffffdf2ca03a000 fffffdf2cb76da60 266505 631606 631606 0 (1024)
fffffdf2c927c000 fffffdf2cb76f000 14825346 35811593 35811593 25 (1024)
fffffdf2c927c000 fffffdf2cb76f710 29175226 80456322 80456322 0 (1024)
fffffdf2c927c000 fffffdf2cb76fe20 5523683 13085889 13085889 0 (1024)
fffffdf2c927c000 fffffdf2cb770530 35940473 95537525 95537525 0 (1024)
fffffdf2c927c000 fffffdf2cb770c40 6812887 15982211 15982211 0 (1024)
fffffdf2c927c000 fffffdf2cb771350 6626531 15902016 15902016 2 (1024)
fffffdf2c927c000 fffffdf2cb771a60 28523625 77236140 77235248 0 (132)
fffffdf2c9a68000 fffffdf2c9ddf000 12663106 31718851 31718851 0 (1024)
fffffdf2c9a68000 fffffdf2c9ddf710 12441374 29776858 29775840 101 (6)
fffffdf2c9a68000 fffffdf2c9ddfe20 1941 4707 4707 0 (1024)
fffffdf2c9a68000 fffffdf2c9de0530 975 1918 1918 0 (1024)
fffffdf2c9a68000 fffffdf2c9de0c40 2402 6708 6708 0 (1024)
fffffdf2c9a68000 fffffdf2c9de1350 2714235 6685640 6685640 0 (1024)
fffffdf2c9a68000 fffffdf2c9de1a60 11839750 28565100 28564083 729 (7)

> fffffdf2c9de1a60::print i40e_trqpair_t itrq_tcb_work_list | ::array void* 0t1024 | ::print void* | ::grep '.!=0' | ::print i40e_tx_control_block_t tcb_bind_info | ::grep '.!=0' | ::print struct i40e_dma_bind_info dbi_len | ::grep .==0x4000
4000

> ::cat /var/tmp/i40e.lst | ::print i40e_t i40e_trqpairs | ::array i40e_trqpair_t 7 | ::print i40e_trqpair_t itrq_tcb_work_list | ::array void* 0t1024 | ::print void* | ::grep '.!=0' | ::print i40e_tx_control_block_t tcb_bind_info | ::grep '.!=0' | ::print struct i40e_dma_bind_info dbi_len | ::grep .==0x4000
4000
4000
4000

The following are mdb output after reproducing with OS-7344 patch applied on top of joyent_20181108T015750Z:

> ::walk mac_impl_cache | ::printf "%s %p\n" mac_impl_t mi_name mi_driver ! grep i40e | awk '{print $2}' > /var/tmp/i40e.lst
> ::cat /var/tmp/i40e.lst | ::print i40e_t i40e_trqpairs | ::array i40e_trqpair_t 7 | ::printf "%p %p %u %u %u %u (%u)\n" i40e_trqpair_t itrq_i40e . itrq_txstat.itxs_packets.value.ui64 itrq_txstat.itxs_descriptors.value.ui64 itrq_txstat.itxs_recycled.value.ui64 itrq_txstat.itxs_err_nodescs.value.ui64 itrq_desc_free
fffffdf2c92ad000 fffffdf2c9cff000 20155751 61367220 61367216 0 (1020)
fffffdf2c92ad000 fffffdf2c9cff710 870 2928 2928 0 (1024)
fffffdf2c92ad000 fffffdf2c9cffe20 454 460 460 0 (1024)
fffffdf2c92ad000 fffffdf2c9d00530 1936 5038 4503 0 (489)
fffffdf2c92ad000 fffffdf2c9d00c40 425 427 427 0 (1024)
fffffdf2c92ad000 fffffdf2c9d01350 364 370 370 0 (1024)
fffffdf2c92ad000 fffffdf2c9d01a60 485 762 762 0 (1024)
fffffdf2c9993000 fffffdf2cae70000 2240 4683 4683 0 (1024)
fffffdf2c9993000 fffffdf2cae70710 227 231 231 0 (1024)
fffffdf2c9993000 fffffdf2cae70e20 414 548 548 0 (1024)
fffffdf2c9993000 fffffdf2cae71530 347 446 446 0 (1024)
fffffdf2c9993000 fffffdf2cae71c40 9892458 30107018 30107016 0 (1022)
fffffdf2c9993000 fffffdf2cae72350 306 310 310 0 (1024)
fffffdf2c9993000 fffffdf2cae72a60 300 305 305 0 (1024)
fffffdf2c9981000 fffffdf2cae74000 10781732 33026564 33026559 0 (1019)
fffffdf2c9981000 fffffdf2cae74710 518 520 520 0 (1024)
fffffdf2c9981000 fffffdf2cae74e20 305 309 309 0 (1024)
fffffdf2c9981000 fffffdf2cae75530 279 282 282 0 (1024)
fffffdf2c9981000 fffffdf2cae75c40 273 348 348 0 (1024)
fffffdf2c9981000 fffffdf2cae76350 406 442 442 0 (1024)
fffffdf2c9981000 fffffdf2cae76a60 1884 4941 4803 0 (886)
fffffdf2c9c14000 fffffdf2c9cfb000 1273 1550 1550 0 (1024)
fffffdf2c9c14000 fffffdf2c9cfb710 323 323 323 0 (1024)
fffffdf2c9c14000 fffffdf2c9cfbe20 371 629 629 0 (1024)
fffffdf2c9c14000 fffffdf2c9cfc530 653 940 940 0 (1024)
fffffdf2c9c14000 fffffdf2c9cfcc40 1672 1676 1676 0 (1024)
fffffdf2c9c14000 fffffdf2c9cfd350 737130 2152767 2152767 0 (1024)
fffffdf2c9c14000 fffffdf2c9cfda60 1769136 5422355 5422355 0 (1024)
> ::cat /var/tmp/i40e.lst | ::print i40e_t i40e_trqpairs | ::array i40e_trqpair_t 7 | ::print i40e_trqpair_t itrq_tcb_work_list | ::array void* 0t1024 | ::print void* | ::grep '.!=0' | ::print i40e_tx_control_block_t tcb_bind_info | ::grep '.!=0' | ::print struct i40e_dma_bind_info dbi_len | ::grep .==0x4000
> fffffdf2c9d00530::print i40e_trqpair_t itrq_tcb_work_list | ::array void* 0t1024 | ::print void* | ::grep '.!=0' | ::print i40e_tx_control_block_t tcb_bind_info | ::grep '.!=0' | ::print struct i40e_dma_bind_info dbi_len | ::grep .==0x4000
>

I can send you the two crash dumps. Please send me instructions where and how to upload. My e-mail address is youzhong@gmail.com.

youzhongyang commented 5 years ago

I saw OS-7344 is integrated, so I built an image using master branch of repos under github.com/joyent yesterday, no more, no less. Same issue. Quite easy to reproduce by running iperf client on the host (iperf3 -c "another host" -t 1800 -P 16) and iperf server (iperf3 -s -B "another host") on another host.

So basically, i40e driver is useless until this issue is resolved.

# cat /var/tmp/i40e-list
::walk mac_impl_cache | ::printf "%s %p\n" mac_impl_t mi_name mi_driver ! grep i40e | awk '{print $2}' > /var/tmp/i40e.lst

# cat /var/tmp/i40e-list | mdb -k

# cat /var/tmp/i40e-txs
::cat /var/tmp/i40e.lst | ::print i40e_t i40e_trqpairs | ::array i40e_trqpair_t 7 | ::printf "%p %p %u %u %u %u (%u)\n" i40e_trqpair_t itrq_i40e . itrq_txstat.itxs_packets.value.ui64 itrq_txstat.itxs_descriptors.value.ui64 itrq_txstat.itxs_recycled.value.ui64 itrq_txstat.itxs_err_nodescs.value.ui64 itrq_desc_free

# cat /var/tmp/i40e-txs | mdb -k
fffffdf2c9d9b000 fffffdf20a964000 5687933 5690705 5690705 0 (1024)
fffffdf2c9d9b000 fffffdf20a964710 3120535 3120560 3120560 0 (1024)
fffffdf2c9d9b000 fffffdf20a964e20 4985 5017 5017 0 (1024)
fffffdf2c9d9b000 fffffdf20a965530 95239968 277808113 277808113 326 (1024)
fffffdf2c9d9b000 fffffdf20a965c40 4565532 4565538 4565538 0 (1024)
fffffdf2c9d9b000 fffffdf20a966350 7449 8269 8269 0 (1024)
fffffdf2c9d9b000 fffffdf20a966a60 4358 4399 4399 0 (1024)
fffffdf2caa4c000 fffffdf20a958000 2484296 2484314 2484314 0 (1024)
fffffdf2caa4c000 fffffdf20a958710 34645866 98649885 98649885 0 (1024)
fffffdf2caa4c000 fffffdf20a958e20 4714540 11211230 11211230 0 (1024)
fffffdf2caa4c000 fffffdf20a959530 48608042 138819214 138819214 0 (1024)
fffffdf2caa4c000 fffffdf20a959c40 63106918 189576319 189576319 0 (1024)
fffffdf2caa4c000 fffffdf20a95a350 2937732 2937780 2937780 0 (1024)
fffffdf2caa4c000 fffffdf20a95aa60 32439658 95111371 95111371 0 (1024)
fffffdf2caa4e000 fffffdf20a95c000 564478 1535659 1535659 0 (1024)
fffffdf2caa4e000 fffffdf20a95c710 2930820 8867793 8867792 0 (1023)
fffffdf2caa4e000 fffffdf20a95ce20 6225400 12537336 12537336 0 (1024)
fffffdf2caa4e000 fffffdf20a95d530 604061 605998 605998 0 (1024)
fffffdf2caa4e000 fffffdf20a95dc40 35619171 100410046 100409648 0 (626)
fffffdf2caa4e000 fffffdf20a95e350 132755070 389057393 389057393 0 (1024)
fffffdf2caa4e000 fffffdf20a95ea60 12656130 36785531 36785531 0 (1024)
fffffdf2c9279000 fffffdf20a960000 13908 13954 13954 0 (1024)
fffffdf2c9279000 fffffdf20a960710 1806059 5408581 5408581 0 (1024)
fffffdf2c9279000 fffffdf20a960e20 70727790 209387720 209387720 0 (1024)
fffffdf2c9279000 fffffdf20a961530 57567187 153155271 153155271 0 (1024)
fffffdf2c9279000 fffffdf20a961c40 7315959 20164293 20164293 0 (1024)
fffffdf2c9279000 fffffdf20a962350 22256940 62475907 62475907 0 (1024)
fffffdf2c9279000 fffffdf20a962a60 4625 5578 5578 0 (1024)

# cat /var/tmp/i40e-txs | mdb -k | grep -v 1024
fffffdf2caa4e000 fffffdf20a95dc40 35619181 100410056 100409648 0 (616)

# cat /var/tmp/i40e-txs | mdb -k | grep -v 1024
fffffdf2caa4e000 fffffdf20a95dc40 35619200 100410075 100409648 0 (597)

# cat /var/tmp/i40e-txs | mdb -k | grep -v 1024
fffffdf2caa4e000 fffffdf20a95dc40 35619227 100410102 100409648 0 (570)

# cat /var/tmp/i40e-txs | mdb -k | grep -v 1024
fffffdf2caa4e000 fffffdf20a95dc40 35619280 100410155 100409648 0 (517)

# cat /var/tmp/i40e-txs | mdb -k | grep -v 1024
fffffdf2caa4e000 fffffdf20a95dc40 35619343 100410218 100409648 0 (454)

# cat /var/tmp/i40e-txs | mdb -k | grep -v 1024
fffffdf2caa4e000 fffffdf20a95dc40 35619357 100410232 100409648 0 (440)

# cat /var/tmp/i40e-txs | mdb -k | grep -v 1024
fffffdf2caa4e000 fffffdf20a95dc40 35619427 100410302 100409648 0 (370)
goekesmi commented 5 years ago

Hello, I'm here to report that I seem to be having the same problem.

I'm running 20181121T181952Z which I see has the patch for OS-7344.

Following along with some of the above instructions, I'm seeing the following.

[root@headnode (us-elns-1) ~]# ssh root@172.20.47.52 -i /root/.ssh/sdc.id_rsa
- SmartOS (build: 20181121T181952Z)
You have new mail.
[root@AgentAsh (us-elns-1) ~]# mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci ufs ip hook neti sockfs arp usba xhci stmf_sbd stmf zfs sd mm lofs idm sata i40e crypto random cpc logindmux ptm sppp nsmb smbsrv nfs ]
> ::walk mac_impl_cache | ::printf "0x%p %s\n" mac_impl_t . mi_name ! grep i40e
0xfffffdf2c7b8b288 i40e1
0xfffffdf2c7b8e848 i40e0
> 0xfffffdf2c7b8b288::print mac_impl_t mi_driver | ::print i40e_t i40e_trqpairs | ::array i40e_trqpair_t 7 | ::printf "%u %u %u %u (%u)\n" i40e_trqpair_t itrq_txstat.itxs_packets.value.ui64 itrq_txstat.itxs_descriptors.value.ui64 itrq_txstat.itxs_recycled.value.ui64 itrq_txstat.itxs_err_nodescs.value.ui64 itrq_desc_free
64 64 64 0 (1024)
0 0 0 0 (1024)
43 43 43 0 (1024)
0 0 0 0 (1024)
0 0 0 0 (1024)
0 0 0 0 (1024)
0 0 0 0 (1024)

I then run some 'fast' traffic through that interface, where fast is defined as something that might fill the buffer, rather than short interactive packets. Traffic stops abruptly and I start seeing this.

> 0xfffffdf2c7b8b288::print mac_impl_t mi_driver | ::print i40e_t i40e_trqpairs | ::array i40e_trqpair_t 7 | ::printf "%u %u %u %u (%u)\n" i40e_trqpair_t itrq_txstat.itxs_packets.value.ui64 itrq_txstat.itxs_descriptors.value.ui64 itrq_txstat.itxs_recycled.value.ui64 itrq_txstat.itxs_err_nodescs.value.ui64 itrq_desc_free
68 68 68 0 (1024)
0 0 0 0 (1024)
2155 5698 5425 0 (751)
0 0 0 0 (1024)
0 0 0 0 (1024)
0 0 0 0 (1024)
0 0 0 0 (1024)
> 0xfffffdf2c7b8b288::print mac_impl_t mi_driver | ::print i40e_t i40e_trqpairs | ::array i40e_trqpair_t 7 | ::printf "%u %u %u %u (%u)\n" i40e_trqpair_t itrq_txstat.itxs_packets.value.ui64 itrq_txstat.itxs_descriptors.value.ui64 itrq_txstat.itxs_recycled.value.ui64 itrq_txstat.itxs_err_nodescs.value.ui64 itrq_desc_free
68 68 68 0 (1024)
0 0 0 0 (1024)
2160 5711 5425 0 (738)
0 0 0 0 (1024)
0 0 0 0 (1024)
0 0 0 0 (1024)
0 0 0 0 (1024)
> 0xfffffdf2c7b8b288::print mac_impl_t mi_driver | ::print i40e_t i40e_trqpairs | ::array i40e_trqpair_t 7 | ::printf "%u %u %u %u (%u)\n" i40e_trqpair_t itrq_txstat.itxs_packets.value.ui64 itrq_txstat.itxs_descriptors.value.ui64 itrq_txstat.itxs_recycled.value.ui64 itrq_txstat.itxs_err_nodescs.value.ui64 itrq_desc_free
68 68 68 0 (1024)
0 0 0 0 (1024)
2163 5722 5425 0 (727)
0 0 0 0 (1024)
0 0 0 0 (1024)
0 0 0 0 (1024)
0 0 0 0 (1024)
> 0xfffffdf2c7b8b288::print mac_impl_t mi_driver | ::print i40e_t i40e_trqpairs | ::array i40e_trqpair_t 7 | ::printf "%u %u %u %u (%u)\n" i40e_trqpair_t itrq_txstat.itxs_packets.value.ui64 itrq_txstat.itxs_descriptors.value.ui64 itrq_txstat.itxs_recycled.value.ui64 itrq_txstat.itxs_err_nodescs.value.ui64 itrq_desc_free
68 68 68 0 (1024)
0 0 0 0 (1024)
2163 5722 5425 0 (727)
0 0 0 0 (1024)
0 0 0 0 (1024)
0 0 0 0 (1024)
0 0 0 0 (1024)
> 0xfffffdf2c7b8b288::print mac_impl_t mi_driver | ::print i40e_t i40e_trqpairs | ::array i40e_trqpair_t 7 | ::printf "%u %u %u %u (%u)\n" i40e_trqpair_t itrq_txstat.itxs_packets.value.ui64 itrq_txstat.itxs_descriptors.value.ui64 itrq_txstat.itxs_recycled.value.ui64 itrq_txstat.itxs_err_nodescs.value.ui64 itrq_desc_free
68 68 68 0 (1024)
0 0 0 0 (1024)
2164 5723 5425 0 (726)
0 0 0 0 (1024)
0 0 0 0 (1024)
0 0 0 0 (1024)
0 0 0 0 (1024)

It's not quite the same, because I'm not seeing failed TX attempts, but the free descriptors is falling.

I can provide dumps from this system. I'm looking to help diagnose what's going on and/or work arounds.

goekesmi commented 5 years ago

I realized after I posted the above note, that I should mention that I'm on X722 devices in the form of a dual port X557 integrated controller. Reading the above leads me to think I'm having a similar problem, but as I'm on different hardware, it may not be the same one.

    pci, instance #3
        pci8086,2032 (pciex8086,2032) [Intel Corporation Sky Lake-E PCI Express Root Port C], instance #3
            pci8086,37c0 (pciex8086,37c0) [Intel Corporation unknown device], instance #4
                pci8086,37c5 (pciex8086,37c5) [Intel Corporation unknown device], instance #5
                    pci15d9,37d2 (pciex8086,37d2) [Intel Corporation Ethernet Connection X722 for 10GBASE-T], instance #0
                    pci15d9,37d2 (pciex8086,37d2) [Intel Corporation Ethernet Connection X722 for 10GBASE-T], instance #1
rzezeski commented 5 years ago

I just wanted to leave an update that I do have a lead on this new freeze. It looks to be related to misuse of the NIC's data buffers/descriptors. I think we are a) creating buffers too small in size (< 17 bytes) and b) sometimes attaching too many descriptors to a TSO segment (> 8). According to the X710 manual either of these things would cause the Tx queue to stop. But this is still partially a hunch as I haven't tested/fixed it yet. I just wanted to give an update.

goekesmi commented 5 years ago

Well, I have a trivial to re-create problem on a machine I can easily get a dump off of. I've added a dual port intel gigabit card as a work around while we attempt to fix this problem. Let me know how I can help.

youzhongyang commented 5 years ago

It seems FreeBSD users are also suffering from the same TSO issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221919

rzezeski commented 5 years ago

@youzhongyang So I discovered one freeze in my testing but not the one you are hitting: OS-7457. I dug into your crash dump and found some very odd behavior but I haven't had time to do a write up yet. If you have the extra cycles it would be helpful to just confirm you can replicate the same issue with PI 20181121T182506Z just so we can absolutely rule out it's not a problem with your custom build.

@goekesmi You MIGHT be hitting the same issue as @youzhongyang but the only way for me to tell is to get a hold of a crash dump (created via an NMI such as running reboot -d). That or you can wait for me to further diagnose the crash dump I have in hand and I might have some mdb command you can follow to self diagnose.

I also plan to spend some time in the FreeBSD issue and see if there's anything relevant there. But what I saw in @youzhongyang's dump should have nothing to do with the driver (it's very odd behavior).

I appreciate everyone's patience as I iron out these lingering i40e bugs.

goekesmi commented 5 years ago

One crash dump coming right up.

- SmartOS (build: 20181121T181952Z)
You have new mail.
[root@AgentAsh (us-elns-1) ~]# mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci ufs ip hook neti sockfs arp usba xhci fctl stmf_sbd stmf zfs sd mm lofs idm sata i40e crypto fcp random cpc logindmux ptm kvm sppp nsmb smbsrv nfs ]
> ::walk mac_impl_cache | ::printf "0x%p %s\n" mac_impl_t . mi_name ! grep i40e
0xfffffdf2c7c2c288 i40e1
0xfffffdf2c7c2f848 i40e0
> 0xfffffdf2c7c2f848::print mac_impl_t mi_driver | ::print i40e_t i40e_trqpairs | ::array i40e_trqpair_t 7 | ::printf "%u %u %u %u (%u)\n" i40e_trqpair_t itrq_txstat.itxs_packets.value.ui64 itrq_txstat.itxs_descriptors.value.ui64 itrq_txstat.itxs_recycled.value.ui64 itrq_txstat.itxs_err_nodescs.value.ui64 itrq_desc_free
27874 27874 27874 0 (1024)
0 0 0 0 (1024)
0 0 0 0 (1024)
0 0 0 0 (1024)
3671 7302 6285 1036 (7)
981 1698 875 0 (201)
1310 3945 3565 0 (644)

I'm going to just assume, @rzezeski , that you have access to the JPC Manta on nonpublic objects, or can get access to it.

https://us-east.manta.joyent.com/goekesmi/stor/crashdumps/us-elns-1/AgentAsh.vmdump.2018-1219.i40e0

Enjoy?

rmustacc commented 5 years ago

@goekesmi To get access to it, we'll have to ask you to use the msign utility which will generate a pre-signed URL though which we can access it.

goekesmi commented 5 years ago

I'd rather not post that link in a public gitlab comment. Is there somewhere I can send that link to? Email? Signal? Support Ticket to Joyent?

rmustacc commented 5 years ago

Yeah, of course, you can either mail me (rm@) or @rzezeski (rpz@) and we'll make sure @rzezeski has it.

plitc commented 5 years ago
[root@assg9-labor ~]# uname -a
SunOS assg9-labor 5.11 joyent_20181218T025029Z i86pc i386 i86pc
[root@assg9-labor ~]#
[root@assg9-labor ~]# mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci ufs ip hook neti sockfs arp usba mm fctl stmf_sbd stmf zfs sd lofs idm sata mpt_sas crypto random cpc fcp logindmux ptm kvm sppp nsmb smbsrv nfs i40e ]
>
> 0xfffffe5951ccc848::print mac_impl_t mi_driver | ::print i40e_t i40e_trqpairs | ::array i40e_trqpair_t 7 | ::printf "%u %u %u %u (%u)\n" i40e_trqpair_t itrq_txstat.itxs_packets.value.ui64 itrq_txstat.itxs_descriptors.value.ui64 itrq_txstat.itxs_recycled.value.ui64 itrq_txstat.itxs_err_nodescs.value.ui64 itrq_desc_free
2861 2861 2861 0 (1024)
2051468 6956816 6956816 0 (1024)
0 0 0 0 (1024)
0 0 0 0 (1024)
0 0 0 0 (1024)
0 0 0 0 (1024)
0 0 0 0 (1024)
>

[root@assg9-labor ~]# zlogin fd7f7007-3ddd-c6f2-dade-f998404f08d4
root@fd7f7007-3ddd-c6f2-dade-f998404f08d4:~# iperf3 -s

iperf3: getsockopt - Protocol not available
[  5]  98.00-99.00  sec   111 MBytes   930 Mbits/sec
iperf3: getsockopt - Protocol not available
[  5]  99.00-100.00 sec   112 MBytes   942 Mbits/sec
iperf3: getsockopt - Protocol not available
[  5] 100.00-100.00 sec   238 KBytes   933 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-100.00 sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-100.00 sec  10.9 GBytes   939 Mbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
^Ciperf3: interrupt - the server has terminated
root@fd7f7007-3ddd-c6f2-dade-f998404f08d4:~#

assg9-labor -> 10 Gbit -> Switch <-1 GBit-> Switch <- 10 Gbit <- assg15-labor

[root@assg15-labor ~]# uname -a
SunOS assg15-labor 5.11 joyent_20180526T113546Z i86pc i386 i86pc
[root@assg15-labor ~]#

[root@assg15-labor ~]# zlogin 7083c571-6c1d-41b7-dc62-84607e342bbc
root@7083c571-6c1d-41b7-dc62-84607e342bbc:~# iperf3 -c 172.16.7.201 -t 100

iperf3: getsockopt - Protocol not available
[  4]  98.00-99.00  sec   111 MBytes   932 Mbits/sec    0   3.44 MBytes
iperf3: getsockopt - Protocol not available
[  4]  99.00-100.00 sec   112 MBytes   940 Mbits/sec  280636382   0.00 Bytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-100.00 sec  10.9 GBytes   939 Mbits/sec    0             sender
[  4]   0.00-100.00 sec  10.9 GBytes   939 Mbits/sec                  receiver

iperf Done.
root@7083c571-6c1d-41b7-dc62-84607e342bbc:~#
[root@assg9-labor ~]# sysinfo | grep i40e0
    "i40e0": {"MAC Address": "xx:xx:xx:xx:xx:xx", "ip4addr": "", "Link Status": "up", "NIC Names": ["labor10g"]},
[root@assg9-labor ~]#
rzezeski commented 5 years ago

@youzhongyang @goekesmi Just a heads up, I now have the proximate cause for both of your freezes and was able to replicate locally. I'm working on getting all the details into an OS ticket.

rzezeski commented 5 years ago

As promised: OS-7492.

rzezeski commented 5 years ago

Update: I have a solution that appears to work but it needs more polish and testing. I hope to get something out in the next week or so.

youzhongyang commented 5 years ago

@rzezeski We have a few test hosts with i40e cards, so I'd be more than happy to test the fix.

youzhongyang commented 5 years ago

@rzezeski I tested your patch for two days using our application load, it survived. Well done Ryan, and thanks for your efforts. Can we expect this fix will be available in next smartos release?

rzezeski commented 5 years ago

@youzhongyang Thank you for all your help and patience! It's truly appreciated during difficult bugs like this.

I think next release will be tough because I have family visiting and I want to make sure I test various edge cases before pushing this (and of course need to get through code review). But as long as nothing major comes up it should definitely ship by the first release in March.

rzezeski commented 5 years ago

Fixed in 0d3f2b61dcfb18edace4fd257054f6fdbe07c99c.