cisco-system-traffic-generator / trex-core

trex-core site
https://trex-tgn.cisco.com/

AWS ENA DPDK only able to use one core with igb_uio #509

Open norg opened 4 years ago

norg commented 4 years ago

We're trying to achieve 10 Gbit/s on a big AWS instance with the ENA interfaces, but so far we can't get it running over 2.5 Gbit/s because only one core can be used per interface:

./t-rex-64 -f traffic.yaml -c 10 -m 6 -d 100 --cfg /etc/trex_cfg.yaml 
Trying to bind to igb_uio ...
/usr/bin/python3 dpdk_nic_bind.py --bind=igb_uio 0000:00:06.0 0000:00:07.0 
The ports are bound/configured.
Starting  TRex v2.81 please wait  ... 
 set driver name net_ena 
 driver capability  : TCP_UDP_OFFLOAD 
 STF mode does not support more than one core by dual interfaces
try with setting the number of cores to 1 (-c 1)

When I run it with just -c 1 I see the DPDK message:

Starting  TRex v2.81 please wait  ... 
 set driver name net_ena 
 driver capability  : TCP_UDP_OFFLOAD 
 set dpdk queues mode to ONE_QUE 
 Number of ports found: 2
WARNING: reduce max packet len from 9238 to 9216 
zmq publisher at: tcp://*:4500
 wait 1 sec .
port : 0 
------------
link         :  link : Link Up - speed 50000 Mbps - full-duplex
promiscuous  : 0 
port : 1 
------------
...

And it runs, but with just one core per interface it hits a throughput cap.

./dpdk_nic_bind.py -s

Network devices using DPDK-compatible driver
============================================
0000:00:06.0 'Elastic Network Adapter (ENA)' drv=igb_uio unused=ena
0000:00:07.0 'Elastic Network Adapter (ENA)' drv=igb_uio unused=ena

Network devices using kernel driver
===================================
0000:00:05.0 'Elastic Network Adapter (ENA)' if=ens5 drv=ena unused=igb_uio *Active*

So I'm wondering: is it not possible to get it running like that in AWS, the way I have it on some bare-metal machines? There I see the DPDK queue mode being set to DROP_QUE_FILTER on Intel X710 NICs.

What also made me wonder is that the DPDK queue mode line is missing entirely when I run -c 10. Is this rather a DPDK issue with the ena/igb_uio module?

Any hint would be helpful.

hhaim commented 4 years ago

Every mode has different requirements from the DPDK driver. Please try STL or ASTF mode.

norg commented 4 years ago

Thanks for the quick response Hanoh.

I know that, but we kind of rely on STF mode. I can see if I can migrate some parts of the testing to STL/ASTF, but it would be interesting to know why this limitation exists. The ENA NIC does support several queues, but ONE_QUE sounds like only one queue can be used?

Or is this something we would have to ask the DPDK folks that take care of the DPDK driver for ENA?

hhaim commented 4 years ago

STF mode is rather old and was not developed further like STL and ASTF to support "full software mode". It requires configuring two Rx queues, one for drop and another for latency, plus hardware rules for accurate latency measurement, and this is not supported by ENA.

It is theoretically possible to add this support

norg commented 4 years ago

So what would you suggest if someone wants to mimic the sfr2.yaml traffic, for example, in another mode? I guess that only makes sense in ASTF, since I need state: it's Suricata (an IDS) I want to stress, so it ideally needs the full flow, right? So astf/sfr_full.py might be a good starting point, I guess.
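
For context, an ASTF profile such as astf/sfr_full.py is just a small Python module. The following is a minimal sketch modeled on the bundled http_simple.py example; the exact import path and pcap file name depend on the TRex version, so treat it as illustrative only:

    from trex.astf.api import ASTFProfile, ASTFCapInfo, ASTFIPGen, ASTFIPGenDist

    class Prof1():
        def get_profile(self, **kwargs):
            # client/server IP generators; the ranges here are placeholders
            ip_gen_c = ASTFIPGenDist(ip_range=["16.0.0.1", "16.0.0.255"], distribution="seq")
            ip_gen_s = ASTFIPGenDist(ip_range=["48.0.0.1", "48.0.255.255"], distribution="seq")
            ip_gen = ASTFIPGen(dist_client=ip_gen_c, dist_server=ip_gen_s)
            # replay a capture as stateful TCP flows, scaled by the -m multiplier
            return ASTFProfile(default_ip_gen=ip_gen,
                               cap_list=[ASTFCapInfo(file="avl/delay_10_http_browsing_0.pcap", cps=1)])

    def register():
        return Prof1()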

I assumed that STF mode was more or less the default mode and the way to go, given all the provided examples in the scripts/cap2 folder :)

And I would have expected that ASTF might have the same limitation with ENA, and thus not use the full potential of a 20/40-core machine in AWS.

I did test ./t-rex-64 --astf -f astf/sfr_full.py -c 10 -m 6 -d 100 --cfg /etc/trex_cfg.yaml but in the end it's the same issue:

converting astf profile astf/sfr_full.py to json /tmp/astf.json
Trying to bind to igb_uio ...
/usr/bin/python3 dpdk_nic_bind.py --bind=igb_uio 0000:00:06.0 0000:00:07.0 
The ports are bound/configured.
Starting  TRex v2.81 please wait  ... 
 set driver name net_ena 
 driver capability  : TCP_UDP_OFFLOAD 
 STF_BATCH mode does not support more than one core by dual interfaces
try with setting the number of cores to 1 (-c 1)
hhaim commented 4 years ago

@norg

Yes, this is the ASTF profile; it is just a different format (Python) and you get many more features (full TCP/UDP counters in case of errors) at the cost of CPU resources :-) ASTF requires RSS driver support to work with multiple queues. I think ENA should support it (ENA is not part of our regression) -- this is the relevant part of the code, but I didn't verify it:

dev_info->flow_type_rss_offloads = ETH_RSS_IP | ETH_RSS_TCP |
                       ETH_RSS_UDP;

If this does not work, you could scale by adding more virtual interfaces and having one core per dual virtual interface.

STL requires even less from the driver and should almost always work in software mode.
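
As an illustration of how little STL asks of the driver, a minimal STL script might look roughly like this (a sketch using the STL Python automation API; the import path, addresses and rates are assumptions, not taken from this thread):

    from trex.stl.api import STLClient, STLStream, STLPktBuilder, STLTXCont
    from scapy.all import Ether, IP, UDP

    c = STLClient(server="127.0.0.1")
    c.connect()
    try:
        c.reset(ports=[0, 1])
        pkt = Ether() / IP(src="16.0.0.1", dst="48.0.0.1") / UDP(dport=12) / ("x" * 64)
        # one continuous stream on port 0; packets are generated entirely in software
        c.add_streams(STLStream(packet=STLPktBuilder(pkt=pkt), mode=STLTXCont(pps=1000)), ports=[0])
        c.start(ports=[0], duration=10)
        c.wait_on_traffic(ports=[0])
        print(c.get_stats())
    finally:
        c.disconnect()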

Try doing this (interactive mode)

./t-rex-64 --astf -i -c 10 --cfg /etc/trex_cfg.yaml

and then, from another terminal, connect using our console; see here: https://trex-tgn.cisco.com/trex/doc/trex_astf.html#_tutorial_run_trex_with_simple_http_profile
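
For reference, the console steps from that tutorial can also be driven from the Python automation API; a rough sketch (method names as in the ASTF client docs, which may vary slightly between TRex versions) would be:

    from trex.astf.api import ASTFClient

    c = ASTFClient(server="127.0.0.1")
    c.connect()
    try:
        c.reset()
        # load one of the bundled profiles and run it for 60 seconds
        c.load_profile("astf/http_simple.py")
        c.clear_stats()
        c.start(mult=1000, duration=60)
        c.wait_on_traffic()
        print(c.get_stats())
    finally:
        c.disconnect()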

norg commented 4 years ago

I would have thought that ENA does support it, but somehow it doesn't in that case. I guess I will have to talk to the DPDK folks about that. With ethtool I can see it supports 32 queues which would be enough.

We tried more virtual interfaces, which worked with 4 instead of 2, but with 6 it didn't. The Intel NICs at AWS work better but lack the traffic-mirroring feature.

I guess we have to spawn more ec2 instances to generate a load of 10Gbit/s for now :/

hhaim commented 4 years ago

@norg could you share the error?

norg commented 4 years ago

Sure:

The ports are bound/configured.
Starting  TRex v2.81 please wait  ... 
 set driver name net_ena 
 driver capability  : TCP_UDP_OFFLOAD 
 set dpdk queues mode to ONE_QUE 
 Number of ports found: 6
WARNING: reduce max packet len from 9238 to 9216 
zmq publisher at: tcp://*:4500
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
hhaim commented 4 years ago

I meant multi-queue e.g. 2 cores per dual virtual interface. ENA supports RSS.

norg commented 4 years ago

That case was with 6 ENA NICs but maybe you meant something else by virtual interfaces? So no additional virtual NICs?

So might be a misunderstanding :)

hhaim commented 4 years ago

You should be able to work with multiple Rx/Tx queues with ENA/ASTF. Could you run one TRex with only 2 ENA interfaces and more queues/cores by adding -c 2?

norg commented 4 years ago

That's what I already tried, but I did it again, both with -f and with -i (interactive):

./t-rex-64 --astf -f astf/sfr_single.py -c 2
converting astf profile astf/sfr_single.py to json /tmp/astf.json
The ports are bound/configured.
Starting  TRex v2.81 please wait  ... 
 set driver name net_ena 
 driver capability  : TCP_UDP_OFFLOAD 
 STF_BATCH mode does not support more than one core by dual interfaces
try with setting the number of cores to 1 (-c 1)
 ./t-rex-64 --astf -i -c 2
The ports are bound/configured.
Starting  TRex v2.81 please wait  ... 
 set driver name net_ena 
 driver capability  : TCP_UDP_OFFLOAD 
 set dpdk queues mode to MULTI_QUE 
 Number of ports found: 2
WARNING: reduce max packet len from 9238 to 9216 
zmq publisher at: tcp://*:4500
Requested RX offload TCP_LRO is not supported

I see the difference with interactive mode. TCP_LRO is not checked in https://trex-tgn.cisco.com/trex/doc/trex_manual.html#_hardware_recommendations, so that's expected.

hhaim commented 4 years ago

For interactive mode, try adding --lro-disable to the CLI.

norg commented 4 years ago

That helped. So -c 2 works, but higher numbers like 4, 6, 8 don't, and above 8 it complains that it is not supported, which at least makes sense given the 8-queue limit on the ENA NIC:

ERROR: driver maximum tx queues is (8) required (10) reduce number of cores to support it

With 4 I get:

./t-rex-64 --astf -i -c 4 --lro-disable
The ports are bound/configured.
Starting  TRex v2.81 please wait  ... 
 set driver name net_ena 
 driver capability  : TCP_UDP_OFFLOAD 
 set dpdk queues mode to MULTI_QUE 
 Number of ports found: 2
Warning LRO is supported and asked to be disabled by user 
WARNING: reduce max packet len from 9238 to 9216 
zmq publisher at: tcp://*:4500
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)

I guess if we can reach 8 that should be enough to reach the 10Gbit/s goal.

hhaim commented 4 years ago

@norg we are making progress. Try adding "-v 8" to the CLI; this will enable debug output.

I think the default number of mbufs is low for a virtual driver; try adding more descriptors and mbufs (2k).

norg commented 4 years ago

I have not yet added the additional descriptors and will test this tomorrow, but here is the debug info already:

Starting  TRex v2.81 please wait  ... 
Using configuration file /etc/trex_cfg.yaml 
 port limit     :  not configured 
 port_bandwidth_gb    :  10 
 if_mask        : None 
 is low-end : 0 
 stack type :  
 thread_per_dual_if      : 1 
 if        :  00:06.0, 00:07.0,
 enable_zmq_pub :  1 
 zmq_pub_port   :  4500 
 m_zmq_rpc_port    :  4501 
 src     : XX
 dest    : XX
 src     : XX
 dest    : XX
 memory per 2x10G ports  
 MBUF_64                                   : 16380 
 MBUF_128                                  : 8190 
 MBUF_256                                  : 8190 
 MBUF_512                                  : 8190 
 MBUF_1024                                 : 8190 
 MBUF_2048                                 : 4095 
 MBUF_4096                                 : 128 
 MBUF_9K                                   : 512 
 TRAFFIC_MBUF_64                           : 65520 
 TRAFFIC_MBUF_128                          : 32760 
 TRAFFIC_MBUF_256                          : 8190 
 TRAFFIC_MBUF_512                          : 8190 
 TRAFFIC_MBUF_1024                         : 8190 
 TRAFFIC_MBUF_2048                         : 32760 
 TRAFFIC_MBUF_4096                         : 128 
 TRAFFIC_MBUF_9K                           : 512 
 MBUF_DP_FLOWS                             : 524288 
 MBUF_GLOBAL_FLOWS                         : 5120 
 master   thread  : 0  
 rx  thread  : 1  
 dual_if : 0 
    socket  : 0  
   [   2   3   4   5   6   7   8   9   10   11   12   13   14   15   16   17   18     ]  
 dual_if : 1 
    socket  : 0  
   [   19   20   21   22   23   24   25   26   27   28   29   30   31   32   33   34   35     ]  
CTimerWheelYamlInfo does not exist  
 set driver name net_ena 
 driver capability  : TCP_UDP_OFFLOAD 
 set dpdk queues mode to MULTI_QUE 
 Number of ports found: 2
Warning LRO is supported and asked to be disabled by user 
WARNING: reduce max packet len from 9238 to 9216 
zmq publisher at: tcp://*:4500
Ethdev port_id=0 nb_tx_queues=10 > 8
EAL: Error - exiting with code: 1
  Cause: Cannot configure device: err=-22, port=0

With 4 instead of 8:

./t-rex-64 --astf -i -c 4 --lro-disable -v 8
The ports are bound/configured.
Starting  TRex v2.81 please wait  ... 
Using configuration file /etc/trex_cfg.yaml 
 port limit     :  not configured 
 port_bandwidth_gb    :  10 
 if_mask        : None 
 is low-end : 0 
 stack type :  
 thread_per_dual_if      : 1 
 if        :  00:06.0, 00:07.0,
 enable_zmq_pub :  1 
 zmq_pub_port   :  4500 
 m_zmq_rpc_port    :  4501 
 src     : XX
 dest    :XX
 src     : XX
 dest    : XX
 memory per 2x10G ports  
 MBUF_64                                   : 16380 
 MBUF_128                                  : 8190 
 MBUF_256                                  : 8190 
 MBUF_512                                  : 8190 
 MBUF_1024                                 : 8190 
 MBUF_2048                                 : 4095 
 MBUF_4096                                 : 128 
 MBUF_9K                                   : 512 
 TRAFFIC_MBUF_64                           : 65520 
 TRAFFIC_MBUF_128                          : 32760 
 TRAFFIC_MBUF_256                          : 8190 
 TRAFFIC_MBUF_512                          : 8190 
 TRAFFIC_MBUF_1024                         : 8190 
 TRAFFIC_MBUF_2048                         : 32760 
 TRAFFIC_MBUF_4096                         : 128 
 TRAFFIC_MBUF_9K                           : 512 
 MBUF_DP_FLOWS                             : 524288 
 MBUF_GLOBAL_FLOWS                         : 5120 
 master   thread  : 0  
 rx  thread  : 1  
 dual_if : 0 
    socket  : 0  
   [   2   3   4   5   6   7   8   9   10   11   12   13   14   15   16   17   18     ]  
 dual_if : 1 
    socket  : 0  
   [   19   20   21   22   23   24   25   26   27   28   29   30   31   32   33   34   35     ]  
CTimerWheelYamlInfo does not exist  
 set driver name net_ena 
 driver capability  : TCP_UDP_OFFLOAD 
 set dpdk queues mode to MULTI_QUE 
 Number of ports found: 2
Warning LRO is supported and asked to be disabled by user 
WARNING: reduce max packet len from 9238 to 9216 
zmq publisher at: tcp://*:4500
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
EAL: Error - exiting with code: 1
  Cause: rte_eth_dev_start: err=-14, port=1

With 6 it's the same as with 4, just queue 5 instead of queue 0, and with 8, as you can see, we end up with 10 > 8 tx queues, which is a bit strange to me right now since I only get the complaint about the 8-queue limit when I use -c 10.

I will play with mbuf settings tomorrow to see if this helps in the 4/6 scenario or maybe even in the one with -c 8.

Thanks a lot so far!

hhaim commented 4 years ago

@norg I think I see the issue. ENA does not support scatter/gather, and your number of 9K mbufs is very low (TRex assumes scatter/gather is supported).

You can apply this patch to solve this: https://github.com/hhaim/trex-core/commit/4d41b3493a81d5c961d605aa42e56cc5c370f8be

Let me know. Another option is just to add more 9K buffers, but that is not the right solution, only a workaround.

BTW, LRO is enabled by mistake as well; this should also be fixed.

tielou commented 4 years ago

Hi @hhaim. Since @norg is busy this morning, I continued his work and tried out your patch. What I did: I cloned the latest trex-core, patched the file, and followed the steps here: https://github.com/cisco-system-traffic-generator/trex-core/wiki#how-to-build-trex. Unfortunately we're still running into the ena_queue_start error when starting TRex in ASTF mode with more than 2 CPUs. Is there any way I can confirm that the new driver was actually loaded? Or is there something I might have forgotten or done wrong? Thank you for your help!

hhaim commented 4 years ago

Please send the output of running the server with ./t-rex-64 --astf -c 3 --lro-disable -v 7 -i. I would like to see how many 9K buffers you have. You can also set a debugger in the driver function and check on which line it fails.

I think it is this line


    rc = rte_mempool_get_bulk(rxq->mb_pool, (void **)mbufs, count);
    if (unlikely(rc < 0)) {
        rte_atomic64_inc(&rxq->adapter->drv_stats->rx_nombuf);
        ++rxq->rx_stats.mbuf_alloc_fail;
        PMD_RX_LOG(DEBUG, "there are no enough free buffers");   /* <<< I assume it fails here; change DEBUG to ERROR to see this log */
        return 0;
    }

Another option is to add more 9K mbufs; add this to trex_cfg.yaml:

    memory    :                                           
             mbuf_9k   : 12228
             traffic_mbuf_9k   : 12228
tielou commented 4 years ago

I increased mbuf_9k a bit before as well, but that didn't change much. Here's the output of ./t-rex-64 --astf -c 3 --lro-disable -v 7 -i:

Using configuration file /etc/trex_cfg.yaml
 port limit     :  not configured
 port_bandwidth_gb    :  10
 if_mask        : None
 is low-end : 0
 stack type :
 thread_per_dual_if      : 1
 if        :  00:06.0, 00:07.0,
 enable_zmq_pub :  1
 zmq_pub_port   :  4500
 m_zmq_rpc_port    :  4501
 src     : 02:13:a8:c6:f6:e4
 dest    : 02:1b:49:1b:50:a0
 src     : 02:1b:49:1b:50:a0
 dest    : 02:13:a8:c6:f6:e4
 memory per 2x10G ports
 MBUF_64                                   : 16380
 MBUF_128                                  : 8190
 MBUF_256                                  : 8190
 MBUF_512                                  : 8190
 MBUF_1024                                 : 8190
 MBUF_2048                                 : 4095
 MBUF_4096                                 : 128
 MBUF_9K                                   : 8190
 TRAFFIC_MBUF_64                           : 65520
 TRAFFIC_MBUF_128                          : 32760
 TRAFFIC_MBUF_256                          : 8190
 TRAFFIC_MBUF_512                          : 8190
 TRAFFIC_MBUF_1024                         : 8190
 TRAFFIC_MBUF_2048                         : 32760
 TRAFFIC_MBUF_4096                         : 128
 TRAFFIC_MBUF_9K                           : 8190
 MBUF_DP_FLOWS                             : 524288
 MBUF_GLOBAL_FLOWS                         : 5120
 master   thread  : 0
 rx  thread  : 1
 dual_if : 0
    socket  : 0
   [   2   3   4   5   6   7   8   9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25   26   27   28   29   30   31   32   33   34   35     ]
CTimerWheelYamlInfo does not exist
 flags           : 8030f00
 write_file      : 0
 verbose         : 7
 realtime        : 1
 flip            : 0
 cores           : 3
 single core     : 0
 flow-flip       : 0
 no clean close  : 0
 zmq_publish     : 1
 vlan mode       : 0
 client_cfg      : 0
 mbuf_cache_disable  : 0
 cfg file        :
 mac file        :
 out file        :
 client cfg file :
 duration        : 0
 factor          : 1
 mbuf_factor     : 1
 latency         : 0 pkt/sec
 zmq_port        : 4500
 telnet_port     : 4501
 expected_ports  : 2
 tw_bucket_usec  : 20.000000 usec
 tw_buckets      : 1024 usec
 tw_levels       : 3 usec
 port : 0 dst:02:1b:49:1b:50:a0  src:02:13:a8:c6:f6:e4
 port : 1 dst:02:13:a8:c6:f6:e4  src:02:1b:49:1b:50:a0
 port : 2 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 3 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 4 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 5 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 6 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 7 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 8 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 9 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 10 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 11 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 12 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 13 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 14 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 15 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 16 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 17 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 18 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 19 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 20 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 21 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 22 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 23 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 Total Memory :
 MBUF_64                                   : 81900
 MBUF_128                                  : 40950
 MBUF_256                                  : 16380
 MBUF_512                                  : 16380
 MBUF_1024                                 : 16380
 MBUF_2048                                 : 36855
 MBUF_4096                                 : 3072
 MBUF_DP_FLOWS                             : 524288
 MBUF_GLOBAL_FLOWS                         : 5120
 get_each_core_dp_flows                    : 174762
 Total memory                              :     256.78 Mbytes
 core_list : 0,1,2,3,4
 sockets : 0
 active sockets : 1
 ports_sockets : 1
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
 phy   |   virt
 2      1
 3      2
 4      3
DPDK args
 xx  -l  0,1,2,3,4  -n  4  --log-level  8  --master-lcore  0  -w  0000:00:06.0  -w  0000:00:07.0  --legacy-mem
EAL: Detected 36 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Static memory layout is selected, amount of reserved memory can be adjusted with -m or --socket-mem
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available hugepages reported in hugepages-1048576kB
 EAL: Probing VFIO support...
EAL: PCI device 0000:00:06.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1d0f:ec20 net_ena
EAL: PCI device 0000:00:07.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1d0f:ec20 net_ena
                input : [00:06.0, 00:07.0]
                 dpdk : [0000:00:06.0, 0000:00:07.0]
             pci_scan : [0000:00:06.0, 0000:00:07.0]
                  map : [ 0, 1]
 TRex port mapping
 -----------------
 TRex vport: 0 dpdk_rte_eth: 0
 TRex vport: 1 dpdk_rte_eth: 1
 set driver name net_ena
 driver capability  : TCP_UDP_OFFLOAD
 set dpdk queues mode to MULTI_QUE
 DPDK devices 2 : 2
-----
 0 : vdev 0000:00:06.0
 1 : vdev 0000:00:07.0
-----
 Number of ports found: 2

if_index : 0
driver name : net_ena
min_rx_bufsize : 64
max_rx_pktlen  : 9216
max_rx_queues  : 32
max_tx_queues  : 32
max_mac_addrs  : 1
rx_offload_capa : 0x80e
tx_offload_capa : 0x800e
rss reta_size   : 128
flow_type_rss   : 0x3afbc
tx_desc_max     : 1024
tx_desc_min     : 128
rx_desc_max     : 8192
rx_desc_min     : 128
Warning LRO is supported and asked to be disabled by user
WARNING: reduce max packet len from 9238 to 9216
zmq publisher at: tcp://*:4500
 rx_data_q_num : 0
 rx_drop_q_num : 0
 rx_dp_q_num   : 3
 rx_que_total : 3
 --
 rx_desc_num_data_q   : 512
 rx_desc_num_drop_q   : 4096
 rx_desc_num_dp_q     : 512
 total_desc           : 1536
 --
 tx_desc_num     : 1024
port 0 desc: Unknown
 rx_qid: 0 (512)
 rx_qid: 1 (512)
 rx_qid: 2 (512)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 2 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 2 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 2 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 2 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 2 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 2 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 2 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 2 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 2 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 2 type(1)

And with increased mbuf:

Starting  TRex v2.82 please wait  ...
Using configuration file /etc/trex_cfg.yaml
 port limit     :  not configured
 port_bandwidth_gb    :  10
 if_mask        : None
 is low-end : 0
 stack type :
 thread_per_dual_if      : 1
 if        :  00:06.0, 00:07.0,
 enable_zmq_pub :  1
 zmq_pub_port   :  4500
 m_zmq_rpc_port    :  4501
 src     : 02:13:a8:c6:f6:e4
 dest    : 02:1b:49:1b:50:a0
 src     : 02:1b:49:1b:50:a0
 dest    : 02:13:a8:c6:f6:e4
 memory per 2x10G ports
 MBUF_64                                   : 16380
 MBUF_128                                  : 8190
 MBUF_256                                  : 8190
 MBUF_512                                  : 8190
 MBUF_1024                                 : 8190
 MBUF_2048                                 : 4095
 MBUF_4096                                 : 128
 MBUF_9K                                   : 12228
 TRAFFIC_MBUF_64                           : 65520
 TRAFFIC_MBUF_128                          : 32760
 TRAFFIC_MBUF_256                          : 8190
 TRAFFIC_MBUF_512                          : 8190
 TRAFFIC_MBUF_1024                         : 8190
 TRAFFIC_MBUF_2048                         : 32760
 TRAFFIC_MBUF_4096                         : 128
 TRAFFIC_MBUF_9K                           : 12228
 MBUF_DP_FLOWS                             : 524288
 MBUF_GLOBAL_FLOWS                         : 5120
 master   thread  : 0
 rx  thread  : 1
 dual_if : 0
    socket  : 0
   [   2   3   4   5   6   7   8   9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25   26   27   28   29   30   31   32   33   34   35     ]
CTimerWheelYamlInfo does not exist
 flags           : 8030f00
 write_file      : 0
 verbose         : 7
 realtime        : 1
 flip            : 0
 cores           : 3
 single core     : 0
 flow-flip       : 0
 no clean close  : 0
 zmq_publish     : 1
 vlan mode       : 0
 client_cfg      : 0
 mbuf_cache_disable  : 0
 cfg file        :
 mac file        :
 out file        :
 client cfg file :
 duration        : 0
 factor          : 1
 mbuf_factor     : 1
 latency         : 0 pkt/sec
 zmq_port        : 4500
 telnet_port     : 4501
 expected_ports  : 2
 tw_bucket_usec  : 20.000000 usec
 tw_buckets      : 1024 usec
 tw_levels       : 3 usec
 port : 0 dst:02:1b:49:1b:50:a0  src:02:13:a8:c6:f6:e4
 port : 1 dst:02:13:a8:c6:f6:e4  src:02:1b:49:1b:50:a0
 port : 2 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 3 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 4 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 5 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 6 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 7 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 8 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 9 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 10 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 11 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 12 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 13 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 14 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 15 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 16 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 17 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 18 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 19 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 20 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 21 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 22 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 port : 23 dst:00:00:00:01:00:00  src:00:00:00:00:00:00
 Total Memory :
 MBUF_64                                   : 81900
 MBUF_128                                  : 40950
 MBUF_256                                  : 16380
 MBUF_512                                  : 16380
 MBUF_1024                                 : 16380
 MBUF_2048                                 : 36855
 MBUF_4096                                 : 3072
 MBUF_DP_FLOWS                             : 524288
 MBUF_GLOBAL_FLOWS                         : 5120
 get_each_core_dp_flows                    : 174762
 Total memory                              :     256.78 Mbytes
 core_list : 0,1,2,3,4
 sockets : 0
 active sockets : 1
 ports_sockets : 1
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
 phy   |   virt
 2      1
 3      2
 4      3
DPDK args
 xx  -l  0,1,2,3,4  -n  4  --log-level  8  --master-lcore  0  -w  0000:00:06.0  -w  0000:00:07.0  --legacy-mem
EAL: Detected 36 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Static memory layout is selected, amount of reserved memory can be adjusted with -m or --socket-mem
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available hugepages reported in hugepages-1048576kB
 EAL: Probing VFIO support...
EAL: PCI device 0000:00:06.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1d0f:ec20 net_ena
EAL: PCI device 0000:00:07.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1d0f:ec20 net_ena
                input : [00:06.0, 00:07.0]
                 dpdk : [0000:00:06.0, 0000:00:07.0]
             pci_scan : [0000:00:06.0, 0000:00:07.0]
                  map : [ 0, 1]
 TRex port mapping
 -----------------
 TRex vport: 0 dpdk_rte_eth: 0
 TRex vport: 1 dpdk_rte_eth: 1
 set driver name net_ena
 driver capability  : TCP_UDP_OFFLOAD
 set dpdk queues mode to MULTI_QUE
 DPDK devices 2 : 2
-----
 0 : vdev 0000:00:06.0
 1 : vdev 0000:00:07.0
-----
 Number of ports found: 2

if_index : 0
driver name : net_ena
min_rx_bufsize : 64
max_rx_pktlen  : 9216
max_rx_queues  : 32
max_tx_queues  : 32
max_mac_addrs  : 1
rx_offload_capa : 0x80e
tx_offload_capa : 0x800e
rss reta_size   : 128
flow_type_rss   : 0x3afbc
tx_desc_max     : 1024
tx_desc_min     : 128
rx_desc_max     : 8192
rx_desc_min     : 128
Warning LRO is supported and asked to be disabled by user
WARNING: reduce max packet len from 9238 to 9216
zmq publisher at: tcp://*:4500
 rx_data_q_num : 0
 rx_drop_q_num : 0
 rx_dp_q_num   : 3
 rx_que_total : 3
 --
 rx_desc_num_data_q   : 512
 rx_desc_num_drop_q   : 4096
 rx_desc_num_dp_q     : 512
 total_desc           : 1536
 --
 tx_desc_num     : 1024
port 0 desc: Unknown
 rx_qid: 0 (512)
 rx_qid: 1 (512)
 rx_qid: 2 (512)
port 1 desc: Unknown
 rx_qid: 0 (512)
 rx_qid: 1 (512)
 rx_qid: 2 (512)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
EAL: Error - exiting with code: 1
  Cause: rte_eth_dev_start: err=-14, port=1

I'll put in the debugger line next and get back to you.

hhaim commented 4 years ago

@tielou thanks for the logs. I will need to get my hands on a setup to understand why the driver is complaining. Will update.

hhaim commented 4 years ago

@norg @tielou The solution is simple, v2.81 can work out of the box (I've tested it with v2.82)

Just add more rx/tx descriptors, e.g. 1024 or 2048 will do (the default is 512).

Another thing: set port_bandwidth_gb to 50 instead of 10 to get more mbufs.

see this example


- port_limit      : 2
  version         : 2
#List of interfaces. Change to suit your setup. Use ./dpdk_setup_ports.py -s to see available options
  interfaces    : ["00:06.0","00:07.0"]
  port_bandwidth_gb : 50
  rx_desc : 1024
  tx_desc : 1024
  port_info       :  # Port IPs. Change to suit your needs. In case of loopback, you can leave as is.
          - ip         : x.x.x.x
            default_gw : x.x.x.x
          - ip         : x.x.x.x
            default_gw : x.x.x.x

  platform:
      master_thread_id: 0
      latency_thread_id: 1
      dual_if:
        - socket: 0
          threads: [2,3,4,5,6,7,8,9]

Running it:

$ sudo ./t-rex-64 -i --astf --lro-disable -c 4 --iom 0
The ports are bound/configured.
Starting  TRex v2.82 please wait  ... 
 set driver name net_ena 
 driver capability  : TCP_UDP_OFFLOAD 
 set dpdk queues mode to MULTI_QUE 
 Number of ports found: 2
Warning LRO is supported and asked to be disabled by user 
WARNING: reduce max packet len from 9238 to 9216 
zmq publisher at: tcp://*:4500
 wait 1 sec .
port : 0 
------------
link         :  link : Link Up - speed 50000 Mbps - full-duplex
promiscuous  : 0 
port : 1 
------------
link         :  link : Link Up - speed 50000 Mbps - full-duplex
promiscuous  : 0 
 number of ports         : 2 
 max cores for 2 ports   : 4 
 tx queues per port      : 6 
 -------------------------------
RX core uses TX queue number 65535 on all ports
 core, c-port, c-queue, s-port, s-queue, lat-queue
 ------------------------------------------
 1        0      0       1       0      0  
 2        0      1       1       1    255  
 3        0      2       1       2    255  
 4        0      3       1       3    255  
 -------------------------------

However, running a simple ASTF profile (simple HTTP) shows out-of-order (OOO) packets; I need to look into the DUT.


hhaim commented 4 years ago

Looking into this I see some issues:

  1. RSS does not work correctly; in some cases a packet goes to the wrong core, which returns an RST packet -- with one core this does not happen.
  2. There are many TCP OOO packets -- not sure where they come from.
hhaim commented 4 years ago

Looking into the code further, it seems the hash function supported by ENA is CRC32 and it can't be changed to Toeplitz (which explains why RSS does not work).

These are the changes I made. In short, until there is a way to make ENA use Toeplitz, ASTF will not work:

--- a/src/dpdk/drivers/net/ena/ena_ethdev.c
+++ b/src/dpdk/drivers/net/ena/ena_ethdev.c
@@ -619,7 +619,7 @@ static int ena_rss_init_default(struct ena_adapter *adapter)
                }
        }

-       rc = ena_com_fill_hash_function(ena_dev, ENA_ADMIN_CRC32, NULL,
+       rc = ena_com_fill_hash_function(ena_dev, ENA_ADMIN_TOEPLITZ, NULL,
                                        ENA_HASH_KEY_SIZE, 0xFFFFFFFF);
        if (unlikely(rc && (rc != ENA_COM_UNSUPPORTED))) {
                PMD_DRV_LOG(INFO, "Cannot fill hash function\n");
diff --git a/src/drivers/trex_driver_virtual.cpp b/src/drivers/trex_driver_virtual.cpp
index 1180ee8a..a773b540 100755
--- a/src/drivers/trex_driver_virtual.cpp
+++ b/src/drivers/trex_driver_virtual.cpp
@@ -112,7 +112,7 @@ void CTRexExtendedDriverMlnx4::update_configuration(port_cfg_t * cfg) {

 CTRexExtendedDriverVirtio::CTRexExtendedDriverVirtio() {
-    m_cap = tdCAP_ONE_QUE | tdCAP_MULTI_QUE;
+    m_cap = tdCAP_ONE_QUE | tdCAP_MULTI_QUE | tdCAP_RSS_DROP_QUE_FILTER;
 }

 void CTRexExtendedDriverVirtio::update_configuration(port_cfg_t * cfg) {
diff --git a/src/main_dpdk.cpp b/src/main_dpdk.cpp
index e62b30b5..ab9f924d 100644
--- a/src/main_dpdk.cpp
+++ b/src/main_dpdk.cpp
@@ -5429,7 +5429,8 @@ COLD_FUNC void CPhyEthIF::conf_hardware_astf_rss() {
         hash_key_size = dev_info->hash_key_size;
     }

-    if (!rte_eth_dev_filter_supported(m_repid, RTE_ETH_FILTER_HASH)) {
+    if (1 /*!rte_eth_dev_filter_supported(m_repid, RTE_ETH_FILTER_HASH)*/) {
hhaim commented 4 years ago

I've googled it and found this (it is actually Toeplitz, but with the wrong key):

https://github.com/scylladb/seastar/issues/654

tielou commented 4 years ago

Hi @hhaim, thank you very much for all your research. I can confirm that setting port_bandwidth_gb: 50, rx_desc: 1024 and tx_desc: 1024 allows us to use multithreading :)

I will give the driver patch a shot later!

hhaim commented 4 years ago

Hi @tielou, you didn't read it carefully. :-) The fact that it loads does not mean that it works

  1. I see the ENA driver issue: they support Toeplitz but ignore the key from rx_adv_conf.rss_conf.rss_key. I will look into the latest driver to see if this issue was solved! For now, ASTF multi-queue will not work.
  2. There are TCP out-of-order (OOO) packets even with one core, even at a very low rate -- not sure why. Can you check it in your case?
hhaim commented 4 years ago

The ENA driver does not support it, even in the latest version.

norg commented 4 years ago

Does this only affect ASTF, or also STF? We have STF running right now with a lower bandwidth and use several instances to achieve the overall goal of a 10 Gbit/s traffic mix.

(we will still look into ASTF to solve the overall issue and ideally end up with one machine doing the fancy advanced traffic generation)

hhaim commented 4 years ago

> Does this only affect ASTF, or also STF? We have STF running right now with a lower bandwidth and use several instances to achieve the overall goal of a 10 Gbit/s traffic mix.
>
> (we will still look into ASTF to solve the overall issue and ideally end up with one machine doing the fancy advanced traffic generation)

Only ASTF; STL works fine as there is NO need for a specific distribution (RSS) in that mode. Again, I think in general you should move to ASTF, as it gives much more feedback information and has much better API support; OOO will not even be noticed in STF.

I've looked into the ENA code some more; it seems that even with this (giving the right key and key_len) it fails in a new location:

int ena_com_set_hash_function(struct ena_com_dev *ena_dev)
{
    struct ena_com_admin_queue *admin_queue = &ena_dev->admin_queue;
    struct ena_rss *rss = &ena_dev->rss;
    struct ena_admin_set_feat_cmd cmd;
    struct ena_admin_set_feat_resp resp;
    struct ena_admin_get_feat_resp get_resp;
    int ret;

    if (!ena_com_check_supported_feature_id(ena_dev,
                        ENA_ADMIN_RSS_HASH_FUNCTION)) {
        ena_trc_dbg("Feature %d isn't supported\n",
                ENA_ADMIN_RSS_HASH_FUNCTION);
        return ENA_COM_UNSUPPORTED;    /* <<< fails here: RSS hash function not supported */
    }

    /* Validate hash function is supported */
    ret = ena_com_get_feature(ena_dev, &get_resp,
                  ENA_ADMIN_RSS_HASH_FUNCTION, 0);
    if (unlikely(ret))
        return ret;

    if (!(get_resp.u.flow_hash_func.supported_func & BIT(rss->hash_func))) {
        ena_trc_err("Func hash %d isn't supported by device, abort\n",
                rss->hash_func);
        return ENA_COM_UNSUPPORTED;
    }
priikone commented 3 years ago

Looks like I might be facing the same issue with the ramp-up HTTP test with more than one CPU core. It works fine with a single CPU but not at all with two or more. Is there any timetable for when DPDK will be updated in TRex? Shouldn't the new ENA driver in 20.11 fix the issue, if I understood correctly?

hhaim commented 3 years ago

@priikone DPDK AWS driver development is super slow and does not support the full RSS API, so in the foreseeable future (see the issue in the ENA DPDK driver) I don't see a solution from that front.
We are working on a TRex software solution for this; it will work for all drivers that have a special distribution, like KVM-VIRTIO and AWS, and do not support the DPDK RSS API. ETA is a week or two.

priikone commented 3 years ago

Thanks for the info. I see now that they didn't change anything wrt. ENA RSS in 20.11. Out of curiosity, why does TRex need to set the RSS key and RETA itself? Wouldn't this work by default without it as well? Drivers typically set up default RSS across the configured queues automatically, unless TRex needs to redirect to some specific core(s). My need is just to be able to send and receive traffic on as many cores as possible. I don't care about latency measurements at all in this type of setup, for example.

hhaim commented 3 years ago

@priikone when a flow is generated from core x (ASTF), it is required to know which tuple to generate so the reverse flow will come back to the same core x. Without knowing the key and the distribution model, there is a need for software rings between the cores to redirect the packets (in software, which is slow) back to the core that generated them. Another alternative is to move the flow context to the core that was chosen by the distribution model.
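
To illustrate why the key and the hash function matter, here is a hedged, standalone Python sketch (not TRex code) of the standard Toeplitz RSS hash. If the NIC used this function with a known key, the generator could evaluate it on the reverse tuple and pick source ports so the reply lands on the desired queue; with ENA's CRC32 variant or an unknown key, this precomputation is not possible:

    # Software Toeplitz hash over an IPv4 4-tuple (sketch; assumes len(data) + 4 <= len(key)).
    def toeplitz_hash(key: bytes, data: bytes) -> int:
        key_bits = int.from_bytes(key, "big")
        key_len_bits = len(key) * 8
        result = 0
        for i, byte in enumerate(data):
            for b in range(8):
                if byte & (0x80 >> b):
                    # XOR in the 32-bit window of the key starting at this bit position
                    result ^= (key_bits >> (key_len_bits - 32 - (i * 8 + b))) & 0xFFFFFFFF
        return result

    def rss_queue(key: bytes, src_ip: bytes, dst_ip: bytes, sport: int, dport: int,
                  reta_size: int = 128, n_queues: int = 4) -> int:
        # assumes an identity-style redirection table: entry i -> queue i % n_queues
        data = src_ip + dst_ip + sport.to_bytes(2, "big") + dport.to_bytes(2, "big")
        return (toeplitz_hash(key, data) % reta_size) % n_queues

With something like this, a generator could try candidate source ports until rss_queue() of the reverse tuple equals the generating core's queue; the point above is that ENA's hash function and ignored key prevent exactly that.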

priikone commented 3 years ago

Understood, thanks. This is of course a common problem in networking and there are many ways it can be handled, but I'm not familiar with the TRex architecture and what limitations there are.

What's the plan with the software solution? How is it going to work?

bdollma commented 3 years ago

@priikone The idea is that when the source port is generated, its last byte is aligned to the core id. Hence, when a packet returns with the opposite flow tuple, we can figure out from the destination port which core this packet belongs to. We are implementing a ring between the data planes in order to redirect these packets to the correct core.
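
A minimal sketch of that scheme (illustrative only, not the actual TRex implementation; the port layout and names are assumptions):

    DP_CORES = 8  # hypothetical number of data-plane cores

    def gen_src_port(core_id: int, seq: int) -> int:
        """Pick an ephemeral source port whose low byte carries the core id."""
        base = 1024 + (seq % 252) * 256      # multiples of 256 keep the low byte free
        return base | (core_id & 0xFF)

    def core_for_return_packet(dst_port: int) -> int:
        """The reverse flow carries our source port as its destination port."""
        return (dst_port & 0xFF) % DP_CORES

    # usage: core 3 opens a flow; the reply can be redirected back to core 3
    sport = gen_src_port(core_id=3, seq=41)
    assert core_for_return_packet(sport) == 3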

priikone commented 3 years ago

That's probably fine with the packet rates in the rampup HTTP test I'm thinking...

Looks like you want to keep everything per-thread. Have you considered doing a flow lookup (lockless, of course) so it wouldn't matter where the traffic arrives? Or moving the flow once it's known where it balances? In principle it would even be possible to precompute the RSS hash beforehand to know where the return traffic is going to balance. Again, I'm not familiar with the TRex architecture, so I have no clue what's easiest to do there...

hhaim commented 3 years ago

@priikone calculating the reverse-flow destination is not possible when you don't know how the RSS function works; that is the whole point of making a software redirect. As I said earlier, moving the flow context (instead of the packets) would be much faster, as it happens once per flow, similar to the TCP syn-cookies idea, but it would complicate the code for less important use cases. Our main use case is hardware (CX-5, I40E) at a high scale of flows/throughput. The same goes for making the table global, which would impact our main use case of high performance/high scale on hardware devices (taking a spin-lock for every packet).

bdollma commented 3 years ago

Hi @priikone, we are in the final stages of implementing the software RSS. After some more reviews it will be merged in the next version, but if you can't wait you can take a look at this branch: https://github.com/bdollma/trex-core/tree/software_rss. The slowdown in software mode shouldn't be that bad; our first guess is between 10-15% for the simple HTTP profile. The impact on latency might be bigger. Thanks,

priikone commented 3 years ago

It works! Thanks guys. Didn't do much performance testing yet, though.

One issue is that I can't use all 8 queues. If I try (-c 8), I get:

Ethdev port_id=0 nb_tx_queues=10 > 8
EAL: Error - exiting with code: 1
  Cause: Cannot configure device: err=-22, port=0

With 6 queues or fewer it works fine.

hhaim commented 3 years ago

@priikone the code was released. You can't use 8 cores because the AWS driver is limited to 8 TX queues and we need another 2 queues
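
As a back-of-the-envelope check of that queue budget (numbers taken from this thread, not from the TRex source; a sketch only):

    ENA_MAX_TX_QUEUES = 8   # per-port limit reported by the ENA driver in this thread
    EXTRA_QUEUES = 2        # extra queues TRex needs besides the data-plane cores (per the comment above)

    def tx_queues_needed(dp_cores):
        return dp_cores + EXTRA_QUEUES

    print(tx_queues_needed(8))                          # 10 -> "-c 8" asks for 10 > 8 and is rejected
    print(tx_queues_needed(6) <= ENA_MAX_TX_QUEUES)     # True -> "-c 6" fits within the 8-queue limit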

priikone commented 3 years ago

Hmm... is that for the Tx side? Or is there a reason why 8 Rx queues couldn't exist? Or what are they used for?

hhaim commented 3 years ago

@priikone when you specify -c 6, it means we are using 6 cores for the DP (tx+rx), but we have additional queues used for latency.

priikone commented 3 years ago

Ok, so an improvement could be to utilize more queues when latency measurement isn't used.

hhaim commented 3 years ago

@priikone we had this optimization in the past and removed it because the init code became too complex. Better to just add more queues on the AWS side.

priikone commented 3 years ago

With 8 cores (10 queues) everything seems to work fine, but with >8 cores (>10 queues) something goes wrong with TCP. I'm seeing invalid TCP sequence numbers with SYN (ACK=0) and RST (ACK=0) packets, dropped by the DUT. No issues seen with 8 or fewer cores.

hhaim commented 3 years ago

@priikone maybe this is an issue with AWS. With bare metal and software mode it works fine with 20 cores. Please share the TCP/UDP counters to verify. Do you need more than 8 cores in AWS? Aren't you limited by the hardware?

priikone commented 3 years ago

Actually, it was an ARP problem and the errors were because of retransmissions.

hhaim commented 3 years ago

@priikone could you elaborate? I don't see how the number of cores is related to ARP.

priikone commented 3 years ago

The range of IPs defined in ASTFIPGenDist covers the number of CPU cores, and I wasn't getting ARP replies for the IPs beyond the first 8 in the range. Now I do, and traffic works fine with 16 cores.
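
For completeness, a hedged sketch of widening the generator ranges so every data-plane core gets addresses the DUT will answer ARP for (class names from the ASTF Python API; the addresses are placeholders, not the ones used here):

    from trex.astf.api import ASTFIPGen, ASTFIPGenDist, ASTFIPGenGlobal

    # at least as many client/server IPs as data-plane cores (e.g. 16 for -c 16)
    ip_gen_c = ASTFIPGenDist(ip_range=["16.0.0.1", "16.0.0.16"], distribution="seq")
    ip_gen_s = ASTFIPGenDist(ip_range=["48.0.0.1", "48.0.0.16"], distribution="seq")
    ip_gen = ASTFIPGen(glob=ASTFIPGenGlobal(ip_offset="1.0.0.0"),
                       dist_client=ip_gen_c, dist_server=ip_gen_s)

The DUT (or the gateway) then has to answer ARP for the whole range, which was the missing piece in this case.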