ansyun / dpdk-ans

ANS(Accelerated Network Stack) on DPDK, DPDK native TCP/IP stack.
https://ansyun.com
BSD 3-Clause "New" or "Revised" License

dpdk-ans is slower than regular Linux epoll with 100Gbit/s #16

Closed: JelteF closed this issue 7 years ago

JelteF commented 8 years ago

I've been converting iperf3 to use DPDK instead of the Linux networking stack. However, when running it normally it can get ~40 Gbit/s, and when using the dpdk-ans version it can only achieve ~4 Gbit/s.

The source code which uses regular epoll can be found here: https://github.com/JelteF/iperf/tree/epoll And the code for the ANS version here: https://github.com/JelteF/iperf/tree/ans

Do you have any suggestions on how to improve the performance?

JelteF commented 8 years ago

We could also discuss this on Slack if you could send me an invite. My email is in my profile.

bluenet13 commented 8 years ago
  1. TCP performance or UDP performance?
  2. Please share your test steps.
  3. Please share the ANS startup logs.
JelteF commented 8 years ago

TCP or UDP?

It's about TCP performance; I haven't gotten UDP working yet.

Test steps

There is a client and a server machine (these can also be the same machine if two interfaces are connected to each other). Most of the steps are the same for both machines; where they differ I will explain.

  1. Install DPDK and ANS, start ANS, and set the $RTE_ variables
  2. Install and compile the repo:
git clone https://github.com/JelteF/iperf
cd iperf/src
git checkout ans
make
  3. On the server machine start the server like this:
sudo build/iperf3 -s --bind <ip-of-the-interface>
  4. On the client machine start the client like this:
sudo build/iperf3 -c <ip-of-the-server>
  5. It will end in a crash, but that is not important; you will see the speed anyway.

Do the same test with regular epoll

  1. Turn off ANS
  2. Install and compile the repo:
git clone https://github.com/JelteF/iperf iperf-epoll
cd iperf-epoll
git checkout epoll
./configure
make
  3. Run the client and the server again:
src/iperf3 -s --bind <ip-of-the-interface>
src/iperf3 -c <ip-of-the-server>

ANS startup logs

Only port one (2.2.2.1) is actually connected.

root@ps1 /home/jelte/dpdk-ans ans/build/ans -c 0x3 -n3  -- -p=0x1 --config="(0,0,0),(1,0,1)"
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 2 on socket 0
EAL: Detected lcore 3 as core 3 on socket 0
EAL: Detected lcore 4 as core 0 on socket 0
EAL: Detected lcore 5 as core 1 on socket 0
EAL: Detected lcore 6 as core 2 on socket 0
EAL: Detected lcore 7 as core 3 on socket 0
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 8 lcore(s)
EAL: Setting up physically contiguous memory...
EAL: Ask a virtual area of 0x7f000000 bytes
EAL: Virtual area found at 0x7f2d00400000 (size = 0x7f000000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7f2cffe00000 (size = 0x400000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7f2cff800000 (size = 0x400000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7f2cff400000 (size = 0x200000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7f2cff000000 (size = 0x200000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7f2cfec00000 (size = 0x200000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7f2cfe800000 (size = 0x200000)
EAL: Requesting 1024 pages of size 2MB from socket 0
EAL: TSC frequency is ~3700010 KHz
EAL: Master lcore 0 is ready (tid=7fa408c0;cpuset=[0])
EAL: lcore 1 is ready (tid=fd9f2700;cpuset=[1])
EAL: PCI device 0000:02:00.0 on NUMA socket 0
EAL:   probe driver: 15b3:1013 librte_pmd_mlx5
PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_0" (VF: false, MPS: false)
PMD: librte_pmd_mlx5: 1 port(s) detected
PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:c0:88:aa
EAL: PCI device 0000:02:00.1 on NUMA socket 0
EAL:   probe driver: 15b3:1013 librte_pmd_mlx5
PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_1" (VF: false, MPS: false)
PMD: librte_pmd_mlx5: 1 port(s) detected
PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:c0:88:ab
EAL: PCI device 0000:05:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1521 rte_igb_pmd
EAL:   Not managed by a supported kernel driver, skipped
EAL: PCI device 0000:05:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1521 rte_igb_pmd
EAL:   Not managed by a supported kernel driver, skipped
param nb 2 ports 2 
port id 0 
 port id 1 

Start to Init port 
     port 0:  
     port name librte_pmd_mlx5:  
     max_rx_queues 65535: max_tx_queues:65535 
     rx_offload_capa 14: tx_offload_capa:15 
     Creating queues: rx queue number=1 tx queue number=2... 
PMD: librte_pmd_mlx5: 0xdebdc0: TX queues number update: 0 -> 2
PMD: librte_pmd_mlx5: 0xdebdc0: RX queues number update: 0 -> 1
     MAC Address:E4:1D:2D:C0:88:AA 
     Deault-- tx pthresh:0, tx hthresh:0, tx wthresh:0, txq_flags:0x0 
     lcore id:0, tx queue id:0, socket id:0 
     Conf-- tx pthresh:36, tx hthresh:0, tx wthresh:0, txq_flags:0xfffff1ff 
     Deault-- tx pthresh:0, tx hthresh:0, tx wthresh:0, txq_flags:0x0 
     lcore id:1, tx queue id:1, socket id:0 
     Conf-- tx pthresh:36, tx hthresh:0, tx wthresh:0, txq_flags:0xfffff1ff 

     port 1:  
     port name librte_pmd_mlx5:  
     max_rx_queues 65535: max_tx_queues:65535 
     rx_offload_capa 14: tx_offload_capa:15 
     Creating queues: rx queue number=1 tx queue number=2... 
PMD: librte_pmd_mlx5: 0xdefe08: TX queues number update: 0 -> 2
PMD: librte_pmd_mlx5: 0xdefe08: RX queues number update: 0 -> 1
     MAC Address:E4:1D:2D:C0:88:AB 
     Deault-- tx pthresh:0, tx hthresh:0, tx wthresh:0, txq_flags:0x0 
     lcore id:0, tx queue id:0, socket id:0 
     Conf-- tx pthresh:36, tx hthresh:0, tx wthresh:0, txq_flags:0xfffff1ff 
     Deault-- tx pthresh:0, tx hthresh:0, tx wthresh:0, txq_flags:0x0 
     lcore id:1, tx queue id:1, socket id:0 
     Conf-- tx pthresh:36, tx hthresh:0, tx wthresh:0, txq_flags:0xfffff1ff 

Allocated mbuf pool on socket 0, mbuf number: 16384 

Initializing rx queues on lcore 0 ... 
Default-- rx pthresh:0, rx hthresh:0, rx wthresh:0 
port id:0, rx queue id: 0, socket id:0 
Conf-- rx pthresh:8, rx hthresh:8, rx wthresh:4 

Initializing rx queues on lcore 1 ... 
Default-- rx pthresh:0, rx hthresh:0, rx wthresh:0 
port id:1, rx queue id: 0, socket id:0 
Conf-- rx pthresh:8, rx hthresh:8, rx wthresh:4 
core mask: 3, sockets number:1, lcore number:2 
start to init ans 
USER8: LCORE[0] lcore mask 0x3 
USER8: LCORE[0] lcore id 0 is enable 
USER8: LCORE[0] lcore id 1 is enable 
USER8: LCORE[0] lcore number 2 
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
USER8: LCORE[0] UDP layer init successfully, Use memory:4194304 bytes 
USER8: LCORE[0] TCP hash table init successfully, tcp pcb size 448 total size 29360128 
USER8: LCORE[0] TCP hash table init successfully, tcp pcb size 448 total size 29360128 
USER8: LCORE[0] so shm memory 16777216 bytes, so number 131072,  sock shm size 128 bytes 
USER8: LCORE[0] Sock init successfully, allocated of 41943040 bytes 
add eth0 device
USER8: LCORE[0] Interface eth0 if_capabilities: 0xf 
add IP 1020202 on device eth0 
add eth1 device
USER8: LCORE[0] Interface eth1 if_capabilities: 0xf 
add IP 1020203 on device eth1 
Show interface 

eth0    HWaddr e4:1d:2d:c0:88:aa
        inet addr:2.2.2.1
        inet addr:255.255.255.0

eth1    HWaddr e4:1d:2d:c0:88:ab
        inet addr:3.2.2.1
        inet addr:255.255.255.0
add static route 

Destination     Gateway     Netmask         Flags       Iface
2.2.2.0     *       255.255.255.0       U C         0
2.2.2.5     *       255.255.255.255     U H L       0
3.2.2.0     *       255.255.255.0       U C         1
3.3.3.0     2.2.2.5     255.255.255.0       U G         0

USER8: LCORE[-1] ANS mgmt thread startup 
Checking link status done
Port 0 Link Up - speed 100000 Mbps - full-duplex
Port 1 Link Up - speed 0 Mbps - full-duplex
USER8: main loop on lcore 1
USER8:  -- lcoreid=1 portid=1 rxqueueid=0
nb ports 2 hz: 3700010272 
USER8: main loop on lcore 0
USER8:  -- lcoreid=0 portid=0 rxqueueid=0
nb ports 2 hz: 3700010272 

Startup logs of the ANS iperf3:

jelte@ps1 ~/iperf/src sudo build/iperf3 -s --bind 2.2.2.1
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 2 on socket 0
EAL: Detected lcore 3 as core 3 on socket 0
EAL: Detected lcore 4 as core 0 on socket 0
EAL: Detected lcore 5 as core 1 on socket 0
EAL: Detected lcore 6 as core 2 on socket 0
EAL: Detected lcore 7 as core 3 on socket 0
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 8 lcore(s)
EAL: Setting up physically contiguous memory...
EAL: Analysing 1024 files
EAL: Mapped segment 0 of size 0x7f000000
EAL: Mapped segment 1 of size 0x400000
EAL: Mapped segment 2 of size 0x400000
EAL: Mapped segment 3 of size 0x200000
EAL: Mapped segment 4 of size 0x200000
EAL: Mapped segment 5 of size 0x200000
EAL: Mapped segment 6 of size 0x200000
EAL: memzone_reserve_aligned_thread_unsafe(): memzone <RG_MP_log_history> already exists
RING: Cannot reserve memory
EAL: TSC frequency is ~3700000 KHz
EAL: Master lcore 0 is ready (tid=f7fed8a0;cpuset=[0])
USER8: LCORE[-1] anssock any lcore id 0xffffffff 
USER8: LCORE[2] anssock app id: 5380 
USER8: LCORE[2] anssock app name: iperf3 
USER8: LCORE[2] anssock app bind lcoreId: 0 
USER8: LCORE[2] anssock app bind queueId: 0 
USER8: LCORE[2] anssock app lcoreId: 2 
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
bluenet13 commented 8 years ago
  1. Please take the actions below to improve DPDK performance (these tips have also been added to the README): isolate ANS's lcores from the kernel with isolcpus, and keep interrupts off ANS's lcores by updating the /proc/irq/default_smp_affinity file. Don't run ANS on lcore 0; it will affect ANS performance. (A command sketch follows after this list.)
  2. Why is the second NIC's speed 0? It probably wouldn't affect the iperf testing, though: Port 1 Link Up - speed 0 Mbps - full-duplex
  3. Traffic for one TCP connection is handled on only one lcore, while in the Linux kernel many lcores handle it. ANS also doesn't support TSO yet, while the Linux kernel does; these are probably the main reasons.
  4. Please share the iperf output here. Thanks.
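
A sketch of the isolation from point 1 (values assume an 8-lcore machine; the exact masks depend on which lcores ANS uses):

# reserve lcores 1-3 for ANS at boot (kernel command line parameter)
isolcpus=1,2,3
# steer new device interrupts to the remaining CPUs (hex mask f1 = CPUs 0 and 4-7)
echo f1 > /proc/irq/default_smp_affinity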
JelteF commented 8 years ago
  1. To avoid running ANS on lcore 0 I should use an even number for the -c option, right? And do I need to set the opposite of my -c option in /proc/irq/default_smp_affinity? So for instance -c 0xe and then f1 in the file? (A worked example follows below.)
  2. The second NIC does not have a cable connected to it, so it is not used by iperf.

3/4. I will try to disable TSO in Linux and share my results.
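
(A worked example of the mask arithmetic in point 1, assuming 8 lcores: -c 0xe is binary 00001110, i.e. ANS runs on lcores 1-3 and lcore 0 stays free; the complementary interrupt mask is 11110001 = 0xf1, which steers IRQs to CPUs 0 and 4-7, away from ANS. So -c 0xe paired with f1 in /proc/irq/default_smp_affinity is indeed a matching pair.)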

bluenet13 commented 8 years ago

I sent you an invite on slack.com.

JelteF commented 8 years ago

Thanks, I accepted it and sent a question there already.

JelteF commented 8 years ago

I have machines with 8 lcores. I have set isolcpus=0,1,2,3 as a boot parameter, and I changed /proc/irq/default_smp_affinity to f0. I now get reliable throughput with both regular Linux and ANS. I ran tests with all offloading disabled and with everything enabled. This was done in the following way:

# off
ethtool -K eth2 rxvlan off txvlan off gso off tso off rx off tx off sg off rxhash off gro off rx-vlan-filter off lro off
# on
ethtool -K eth2 rxvlan on txvlan on gso on tso on rx on tx on sg on rxhash on gro on rx-vlan-filter on lro on

ANS was started using this command:

ans/build/ans -c 0x2 -n1  -- -p=0x1 --config="(0,0,1))"

Tests with offloading off

With all types of offloading disabled I am able to get the following speeds for Linux:

jelte@ps2 ~/iperf-epoll src/iperf3 -c 2.2.2.3 -A 1,1 -t 10
Connecting to host 2.2.2.3, port 5201
[  5] local 2.2.2.4 port 38108 connected to 2.2.2.3 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  5]   0.00-1.00   sec   929 MBytes  7.79 Gbits/sec    0    351 KBytes       
[  5]   1.00-2.00   sec   935 MBytes  7.84 Gbits/sec    0    434 KBytes       
[  5]   2.00-3.00   sec   931 MBytes  7.81 Gbits/sec    0    434 KBytes       
[  5]   3.00-4.00   sec   935 MBytes  7.84 Gbits/sec    0    434 KBytes       
[  5]   4.00-5.00   sec   932 MBytes  7.82 Gbits/sec    0    434 KBytes       
[  5]   5.00-6.00   sec   935 MBytes  7.84 Gbits/sec    0    434 KBytes       
[  5]   6.00-7.00   sec   931 MBytes  7.82 Gbits/sec    0    451 KBytes       
[  5]   7.00-8.00   sec   934 MBytes  7.83 Gbits/sec    0    465 KBytes       
[  5]   8.00-9.00   sec   922 MBytes  7.75 Gbits/sec    0    465 KBytes       

And these speeds for ANS:

jelte@ps2 ~/iperf/src sudo build/iperf3 -c 2.2.2.1  -A 2,2
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
...........
USER8: LCORE[0] anssock app lcoreId: 0 
Connecting to host 2.2.2.1, port 5201
[ 17] local :: port 2496 connected to :: port 20756
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[ 17]   0.00-1.00   sec   580 MBytes  4.87 Gbits/sec  4294960096   0.00 Bytes       
[ 17]   1.00-2.00   sec   581 MBytes  4.87 Gbits/sec    0   0.00 Bytes       
[ 17]   2.00-3.00   sec   581 MBytes  4.87 Gbits/sec    0   0.00 Bytes       
[ 17]   3.00-4.00   sec   580 MBytes  4.87 Gbits/sec    0   0.00 Bytes       
[ 17]   4.00-5.00   sec   580 MBytes  4.86 Gbits/sec    0   0.00 Bytes       
[ 17]   5.00-6.00   sec   581 MBytes  4.87 Gbits/sec    0   0.00 Bytes       
[ 17]   6.00-7.00   sec   580 MBytes  4.87 Gbits/sec    0   0.00 Bytes       
[ 17]   7.00-8.00   sec   580 MBytes  4.87 Gbits/sec    0   0.00 Bytes       
[ 17]   8.00-9.00   sec   581 MBytes  4.87 Gbits/sec    0   0.00 Bytes       

Tests with offloading on

Linux:

jelte@ps2 ~/iperf-epoll src/iperf3 -c 2.2.2.3 -A 2,2 -t 10
Connecting to host 2.2.2.3, port 5201
[  5] local 2.2.2.4 port 38140 connected to 2.2.2.3 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  5]   0.00-1.00   sec  3.21 GBytes  27.6 Gbits/sec   60    822 KBytes       
[  5]   1.00-2.00   sec  3.21 GBytes  27.5 Gbits/sec   13    932 KBytes       
[  5]   2.00-3.00   sec  3.21 GBytes  27.6 Gbits/sec   16   1.00 MBytes       
[  5]   3.00-4.00   sec  3.21 GBytes  27.6 Gbits/sec   13   1.09 MBytes       
[  5]   4.00-5.00   sec  3.21 GBytes  27.5 Gbits/sec   16   1.16 MBytes       
[  5]   5.00-6.00   sec  3.21 GBytes  27.5 Gbits/sec   21   1.25 MBytes       
[  5]   6.00-7.00   sec  3.21 GBytes  27.5 Gbits/sec   15   1.32 MBytes       
[  5]   7.00-8.00   sec  3.21 GBytes  27.5 Gbits/sec   30    748 KBytes       
[  5]   8.00-9.00   sec  3.21 GBytes  27.5 Gbits/sec   18    846 KBytes 

ANS:

jelte@ps2 ~/iperf/src sudo build/iperf3 -c 2.2.2.1  -A 2,2
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
.........
USER8: LCORE[0] anssock app lcoreId: 0 
Connecting to host 2.2.2.1, port 5201
[  8] local :: port 960 connected to :: port 20756
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  8]   0.00-1.00   sec   580 MBytes  4.87 Gbits/sec  4294960096   0.00 Bytes       
[  8]   1.00-2.00   sec   580 MBytes  4.86 Gbits/sec    0   0.00 Bytes       
[  8]   2.00-3.00   sec   579 MBytes  4.86 Gbits/sec    0   0.00 Bytes       
[  8]   3.00-4.00   sec   579 MBytes  4.86 Gbits/sec    0   0.00 Bytes       
[  8]   4.00-5.00   sec   580 MBytes  4.86 Gbits/sec    0   0.00 Bytes       
[  8]   5.00-6.00   sec   579 MBytes  4.86 Gbits/sec    0   0.00 Bytes       
[  8]   6.00-7.00   sec   579 MBytes  4.85 Gbits/sec    0   0.00 Bytes       
[  8]   7.00-8.00   sec   579 MBytes  4.86 Gbits/sec    0   0.00 Bytes       
[  8]   8.00-9.00   sec   579 MBytes  4.86 Gbits/sec    0   0.00 Bytes       

Results

As you can see, ANS has about the same throughput in both tests, and in both cases it is lower than Linux's, although with offloading enabled the difference is much larger.

JelteF commented 8 years ago

When jumbo frames are enabled (MTU=9000) the normal Linux kernel performs even better, reaching ~43 Gbit/s:

jelte@ps2 ~/iperf-epoll src/iperf3 -c 2.2.2.3 -A 2,2 
Connecting to host 2.2.2.3, port 5201
[  5] local 2.2.2.4 port 38218 connected to 2.2.2.3 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  5]   0.00-1.00   sec  5.10 GBytes  43.8 Gbits/sec    0   1.15 MBytes       
[  5]   1.00-2.00   sec  5.07 GBytes  43.5 Gbits/sec    0   1.21 MBytes       
[  5]   2.00-3.00   sec  5.06 GBytes  43.5 Gbits/sec    0   1.22 MBytes       
[  5]   3.00-4.00   sec  5.06 GBytes  43.5 Gbits/sec    7   1.12 MBytes       
[  5]   4.00-5.00   sec  5.06 GBytes  43.5 Gbits/sec    0   1.13 MBytes       
[  5]   5.00-6.00   sec  5.07 GBytes  43.6 Gbits/sec    0   1.14 MBytes       
[  5]   6.00-7.00   sec  5.08 GBytes  43.6 Gbits/sec    0   1.14 MBytes       
[  5]   7.00-8.00   sec  5.08 GBytes  43.6 Gbits/sec    0   1.15 MBytes       
[  5]   8.00-9.00   sec  5.08 GBytes  43.6 Gbits/sec    0   1.17 MBytes       
bluenet13 commented 8 years ago

Thanks for your detailed testing. For both the ANS and Linux iperf tests, how many TCP connections are established? Yes, we need to improve ANS TCP stack performance.

JelteF commented 8 years ago

It opens two connections: one for command communication and one for the speed test. So only a single one is heavily used.

bluenet13 commented 8 years ago

Is the TCP window scale option enabled in the Linux kernel? If it is enabled, the window is large and the speed will be faster.

JelteF commented 8 years ago

Yes, it was enabled. Does ANS not support that? Also, does ANS support any offloading to the NIC, and if so, which kinds?

bluenet13 commented 8 years ago

ANS doesn't support window scaling or any TSO yet; we will support them in the future.

JelteF commented 8 years ago

When I let my ANS iperf open multiple concurrent connections to the server it becomes much quicker; the speed maxes out at three connections:

Connecting to host 2.2.2.1, port 5201
[ 13] local :: port 1984 connected to :: port 20756
[ 14] local :: port 2240 connected to :: port 20756
[ 15] local :: port 2496 connected to :: port 20756
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[ 13]   0.00-1.00   sec   970 MBytes  8.13 Gbits/sec  4294960064   0.00 Bytes       
[ 14]   0.00-1.00   sec   970 MBytes  8.13 Gbits/sec  4294960064   0.00 Bytes       
[ 15]   0.00-1.00   sec   970 MBytes  8.13 Gbits/sec  4294960064   0.00 Bytes       
[SUM]   0.00-1.00   sec  2.84 GBytes  24.4 Gbits/sec  -21696             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 13]   1.00-2.00   sec   973 MBytes  8.16 Gbits/sec    0   0.00 Bytes       
[ 14]   1.00-2.00   sec   973 MBytes  8.16 Gbits/sec    0   0.00 Bytes       
[ 15]   1.00-2.00   sec   973 MBytes  8.16 Gbits/sec    0   0.00 Bytes       
[SUM]   1.00-2.00   sec  2.85 GBytes  24.5 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 13]   2.00-3.00   sec   972 MBytes  8.15 Gbits/sec    0   0.00 Bytes       
[ 14]   2.00-3.00   sec   972 MBytes  8.15 Gbits/sec    0   0.00 Bytes       
[ 15]   2.00-3.00   sec   972 MBytes  8.15 Gbits/sec    0   0.00 Bytes       
[SUM]   2.00-3.00   sec  2.85 GBytes  24.5 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 13]   3.00-4.00   sec   971 MBytes  8.14 Gbits/sec    0   0.00 Bytes       
[ 14]   3.00-4.00   sec   971 MBytes  8.14 Gbits/sec    0   0.00 Bytes       
[ 15]   3.00-4.00   sec   971 MBytes  8.14 Gbits/sec    0   0.00 Bytes       
[SUM]   3.00-4.00   sec  2.84 GBytes  24.4 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 13]   4.00-5.00   sec   972 MBytes  8.15 Gbits/sec    0   0.00 Bytes       
[ 14]   4.00-5.00   sec   972 MBytes  8.15 Gbits/sec    0   0.00 Bytes       
[ 15]   4.00-5.00   sec   972 MBytes  8.15 Gbits/sec    0   0.00 Bytes       
[SUM]   4.00-5.00   sec  2.85 GBytes  24.5 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 13]   5.00-6.00   sec   973 MBytes  8.16 Gbits/sec    0   0.00 Bytes       
[ 14]   5.00-6.00   sec   973 MBytes  8.16 Gbits/sec    0   0.00 Bytes       
[ 15]   5.00-6.00   sec   973 MBytes  8.16 Gbits/sec    0   0.00 Bytes       
[SUM]   5.00-6.00   sec  2.85 GBytes  24.5 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 13]   6.00-7.00   sec   973 MBytes  8.16 Gbits/sec    0   0.00 Bytes       
[ 14]   6.00-7.00   sec   973 MBytes  8.16 Gbits/sec    0   0.00 Bytes       
[ 15]   6.00-7.00   sec   973 MBytes  8.16 Gbits/sec    0   0.00 Bytes       
[SUM]   6.00-7.00   sec  2.85 GBytes  24.5 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 13]   7.00-8.00   sec   972 MBytes  8.16 Gbits/sec    0   0.00 Bytes       
[ 14]   7.00-8.00   sec   972 MBytes  8.16 Gbits/sec    0   0.00 Bytes       
[ 15]   7.00-8.00   sec   972 MBytes  8.16 Gbits/sec    0   0.00 Bytes       
[SUM]   7.00-8.00   sec  2.85 GBytes  24.5 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 13]   8.00-9.00   sec   973 MBytes  8.16 Gbits/sec    0   0.00 Bytes       
[ 14]   8.00-9.00   sec   973 MBytes  8.16 Gbits/sec    0   0.00 Bytes       
[ 15]   8.00-9.00   sec   973 MBytes  8.16 Gbits/sec    0   0.00 Bytes       
[SUM]   8.00-9.00   sec  2.85 GBytes  24.5 Gbits/sec    0             

Somehow a single connection even becomes faster than before, from about 5 Gbit/s to 8 Gbit/s.

When I open 4 connections the speed drops to about 6 Gbit/s per connection, still totalling 24.5 Gbit/s. With more than 4 connections it starts to behave strangely, with large variations in speed:

Connecting to host 2.2.2.1, port 5201
[ 32] local :: port 5824 connected to :: port 20756
[ 33] local :: port 6080 connected to :: port 20756
[ 34] local :: port 6336 connected to :: port 20756
[ 35] local :: port 6592 connected to :: port 20756
[ 36] local :: port 6848 connected to :: port 20756
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[ 32]   0.00-1.00   sec   577 MBytes  4.84 Gbits/sec  4294960064   0.00 Bytes       
[ 33]   0.00-1.00   sec   577 MBytes  4.84 Gbits/sec  4294960064   0.00 Bytes       
[ 34]   0.00-1.00   sec   577 MBytes  4.84 Gbits/sec  4294960064   0.00 Bytes       
[ 35]   0.00-1.00   sec   577 MBytes  4.84 Gbits/sec  4294960064   0.00 Bytes       
[ 36]   0.00-1.00   sec   577 MBytes  4.84 Gbits/sec  4294960064   0.00 Bytes       
[SUM]   0.00-1.00   sec  2.82 GBytes  24.2 Gbits/sec  -36160             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 32]   1.00-2.02   sec   472 MBytes  3.88 Gbits/sec    0   0.00 Bytes       
[ 33]   1.00-2.02   sec   472 MBytes  3.88 Gbits/sec    0   0.00 Bytes       
[ 34]   1.00-2.02   sec   472 MBytes  3.88 Gbits/sec    0   0.00 Bytes       
[ 35]   1.00-2.02   sec   472 MBytes  3.88 Gbits/sec    0   0.00 Bytes       
[ 36]   1.00-2.02   sec   472 MBytes  3.88 Gbits/sec    0   0.00 Bytes       
[SUM]   1.00-2.02   sec  2.30 GBytes  19.4 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 32]   2.02-3.30   sec  0.00 Bytes  0.00 bits/sec    0   0.00 Bytes       
[ 33]   2.02-3.30   sec  4.28 KBytes  27.4 Kbits/sec    0   0.00 Bytes       
[ 34]   2.02-3.30   sec  1.43 KBytes  9.12 Kbits/sec    0   0.00 Bytes       
[ 35]   2.02-3.30   sec  1.43 KBytes  9.12 Kbits/sec    0   0.00 Bytes       
[ 36]   2.02-3.30   sec  1.43 KBytes  9.12 Kbits/sec    0   0.00 Bytes       
[SUM]   2.02-3.30   sec  8.55 KBytes  54.7 Kbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 32]   3.30-4.06   sec   161 MBytes  1.78 Gbits/sec    0   0.00 Bytes       
[ 33]   3.30-4.06   sec   396 MBytes  4.38 Gbits/sec    0   0.00 Bytes       
[ 34]   3.30-4.06   sec   396 MBytes  4.38 Gbits/sec    0   0.00 Bytes       
[ 35]   3.30-4.06   sec   396 MBytes  4.38 Gbits/sec    0   0.00 Bytes       
[ 36]   3.30-4.06   sec   152 MBytes  1.68 Gbits/sec    0   0.00 Bytes       
[SUM]   3.30-4.06   sec  1.47 GBytes  16.6 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 32]   4.06-5.58   sec  1.43 KBytes  7.69 Kbits/sec    0   0.00 Bytes       
[ 33]   4.06-5.58   sec  0.00 Bytes  0.00 bits/sec    0   0.00 Bytes       
[ 34]   4.06-5.58   sec  1.43 KBytes  7.69 Kbits/sec    0   0.00 Bytes       
[ 35]   4.06-5.58   sec  1.43 KBytes  7.69 Kbits/sec    0   0.00 Bytes       
[ 36]   4.06-5.58   sec  0.00 Bytes  0.00 bits/sec    0   0.00 Bytes       
[SUM]   4.06-5.58   sec  4.28 KBytes  23.1 Kbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 32]   5.58-6.00   sec   243 MBytes  4.83 Gbits/sec    0   0.00 Bytes       
[ 33]   5.58-6.00   sec   239 MBytes  4.76 Gbits/sec    0   0.00 Bytes       
[ 34]   5.58-6.00   sec   239 MBytes  4.76 Gbits/sec    0   0.00 Bytes       
[ 35]   5.58-6.00   sec   235 MBytes  4.67 Gbits/sec    0   0.00 Bytes       
[ 36]   5.58-6.00   sec   235 MBytes  4.67 Gbits/sec    0   0.00 Bytes       
[SUM]   5.58-6.00   sec  1.16 GBytes  23.7 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 32]   6.00-7.88   sec   472 MBytes  2.10 Gbits/sec    0   0.00 Bytes       
[ 33]   6.00-7.88   sec   472 MBytes  2.10 Gbits/sec    0   0.00 Bytes       
[ 34]   6.00-7.88   sec   472 MBytes  2.10 Gbits/sec    0   0.00 Bytes       
[ 35]   6.00-7.88   sec   472 MBytes  2.10 Gbits/sec    0   0.00 Bytes       
[ 36]   6.00-7.88   sec   472 MBytes  2.10 Gbits/sec    0   0.00 Bytes       
[SUM]   6.00-7.88   sec  2.30 GBytes  10.5 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 32]   7.88-8.00   sec  0.00 Bytes  0.00 bits/sec    0   0.00 Bytes       
[ 33]   7.88-8.00   sec  0.00 Bytes  0.00 bits/sec    0   0.00 Bytes       
[ 34]   7.88-8.00   sec   112 MBytes  7.84 Gbits/sec    0   0.00 Bytes       
[ 35]   7.88-8.00   sec   112 MBytes  7.84 Gbits/sec    0   0.00 Bytes       
[ 36]   7.88-8.00   sec  0.00 Bytes  0.00 bits/sec    0   0.00 Bytes       
[SUM]   7.88-8.00   sec   223 MBytes  15.7 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 32]   8.00-9.00   sec  24.8 MBytes   208 Mbits/sec    0   0.00 Bytes       
[ 33]   8.00-9.00   sec  18.4 MBytes   155 Mbits/sec    0   0.00 Bytes       
[ 34]   8.00-9.00   sec   143 MBytes  1.20 Gbits/sec    0   0.00 Bytes       
[ 35]   8.00-9.00   sec   143 MBytes  1.20 Gbits/sec    0   0.00 Bytes       
[ 36]   8.00-9.00   sec   503 MBytes  4.22 Gbits/sec    0   0.00 Bytes       
[SUM]   8.00-9.00   sec   832 MBytes  6.98 Gbits/sec    0             
iperf3: error - unable to receive results: Resource temporarily unavailable

Also, one question: how do you enable jumbo frames? Is it still the method described in https://github.com/opendp/dpdk-ans/issues/9#issuecomment-204878450? Because that is causing some issues for me.

bluenet13 commented 8 years ago

It seems ANS with one lcore can only handle about 24 Gbit/s of TCP data. Yes, you can enable jumbo frames as described in that comment, but you also need to set the interface MTU in ans_main.c. By the way, what do Retr and Cwnd mean in your test output?

JelteF commented 8 years ago

In ans_main.c it should look like this, right?

int mtu = 9000;
ans_intf_set_mtu((caddr_t)ifname, &mtu);
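/* note: the MTU is passed by pointer; ifname is the ANS interface name, e.g. "eth0" */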

And for the output: the Cwnd column is the congestion window size and the Retr column is the number of retransmits. The method of gathering this data does not seem to work for ANS, though.

bluenet13 commented 8 years ago

Yes, you can set the MTU like that. By the way, since Cwnd is the TCP congestion window size: according to your test results, the Linux cwnd (434 KBytes) is very large, so Linux is faster. You could disable TCP window scaling and retest.
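
For reference, window scaling is toggled on the Linux side with a standard sysctl (a generic example, not a command from this thread):

sysctl -w net.ipv4.tcp_window_scaling=0   # disable window scaling
sysctl -w net.ipv4.tcp_window_scaling=1   # re-enable it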

JelteF commented 8 years ago

When setting the MTU with this command:

ans/build/ans -c 0x2 -n8  -- -p=0x1 --config="(0,0,1))"  --enable-jumbo --max-pkt-len 9000

and with the code from my previous comment, I get a bus error like this:

nb ports 1 hz: 3700001415 
USER8: LCORE[1] link mbuf failed 

Program received signal SIGBUS, Bus error.
0x00000000004481f0 in rte_pktmbuf_free ()
Missing separate debuginfos, use: debuginfo-install libibverbs-1.1.8mlnx1-OFED.3.3.0.0.9.33100.x86_64 libmlx4-1.0.6mlnx1-OFED.3.3.0.0.7.33100.x86_64 libmlx5-1.0.2mlnx1-OFED.3.3.0.0.9.33100.x86_64
(gdb) bt
#0  0x00000000004481f0 in rte_pktmbuf_free ()
#1  0x000000000044a68e in ans_conn_clear ()
#2  0x000000000044a7ce in ans_close_tcp ()
#3  0x0000000000445e07 in ans_close ()
#4  0x0000000000447546 in ans_sock_clear ()
#5  0x000000000044bc70 in ans_so_coremsg_handle ()
#6  0x000000000044bf2c in ans_so_msg_handle_burst ()
#7  0x0000000000431a7e in ans_main_loop ()
#8  0x00000000004cbcd3 in rte_eal_mp_remote_launch ()
#9  0x00000000004316c5 in main ()
JelteF commented 8 years ago

With all offloading off, an MTU of 1500, and TCP window scaling off, I get this Linux performance:

jelte@ps1 ~/iperf-epoll src/iperf3 -c 2.2.2.4 -A 2,2 
Connecting to host 2.2.2.4, port 5201
[  5] local 2.2.2.3 port 35962 connected to 2.2.2.4 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  5]   0.00-1.00   sec   713 MBytes  5.98 Gbits/sec    0   70.7 KBytes       
[  5]   1.00-2.00   sec   714 MBytes  5.99 Gbits/sec    0   70.7 KBytes       
[  5]   2.00-3.00   sec   713 MBytes  5.98 Gbits/sec    0   70.7 KBytes       
[  5]   3.00-4.00   sec   717 MBytes  6.02 Gbits/sec    0   70.7 KBytes       
[  5]   4.00-5.00   sec   714 MBytes  5.99 Gbits/sec    0   70.7 KBytes       
[  5]   5.00-6.00   sec   714 MBytes  5.99 Gbits/sec    0   70.7 KBytes       
[  5]   6.00-7.00   sec   722 MBytes  6.06 Gbits/sec    0   70.7 KBytes       
[  5]   7.00-8.00   sec   722 MBytes  6.06 Gbits/sec    0   70.7 KBytes       
[  5]   8.00-9.00   sec   719 MBytes  6.03 Gbits/sec    0   70.7 KBytes       

But when using multiple connections I still get around 8 Gbit/s, like before with a single one.

bluenet13 commented 8 years ago

OK, I will test the jumbo frame case.

It seems the TCP speed also depends on the cwnd.
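
As a rough sanity check (with an assumed RTT, not one measured in this thread): a single TCP stream is bounded by cwnd / RTT, so the 434 KByte cwnd from the Linux run above with an RTT of ~0.5 ms gives a ceiling of roughly 434 KB * 8 / 0.5 ms ≈ 7.1 Gbit/s, in line with the ~7.8 Gbit/s measured there.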

JelteF commented 8 years ago

OK, thank you. Maybe it will be much faster then.

JelteF commented 8 years ago

Have you had time to test the jumbo frame case?

bluenet13 commented 8 years ago

No, I don't have an environment to test it.

JelteF commented 8 years ago

OK, too bad. Do you have any idea why the per-stream speed increases when multiple streams are used?

bluenet13 commented 8 years ago

You could change the macro below in ans_main.h and check whether it increases the speed of one stream. If yes, maybe this is the root cause.

#define MAX_TX_BURST 32 /* set tx burst to 1 for lower packet latency */

JelteF commented 8 years ago

Oh man, thanks for this. Changing the values there really changes the single-stream performance a lot. I can now get more than 15 Gbit/s over a single stream.

JelteF commented 8 years ago

However, I now do get lower performance for multiple streams. What do all the values there do?

JelteF commented 8 years ago

When setting MAX_TX_BURST to 1 I get the instability behaviour again:

[ 16] local :: port 20756 connected to :: port 1984
[ ID] Interval           Transfer     Bandwidth
[ 16]   0.00-1.00   sec  2.25 GBytes  19.3 Gbits/sec                  
[ 16]   1.00-2.00   sec  2.25 GBytes  19.4 Gbits/sec                  
[ 16]   2.00-3.00   sec   759 MBytes  6.37 Gbits/sec                  
[ 16]   3.00-4.00   sec  1.43 KBytes  11.7 Kbits/sec                  
[ 16]   4.00-5.00   sec  2.10 GBytes  18.1 Gbits/sec                  
[ 16]   5.00-6.00   sec  2.25 GBytes  19.3 Gbits/sec                  
[ 16]   6.00-7.00   sec   758 MBytes  6.36 Gbits/sec                  
[ 16]   7.00-8.00   sec   875 MBytes  7.34 Gbits/sec                  
[ 16]   8.00-9.00   sec   758 MBytes  6.36 Gbits/sec                  
[ 16]   9.00-10.00  sec   798 MBytes  6.69 Gbits/sec                  
[ 16]  10.00-11.00  sec  2.25 GBytes  19.3 Gbits/sec                  
[ 16]  11.00-12.00  sec  2.25 GBytes  19.3 Gbits/sec                  
[ 16]  12.00-13.00  sec  2.25 GBytes  19.3 Gbits/sec                  
[ 16]  13.00-14.00  sec  2.25 GBytes  19.4 Gbits/sec                  
[ 16]  14.00-15.00  sec   758 MBytes  6.36 Gbits/sec                  
[ 16]  15.00-16.00  sec   562 MBytes  4.71 Gbits/sec                  
[ 16]  16.00-17.00  sec  2.25 GBytes  19.3 Gbits/sec                  
[ 16]  17.00-18.00  sec  2.25 GBytes  19.3 Gbits/sec                  
[ 16]  18.00-19.00  sec   759 MBytes  6.36 Gbits/sec                  
[ 16]  19.00-20.00  sec   407 MBytes  3.41 Gbits/sec                  
[ 16]  20.00-20.00  sec   176 KBytes  19.3 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[ 16]   0.00-20.00  sec  0.00 Bytes  0.00 bits/sec                  sender
[ 16]   0.00-20.00  sec  28.6 GBytes  12.3 Gbits/sec                  receiver
bluenet13 commented 8 years ago

MAX_TX_BURST controls how many packets are buffered before they are sent to the NIC. If your NIC is a high-speed NIC, it is better to set it to 1: the packet latency is lower, so the TCP speed of a single stream is higher.
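
To illustrate the trade-off, a minimal sketch of the usual DPDK TX-burst pattern that a MAX_TX_BURST-style macro controls (a generic illustration, not ANS's actual code; everything except the rte_* calls is invented for the example):

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define MAX_TX_BURST 32   /* 1 = flush every packet immediately: lower latency, more NIC doorbells */

struct tx_buffer {
    struct rte_mbuf *pkts[MAX_TX_BURST];
    uint16_t count;
};

/* hand the buffered packets to the NIC and free any it could not accept */
static void send_burst(uint16_t port, uint16_t queue, struct tx_buffer *buf)
{
    uint16_t sent = rte_eth_tx_burst(port, queue, buf->pkts, buf->count);
    while (sent < buf->count)
        rte_pktmbuf_free(buf->pkts[sent++]);
    buf->count = 0;
}

/* queue one packet; the NIC is only touched once a full burst has accumulated */
static void queue_pkt(uint16_t port, uint16_t queue, struct tx_buffer *buf, struct rte_mbuf *pkt)
{
    buf->pkts[buf->count++] = pkt;
    if (buf->count == MAX_TX_BURST)
        send_burst(port, queue, buf);
}

With MAX_TX_BURST at 32, a packet can sit in the buffer until 31 more arrive, which is exactly the extra latency the comment in ans_main.h warns about.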

bluenet13 commented 8 years ago

You should tcpdump the packets to analyse the instability behaviour; maybe packets are being lost.

bluenet13 commented 8 years ago

Does ANS print any logs? For example, a "link mbuf failed" log.

JelteF commented 8 years ago

ANS does not print any logs. How do I tcpdump ANS?

bluenet13 commented 8 years ago

ANS's error and info logs can be seen on the screen and in syslog. You can tcpdump the packets on your iperf client side (the Linux PC).
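
A typical capture command for that, reusing the interface and addresses from earlier in this thread (adjust to your setup):

sudo tcpdump -i eth2 -w ans_iperf.pcap host 2.2.2.1 and port 5201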

bluenet13 commented 7 years ago

The instability behaviour was related to the TCP timestamp feature and has now been fixed; the fix also resolves the EPOLLOUT issue you reported earlier. Please try it again.