esnet / iperf

iperf3: A TCP, UDP, and SCTP network bandwidth measurement tool
iperf3 Bandwidth fluctuation with multiple clients (25G NIC) #1047

Open Jeff0083 opened 4 years ago

Jeff0083 commented 4 years ago

Context

Two iperf3 server processes are started on the server host. When both are bound to the same CPU the bandwidth is stable, but when they are bound to different CPUs the bandwidth fluctuates.

Case 2 (same CPU):

server (node-3):

nohup iperf3 -s -A 14 -p 5201 &
nohup iperf3 -s -A 14 -p 5202 &

clients (node-1, node-2):

node-1: iperf3 -c 192.168.236.4 -i 1 -t 10000 -b 10G -P 1 -p 5201
node-2: iperf3 -c 192.168.236.4 -i 1 -t 10000 -b 10G -P 1 -p 5202

Test Results (screenshots attached)

Jeff0083 commented 4 years ago

Can anyone help me?

davidBar-On commented 4 years ago

@zhao305149619, unless I am misreading your request, it is not clear when the issue happens. You wrote that "when bound to different cpus, bandwidth fluctuations will occur", but the screenshots you provided show instability in "case 1 in same cpu". Which of these is correct? Also, you did not specify the bandwidth of the link between the client and server nodes.

If the issue is when both servers run on the same CPU, then what you see may be reasonable: traffic from both clients may compete for link queue buffers on the server side.

Jeff0083 commented 4 years ago

Sorry, that was my mistake. It has been corrected.

davidBar-On commented 4 years ago

@zhao305149619, what are the interval bandwidths that the servers report for the two cases? Is there also a difference in the servers' reported bandwidth between the two cases? In case 1, is there a correlation between the bandwidth reported by the clients and the bandwidth reported by the servers?

In case 1 it seems that the average bandwidth is OK. Therefore the issue may be related to TCP buffering mechanisms or to a wrong calculation of each client's interval bandwidth (note that you are using a version of iperf3 that is more than 3 years old). Seeing whether the servers' reports differ between the two cases may help to better understand what the issue is.
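For example (a sketch, not from the original test setup), the servers' per-interval reports could be written to log files for comparison with the client reports; --logfile requires iperf3 3.1 or later, and the CPU pinning here mirrors the case 2 setup:

# check which iperf3 version is actually installed
iperf3 --version

# run the two servers with per-second interval reports saved to files
nohup iperf3 -s -A 14 -p 5201 -i 1 --logfile server-5201.log &
nohup iperf3 -s -A 14 -p 5202 -i 1 --logfile server-5202.log &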

Jeff0083 commented 4 years ago

@davidBar-On

What are the interval bandwidths that the servers report for the two cases? ------> Same as the client bandwidth reports.
Is there also a difference in the servers' reported bandwidth between the two cases? ------> Yes. In case 1, the server reports show large bandwidth fluctuations.
In case 1, is there a correlation between the bandwidth reported by the clients and the bandwidth reported by the servers? ------> Yes.

The iperf3 used is the version released by Red Hat; it is based on an older version, but fixes are always backported.

Jeff0083 commented 4 years ago

@davidBar-On, in case 2, using systemtap I found that the server receives out-of-order packets:

[10]tcp recvmsg use time(us): 76, rmem: 3703680, wnd: 1368448, ofo: 0, return 131072
[cpu14]timestamp: 1598956478061092, receive in seq packet: 332232630
ack for nxt seq: 332268422
ofo packet raise, cur seq: 332304214, exp seq :332268422 end seq:332340006
[10]tcp recvmsg use time(us): 67, rmem: 3767040, wnd: 1300992, ofo: 1, return 131072
ofo packet raise, cur seq: 332340006, exp seq :332268422 end seq:332366850
ofo packet raise, cur seq: 332366850, exp seq :332268422 end seq:332429486
[10]tcp recvmsg use time(us): 83, rmem: 3703680, wnd: 1341824, ofo: 1, return 131072
ofo packet raise, cur seq: 332429486, exp seq :332268422 end seq:332483174
ofo packet raise, cur seq: 332483174, exp seq :332268422 end seq:332545810
ofo packet raise, cur seq: 332545810, exp seq :332268422 end seq:332563706
[10]tcp recvmsg use time(us): 68, rmem: 3607680, wnd: 1341824, ofo: 1, return 131072
ofo packet raise, cur seq: 332563706, exp seq :332268422 end seq:332617394
[cpu14]timestamp: 1598956478061337, receive in seq packet: 332268422
[cpu14]timestamp: 1598956478061345, receive in seq packet: 332617394
ack for nxt seq: 332626342
[cpu14]timestamp: 1598956478061347, receive in seq packet: 332626342
ack for nxt seq: 332662134

Will this cause TCP fast retransmission, or even timeout retransmission?
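One way to see whether fast or timeout retransmissions actually occur (a sketch using standard Linux tools, independent of iperf3) is to check the kernel's TCP counters on the sender while the test runs:

# TCP retransmission counters (run before and after an interval and compare)
nstat -az TcpRetransSegs TcpExtTCPFastRetrans TcpExtTCPLostRetransmit

# per-connection retransmission details for the iperf3 sockets
ss -ti '( sport = :5201 or sport = :5202 )'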

davidBar-On commented 4 years ago

@zhao305149619, although I am not sure I can draw any conclusion from the data you sent: you refer twice to case 2, while the instability is in case 1: "In case 2,Server reports show large bandwidth fluctuations" and "In case 2,Use systemtap to find out that the server will receive out of order packets". Are you referring in both places to case 1 (servers on different CPUs), where the client bandwidth is unstable?

davidBar-On commented 4 years ago

Will this cause TCP fast retransmission, or even timeout retransmission?

The client logs clearly show that there are several retransmissions in case 1 (somehow I overlooked that ...). The systemtap log shows that only one packet was missing, 332268422, and that it was received at about the same time as 332617394 (a 348 KB difference). It may be that 332268422 was received because of a retransmission. However, if TCP SACK (Selective ACK) is used, only that packet would have been retransmitted, which should not cause major bandwidth issues.

Can you run Wireshark (or tcpdump) on one of the nodes, so it will be possible to evaluate when the retransmissions happen, what is retransmitted, etc.?
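For example (a sketch; eth0 is a placeholder for the 25G interface on the server node):

# capture only packet headers for the two iperf3 streams
tcpdump -i eth0 -s 96 -w iperf-case1.pcap 'port 5201 or port 5202'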

In any case, the issue is probably caused by the system/network configuration and not by iperf3. If you can get more information on how the network routing/buffering/etc. is done and synchronized between different CPUs of the same node, that may help.
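As a starting point (a sketch; eth0 is again a placeholder interface name), the NIC's receive-queue and interrupt layout can be inspected with standard tools:

# number of RX/TX (RSS) queues configured on the NIC
ethtool -l eth0

# which CPUs service the NIC's interrupts
grep eth0 /proc/interrupts
cat /proc/irq/<irq-number>/smp_affinity_list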

davidBar-On commented 4 years ago

@zhao305149619 two more points to check:

  1. To check whether retransmissions are causing the problem, try using UDP (-u option) instead of TCP. As UDP does not retransmit, if there is still instability in case 1 (servers on different CPUs), then retransmissions are not the issue.

  2. Can you assign a different IP address to each server? As the same IP address is currently used for both servers, the same IP/TCP/UDP buffers are used for both (only the sockets are different). Therefore the issue may be related to routing from the same IP buffer to two different CPUs, to synchronizing access to the same IP buffer from two CPUs, etc. (See the sketch below for both checks.)
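A sketch of the two checks (192.168.236.5 is a hypothetical second address that would have to be configured on the server NIC, and CPUs 14/15 are placeholders for the two CPUs used in case 1):

# 1. UDP instead of TCP (no retransmissions)
node-1: iperf3 -u -c 192.168.236.4 -i 1 -t 60 -b 10G -p 5201
node-2: iperf3 -u -c 192.168.236.4 -i 1 -t 60 -b 10G -p 5202

# 2. one IP address per server, each bound to its own CPU
server: nohup iperf3 -s -B 192.168.236.4 -A 14 -p 5201 &
        nohup iperf3 -s -B 192.168.236.5 -A 15 -p 5202 &
node-1: iperf3 -c 192.168.236.4 -i 1 -t 60 -b 10G -p 5201
node-2: iperf3 -c 192.168.236.5 -i 1 -t 60 -b 10G -p 5202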

In any case, if you find what is causing the issue, can you share it? That may help other users.

Jeff0083 commented 4 years ago

@davidBar-On (screenshot attached) The server receives many out-of-order packets and echoes duplicate ACKs, which causes many fast retransmissions.

davidBar-On commented 4 years ago

@zhao305149619, the Wireshark log you sent doesn't show a major retransmission issue. The duplicate-ACK records 4309, 4333, 4345, 4375 and 4383 are not really duplicates. They each report additional data received, but they all include a SACK entry that indicates a missing segment of 9014 bytes. Therefore they all carry the same ACK sequence number (the information about the additional data received is in the SACK entry) and are tagged as duplicates by Wireshark. (Although this is not shown in the image you included, it can be concluded from the retransmission and from the size of these ACKs, which is 78 bytes because of the additional SACK entry, vs. the normal 66 bytes.) This is the reason for the 9014-byte retransmission.

Again, the retransmission doesn't seem to be a major issue. One more thing you can try (in addition to UDP and different server IP addresses) is to reduce the size of the TCP segments sent below the 62,702-byte segment size, e.g. by setting "-l 48K". If there is an issue related to segmentation, that might help.
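For example (a sketch based on the earlier client commands, with a shorter test duration):

node-1: iperf3 -c 192.168.236.4 -i 1 -t 60 -b 10G -P 1 -l 48K -p 5201
node-2: iperf3 -c 192.168.236.4 -i 1 -t 60 -b 10G -P 1 -l 48K -p 5202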