esnet / iperf

iperf3: A TCP, UDP, and SCTP network bandwidth measurement tool

bandwidth limit overshoot after micro outages #1747

Open jgc234 opened 3 weeks ago

jgc234 commented 3 weeks ago

Context

Bug Report

The --bitrate option is misleading, or not documented well, depending on how you want to classify it. Instead of being a maximum target bitrate, it is a long-term averaging target - it can overdrive without limit until the long-term average has settled. The comment on the optional burst rate ("can temporarily exceed the specified bandwidth limit") implies that the non-burst version does not temporarily exceed the intended bandwidth.

Actual Behavior

If you have small outages on a network (eg 10 seconds), the bitrate throttle will attempt to catch up on the lost traffic by behaving as if no throttle limit exists, driving the traffic as fast as it possibly can until the average bitrate since the start of the test matches the long-term target bitrate. This seems to make sense looking at iperf_check_throttle, which calculates the average since the start time.
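For reference, the check boils down to comparing the whole-test average against the target. A simplified sketch of that logic (this paraphrases the behaviour rather than quoting the actual iperf_check_throttle source; the struct and field names here are illustrative only):

/*
 * Simplified paraphrase of the throttle decision described above (not the
 * actual iperf3 source): the sender gets a "green light" whenever the
 * average bitrate measured since the very start of the test is below the
 * target set with -b/--bitrate.
 */
#include <stdint.h>

struct throttle_state {
    double   start_time;   /* test start time, seconds */
    uint64_t bytes_sent;   /* total bytes sent since the start of the test */
    uint64_t target_bps;   /* -b / --bitrate target, bits per second */
};

/*
 * After an outage, bytes_sent lags far behind target_bps * elapsed, so this
 * keeps returning 1 until the long-term average catches up -- which is the
 * unbounded overshoot seen in the output below.
 */
int throttle_green_light(const struct throttle_state *st, double now)
{
    double elapsed = now - st->start_time;
    if (elapsed <= 0.0)
        return 1;
    double avg_bps = (double)st->bytes_sent * 8.0 / elapsed;
    return avg_bps < (double)st->target_bps;
}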

This doesn't look too exciting on a LAN or high-speed network (maybe a second or so at maximum), but on a slower WAN it may saturate the link for many minutes trying to make up for the lost data.

On a LAN, the overshoot looks like a quantisation error - just filling up the congestion window for a short blip.

Unfortunately I only have an example on a LAN for the moment. I can generate a WAN-looking example if required.

❯ iperf3-darwin -c beep --bitrate 20M --time 500
Connecting to host beep, port 5201
[  7] local 2403:5801:xxx:x:xxxx:xxxx:xxxx:xxxx port 63316 connected to 2403:5801:xxx:x::x port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd          RTT
[  7]   0.00-1.00   sec  2.50 MBytes  21.0 Mbits/sec    2   1.80 MBytes   4ms     
[  7]   1.00-2.00   sec  2.38 MBytes  19.9 Mbits/sec    3   3.94 MBytes   6ms     
[  7]   2.00-3.00   sec  2.38 MBytes  19.9 Mbits/sec    2   5.90 MBytes   10ms     
[  7]   3.00-4.00   sec  2.38 MBytes  19.9 Mbits/sec    2   7.90 MBytes   8ms     
[  7]   4.00-5.00   sec  2.38 MBytes  19.9 Mbits/sec    1   8.00 MBytes   17ms     
[  7]   5.00-6.00   sec  2.38 MBytes  19.9 Mbits/sec    1   8.00 MBytes   8ms     
[  7]   6.00-7.00   sec   776 KBytes  6.36 Mbits/sec    4   1.16 KBytes   10ms     <--network outage start
[  7]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    1   1.16 KBytes   10ms
[  7]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    1   1.39 KBytes   10ms
[  7]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec    1   1.39 KBytes   10ms
[  7]  10.00-11.00  sec  0.00 Bytes  0.00 bits/sec    0   1.39 KBytes   10ms
[  7]  11.00-12.00  sec  0.00 Bytes  0.00 bits/sec    1   1.39 KBytes   10ms
[  7]  12.00-13.00  sec  0.00 Bytes  0.00 bits/sec    0   1.39 KBytes   10ms
[  7]  13.00-14.00  sec  0.00 Bytes  0.00 bits/sec    1   1.39 KBytes   10ms
[  7]  14.00-15.00  sec  0.00 Bytes  0.00 bits/sec    0   1.39 KBytes   10ms
[  7]  15.00-16.00  sec  0.00 Bytes  0.00 bits/sec    1   1.39 KBytes   10ms
[  7]  16.00-17.00  sec  0.00 Bytes  0.00 bits/sec    0   1.39 KBytes   10ms
[  7]  17.00-18.00  sec  0.00 Bytes  0.00 bits/sec    1   1.39 KBytes   10ms
[  7]  18.00-19.00  sec  0.00 Bytes  0.00 bits/sec    0   1.39 KBytes   10ms
[  7]  19.00-20.00  sec  8.37 MBytes  70.2 Mbits/sec   26   1.04 MBytes   20ms     <-- network recovered
[  7]  20.00-21.00  sec  26.7 MBytes   224 Mbits/sec    0   1.08 MBytes   4ms      <-- unbounded overshoot
[  7]  21.00-22.00  sec  2.38 MBytes  19.9 Mbits/sec    0   1.08 MBytes   7ms     <-- settle back to average
[  7]  22.00-23.00  sec  2.38 MBytes  19.9 Mbits/sec    1   1.08 MBytes   8ms     
[  7]  23.00-24.00  sec  2.38 MBytes  19.9 Mbits/sec    3   1.08 MBytes   10ms     
[  7]  24.00-25.00  sec  2.38 MBytes  19.9 Mbits/sec    2   1.09 MBytes   20ms     
[  7]  25.00-26.00  sec  2.38 MBytes  19.9 Mbits/sec    2   1.13 MBytes   9ms     
[  7]  26.00-27.00  sec  2.38 MBytes  19.9 Mbits/sec    3   1.20 MBytes   10ms     
[  7]  27.00-28.00  sec  2.38 MBytes  19.9 Mbits/sec    2   1.28 MBytes   9ms     
[  7]  28.00-29.00  sec  2.38 MBytes  19.9 Mbits/sec    1   1.38 MBytes   7ms     
^C[  7]  29.00-29.28  sec   640 KBytes  18.9 Mbits/sec    0   1.41 MBytes   12ms     
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  7]   0.00-29.28  sec  69.8 MBytes  20.0 Mbits/sec   62             sender   <-- correct long-term average.
[  7]   0.00-29.28  sec  0.00 Bytes  0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated

Steps to Reproduce

Possible Solution

jgc234 commented 3 weeks ago

Here's a more degenerate example: a 5Mb/s iperf throttle over a network capable of 20Mb/s, with a 16-second outage in the middle of the test, which causes iperf to saturate the network for another 10 seconds afterwards.


iperf3-darwin -c 2403:5801:xxx:x::x --bitrate 5M --time 500
Connecting to host 2403:5801:xxx:x::x, port 5201
[  5] local 2403:5801:xxx:xx:xxxx:xxxx:xxxx:xxxx port 63534 connected to 2403:5801:xxx:x::x port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd          RTT
[  5]   0.00-1.00   sec   646 KBytes  5.29 Mbits/sec  121   8.37 KBytes   3ms     
[  5]   1.00-2.00   sec   640 KBytes  5.24 Mbits/sec   50   12.6 KBytes   2ms     
[  5]   2.00-3.00   sec   640 KBytes  5.24 Mbits/sec   59   11.2 KBytes   4ms     
[  5]   3.00-4.00   sec   638 KBytes  5.22 Mbits/sec   41   13.9 KBytes   5ms     
[  5]   4.00-5.00   sec   507 KBytes  4.15 Mbits/sec   49   12.6 KBytes   3ms     
[  5]   5.00-6.00   sec   256 KBytes  2.10 Mbits/sec   13   1.16 KBytes   3ms     <-- start of outage
[  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    2   1.16 KBytes   3ms
[  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    1   1.39 KBytes   3ms
[  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    1   1.39 KBytes   3ms
[  5]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec    0   1.39 KBytes   3ms
[  5]  10.00-11.00  sec  0.00 Bytes  0.00 bits/sec    1   1.39 KBytes   3ms
[  5]  11.00-12.00  sec  0.00 Bytes  0.00 bits/sec    0   1.39 KBytes   3ms
[  5]  12.00-13.00  sec  0.00 Bytes  0.00 bits/sec    1   1.39 KBytes   3ms
[  5]  13.00-14.00  sec  0.00 Bytes  0.00 bits/sec    0   1.39 KBytes   3ms
[  5]  14.00-15.00  sec  0.00 Bytes  0.00 bits/sec    1   1.39 KBytes   3ms
[  5]  15.00-16.00  sec  0.00 Bytes  0.00 bits/sec    0   1.39 KBytes   3ms
[  5]  16.00-17.00  sec  0.00 Bytes  0.00 bits/sec    1   1.39 KBytes   3ms
[  5]  17.00-18.00  sec  0.00 Bytes  0.00 bits/sec    0   1.39 KBytes   3ms
[  5]  18.00-19.00  sec  0.00 Bytes  0.00 bits/sec    1   1.39 KBytes   3ms
[  5]  19.00-20.00  sec  0.00 Bytes  0.00 bits/sec    0   1.39 KBytes   3ms
[  5]  20.00-21.00  sec  0.00 Bytes  0.00 bits/sec    0   1.39 KBytes   3ms
[  5]  21.00-22.00  sec   696 KBytes  5.70 Mbits/sec   53   11.2 KBytes   5ms     <-- network recovered 
[  5]  22.00-23.00  sec  2.09 MBytes  17.5 Mbits/sec  157   11.2 KBytes   4ms     <-- overshoot to saturation of network
[  5]  23.00-24.00  sec  2.11 MBytes  17.7 Mbits/sec  173   9.76 KBytes   7ms     
[  5]  24.00-25.00  sec  2.12 MBytes  17.8 Mbits/sec  178   11.2 KBytes   3ms     
[  5]  25.00-26.00  sec  1.61 MBytes  13.5 Mbits/sec  119   9.76 KBytes   51ms     
[  5]  26.00-27.00  sec  1.93 MBytes  16.2 Mbits/sec  148   8.37 KBytes   5ms     
[  5]  27.00-28.00  sec  1.65 MBytes  13.8 Mbits/sec  123   13.9 KBytes   3ms     
[  5]  28.00-29.00  sec  1.79 MBytes  15.0 Mbits/sec  141   9.76 KBytes   6ms     
[  5]  29.00-30.00  sec   749 KBytes  6.14 Mbits/sec   65   13.9 KBytes   3ms     <-- recovered the "average", fallback to throttle
[  5]  30.00-31.00  sec   640 KBytes  5.24 Mbits/sec   39   6.97 KBytes   4ms     
[  5]  31.00-32.00  sec   512 KBytes  4.19 Mbits/sec   45   2.79 KBytes   3ms     
[  5]  32.00-33.00  sec   640 KBytes  5.24 Mbits/sec   40   9.76 KBytes   4ms     
[  5]  33.00-34.00  sec   640 KBytes  5.24 Mbits/sec   39   13.9 KBytes   6ms     
[  5]  34.00-35.00  sec   640 KBytes  5.24 Mbits/sec   49   8.37 KBytes   4ms     
[  5]  35.00-36.00  sec   638 KBytes  5.22 Mbits/sec   39   12.6 KBytes   5ms     
[  5]  36.00-37.00  sec   512 KBytes  4.19 Mbits/sec   54   12.6 KBytes   3ms     
[  5]  37.00-38.00  sec   640 KBytes  5.24 Mbits/sec   39   2.79 KBytes   4ms     
^C[  5]  38.00-38.72  sec   512 KBytes  5.81 Mbits/sec   43   11.2 KBytes   4ms     
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-38.72  sec  23.2 MBytes  5.03 Mbits/sec  1886             sender
[  5]   0.00-38.72  sec  0.00 Bytes  0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated
davidBar-On commented 3 weeks ago

Or use a better algorithm that does some form of closed-loop adaptive rate limiting.

Seems to be a real issue when Cellular/RF networks are used (e.g. a car going into a tunnel for several seconds). I tried to think about what such an algorithm might be and came up with the following options:

The question is which of the above (or other) options is better for the use case?

jgc234 commented 2 weeks ago

The question is which of the above (or other) options is better for the use case?

I think I'm overthinking this, but also consider the following:

I had a quick look at common algorithms - most of these are designed for the two control systems in the other order (traffic generator first, flowing into some type of network throttle afterwards, managing a queue and controlling the output of the queue - eg token bucket, leaky bucket). The algorithm for the exit point of the queue is the part we're interested in, which seems to boil down to some controlled release over a time quantum - which is the set of options you've got above anyway. My complex and slow way of getting there.

Another thought - what about quantisation? Do we slam the link at 100% until our 1-second amount has been completed, then stop dead until the next second, or something more fine-grained and smoother pacing, or doesn't it matter?

A funny observation while reading up on shaping algorithms - there's a handy tool available called iperf to generate traffic to test your algorithm :)

davidBar-On commented 2 weeks ago

Another thought - what about quantisation? Do we slam the link at 100% until our 1-second amount has been completed, then stop dead until the next second, or something more fine-grained and smoother pacing, or doesn't it matter?

"quantisation" is already how uiperf3 works, using the --pacing-timer value (default is 1000 micro-sec).

I had a quick look at common algorithms .... funny observation while reading up on shaping algorithms ...

It is a good idea to look at these algorithms - I hadn't done so before. Reading about the shaping algorithms, they seem to be too complex for iperf3. In addition, your funny observation (which I agree is funny) leads me to believe that there is actually no need for such complex algorithms in iperf3, as it is the tool used to load the network when testing such algorithms.

What I think may be good enough and easy enough to be implemented in iperf3 is:

  1. Have an option to limit the maximum bitrate sent. Actually, this option is already supported under Linux for TCP, using --fq-rate, so it will have to be implemented manually for the other cases - basically take it as the maximum rate whenever the temporary rate required is over the -b value.
  2. Have an option to limit the average bitrate calculation to the last n report intervals. E.g. "1" means the calculation is done for each interval independently of the previous intervals, "2" takes into account only the current and previous intervals, etc. (see the sketch below).
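As a rough illustration of option 2, a minimal sketch assuming a fixed ring buffer of per-interval byte counts (all names here are made up for the example; nothing is taken from the iperf3 source):

#include <stdint.h>

#define MAX_WINDOW 16

struct window_throttle {
    uint64_t bytes[MAX_WINDOW]; /* bytes sent in each of the last n intervals */
    int      n;                 /* window size in report intervals, 1..MAX_WINDOW */
    int      cur;               /* index of the current interval */
    double   interval_secs;     /* report interval length, e.g. 1.0 */
    uint64_t target_bps;        /* -b target, bits per second */
};

void window_init(struct window_throttle *w, int n, double interval_secs,
                 uint64_t target_bps)
{
    *w = (struct window_throttle){ .n = n, .interval_secs = interval_secs,
                                   .target_bps = target_bps };
}

/* Call at each report-interval boundary: the oldest interval drops out of the
 * window, so traffic "owed" from an outage is forgotten after n intervals
 * instead of accumulating for the whole test. */
void window_new_interval(struct window_throttle *w)
{
    w->cur = (w->cur + 1) % w->n;
    w->bytes[w->cur] = 0;
}

void window_account(struct window_throttle *w, uint64_t sent)
{
    w->bytes[w->cur] += sent;
}

/* Green light if the average over the last n intervals is below the target. */
int window_green_light(const struct window_throttle *w)
{
    uint64_t total = 0;
    for (int i = 0; i < w->n; i++)
        total += w->bytes[i];
    double avg_bps = (double)total * 8.0 / ((double)w->n * w->interval_secs);
    return avg_bps < (double)w->target_bps;
}

With n == 1 this throttles each report interval independently; larger n gives a moving average that allows a bounded amount of catch-up.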
jgc234 commented 2 weeks ago

Option 2 sounds OK. Pretty simple, and it's similar to what's already there, but with a limited view into history rather than all the way back to the start. This would mean you could still get small bursts, but at least they're limited to a fraction of a second.

Option 2 with a smaller time quantum becomes an implementation of Option 1. If increments of the pacing timer were used instead of the reporting timer, the user would have full control, but it would also be more confusing for a user to think about and calculate.

Where the average is calculated over multiple intervals, it becomes a moving average which will have a smoothing effect.

A broader question - Is the current -b behaviour an expectation for users that is actively exploited as a feature, or is it considered a bug, or are historical behaviours left as they are to minimise change?

jgc234 commented 2 weeks ago

A more disruptive thought - the burst function looks like a token bucket algorithm already, but in the code it looks to be implemented separately from the throttle (at a quick glance). In theory they could be unified into one simple algorithm that does both - allocate tokens at a constant rate based on the target throughput rate, and collect up to a "burst" number of tokens in reserve. Less code, less logic, unified concept, same interface. Disclaimer - I'm talking more than reading code.
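A minimal sketch of that unified idea, purely illustrative (the names and structure are invented for the example, not taken from the existing burst/throttle code):

#include <stdint.h>

/* Classic token bucket: tokens (bytes of send credit) accrue at the target
 * rate and are capped at "burst" bytes, so catch-up after an outage is
 * bounded by the burst size rather than by the length of the outage. */
struct token_bucket {
    double tokens;       /* bytes of credit currently available */
    double rate_Bps;     /* refill rate, bytes per second (target bitrate / 8) */
    double burst_bytes;  /* bucket capacity: maximum accumulated credit */
    double last_refill;  /* time of the last refill, seconds */
};

/* Add credit for the elapsed time, clamped to the bucket capacity. */
void tb_refill(struct token_bucket *tb, double now)
{
    tb->tokens += (now - tb->last_refill) * tb->rate_Bps;
    if (tb->tokens > tb->burst_bytes)
        tb->tokens = tb->burst_bytes;
    tb->last_refill = now;
}

/* Returns 1 and consumes credit if a write of `len` bytes may go out now. */
int tb_try_send(struct token_bucket *tb, double now, uint64_t len)
{
    tb_refill(tb, now);
    if (tb->tokens < (double)len)
        return 0;   /* red light: wait for the next pacing tick */
    tb->tokens -= (double)len;
    return 1;
}

Setting burst_bytes to roughly one write's worth behaves like a plain rate limit; larger values give the documented burst behaviour, with overshoot after an outage capped at the burst size.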

bmah888 commented 2 weeks ago

This situation is somewhat unusual in the environments for which iperf3 was originally intended (high-speed R&E networks, which tend to have both high bandwidth and high reliability). It's definitely counterintuitive. Basically, iperf3 doesn't really know when the network is unavailable or when it's in "catch-up" mode with respect to its software pacing.

If you really want to cap the sending rate of the connection so that the sender never, under any circumstances, exceeds some bitrate you specify, then (at least under Linux) you can try using the --fq-rate parameter. This enables fair-queueing based pacing within the kernel for the connection, and it's applied (as far as I know) to all data sent on the connection, whether it's original data or retransmitted data. This essentially puts a bottleneck on the path.
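For reference, a minimal sketch of the kernel-level cap this relies on, assuming Linux and the SO_MAX_PACING_RATE socket option (as I understand it, roughly the mechanism --fq-rate uses); error handling kept minimal:

#include <stdio.h>
#include <sys/socket.h>

/* Cap the socket's send rate in the kernel (fq qdisc, or TCP internal pacing
 * on newer kernels). The rate is in bytes per second, so a 20 Mbit/s cap
 * would be 20000000 / 8. Applies to retransmissions as well as new data. */
int set_max_pacing_rate(int sockfd, unsigned int bytes_per_sec)
{
    if (setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE,
                   &bytes_per_sec, sizeof(bytes_per_sec)) < 0) {
        perror("setsockopt(SO_MAX_PACING_RATE)");
        return -1;
    }
    return 0;
}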

EDIT: I'm going to advise against trying to add better pacing mechanisms within iperf3. Really, the main use case for iperf3 is to check end-to-end network and application performance on high-speed R&E networks. In this type of scenario the application-level pacing isn't very useful, and even small code changes can affect the ability of iperf3 to test on high-bandwidth paths (100+ Gbps).