Currently the bandwidth manager enforces a rate limit per pod, and all flows in a pod share the same queue. It uses a tail-drop policy with a 2-second threshold, which can cause bufferbloat and up to 2 seconds of queuing latency when there are many TCP connections.

Here we introduce ECN marking to solve the issue. By default, the marking threshold is set to 1ms.

To test, we ran a pod with a 100Mbps egress limit and 128 TCP connections inside the pod as background traffic, and compared the TCP_RR latency with and without ECN marking.
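The marking decision can be sketched as follows. This is a minimal illustration only, assuming an EDT-style (earliest-departure-time) scheduler where each packet carries a projected departure timestamp; the constant names and function are hypothetical, not the actual eBPF datapath code:

```python
# Hedged sketch (not the real datapath): how an EDT-based rate limiter
# could choose between forwarding, ECN-marking, and dropping a packet
# based on its projected queuing delay. All names are illustrative.

DROP_HORIZON_NS = 2_000_000_000  # tail-drop threshold: 2 s
ECN_HORIZON_NS = 1_000_000       # ECN marking threshold: 1 ms (default)

def schedule_packet(next_departure_ns: int, now_ns: int) -> str:
    """Return the action for a packet whose earliest departure time,
    per the pod's rate limit, is next_departure_ns."""
    delay_ns = next_departure_ns - now_ns
    if delay_ns > DROP_HORIZON_NS:
        return "drop"      # queue already ~2 s deep: tail drop
    if delay_ns > ECN_HORIZON_NS:
        return "mark-ce"   # signal congestion early via the ECN CE bit
    return "forward"       # under the marking threshold, send as-is
```

The idea is that a packet facing, say, 5 ms of projected queuing gets CE-marked instead of sitting silently in the queue, so ECN-capable senders back off long before the 2-second drop horizon is reached.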
- [x] All code is covered by unit and/or runtime tests where feasible.
- [x] All commits contain a well written commit description including a title, description and a `Fixes: #XXX` line if the commit addresses a particular GitHub issue.
- [ ] If your commit description contains a `Fixes: <commit-id>` tag, then please add the commit author[s] as reviewer[s] to this issue.
| Method | Avg Latency |
| --- | --- |
| with-ECN | 3.1ms |
| without-ECN | 2247.3ms |
Fixes: #29083