containernetworking / plugins

Some reference and example networking plugins, maintained by the CNI team.
Apache License 2.0
2.12k stars 773 forks source link

bandwidth: introduce configurable queue length #1025

Closed middaywords closed 2 months ago

middaywords commented 3 months ago

Currently, the bandwidth plugin uses tc tbf and set default latency to 25ms, which easily causes Head-Of-Line blocking and bufferbloat. This commit proposes to make it a configurable value.

s1061123 commented 3 months ago

@middaywords thank you for the PR. But I suppose that it could be implemented by another CNI plugin as different purpose. Currently bandwidth has lots of features and code and your feature is different (i.e. bandwidth is slightly different from latency, even though tc is used). bandwidth is not 'perfect swiss army knife of TC CNI plugin', just for 'bandwidth'. Having another plugin makes code simple and feature centric (and have benefit as 'chained CNI' mechanism!).

middaywords commented 3 months ago

@s1061123 the latency in title may be misleading, it means queue depth in tc tbf, i've changed it. The key problem here is that, the bandwidth plugin uses a deep queue by default, and it's not configurable, which causes head-of-line blocking. Specifically, when using the bandwidth plugin to limit the bandwidth, it brings extra latency about tens of milliseconds, which means it cannot go into real production env.

middaywords commented 3 months ago
here is the test results when I test the plugin bandwidth using netperf, in local kind env. ping latency of server and client is about 50us, but the measured latency with traffic is about >10ms. So I'd suggest to make the queue length configurable to reduce the latency. Data flow direction Throughput(Mbps) for TCP_STREAM flow Latency(us) for TCP_RR flow
netperf-client -> netperf-server (egress rate limit) 95.6 10760.05
netperf-server -> netperf-client (ingress rate limit) 95.6 226120.05
s1061123 commented 3 months ago

Understand your comment. BTW, we'll merge this PR soon and you must need to rebase (it is pretty big).

https://github.com/containernetworking/plugins/pull/921

So please check the repo tomorrow and please rebase once repo is updated with above PR. Appreciated your help.

middaywords commented 2 months ago

https://github.com/containernetworking/plugins/pull/921 uses HTB class for QoS and rate limiting, it can avoid close Head-Of-LIne blocking using many different subclasses, and its queue length for traffic shaping is not configurable. Close this PR because it's no longer needed in HTB case