containernetworking / plugins

Some reference and example networking plugins, maintained by the CNI team.
Apache License 2.0

allow configuring numTxQueues and numRxQueues for macvlan and ipvlan #986

Closed cyclinder closed 7 months ago

cyclinder commented 8 months ago

This adds a new configuration option, 'queues', which sets the "numtxqueues" and "numrxqueues" properties of the created interface. It must be a non-negative integer; the default is 0, in which case the netlink API falls back to the default of 1 queue. In bandwidth-sensitive scenarios, allowing the number of queues to be configured is expected to improve network bandwidth.
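
For illustration only, here is a rough sketch (not this PR's actual code) of how such a 'queues' option could be plumbed into the link attributes via the vishvananda/netlink library already used by the plugins; the struct and function names below are made up:

package example

import "github.com/vishvananda/netlink"

// NetConf is an illustrative config struct, not the real plugin config.
type NetConf struct {
    // Queues sets both numtxqueues and numrxqueues on the created link.
    // 0 (the default) keeps the kernel default of a single queue.
    Queues int `json:"queues,omitempty"`
}

// linkAttrsFor shows how the option could map onto netlink.LinkAttrs,
// which exposes NumTxQueues and NumRxQueues as separate fields.
func linkAttrsFor(conf *NetConf, name string, parentIndex, mtu int) netlink.LinkAttrs {
    attrs := netlink.LinkAttrs{
        Name:        name,
        ParentIndex: parentIndex,
        MTU:         mtu,
    }
    if conf.Queues > 0 {
        attrs.NumTxQueues = conf.Queues
        attrs.NumRxQueues = conf.Queues
    }
    return attrs
}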

Fixes https://github.com/containernetworking/plugins/issues/985

related-link: https://github.com/projectcalico/cni-plugin/pull/1116

cyclinder commented 8 months ago

thanks @s1061123

Configuring tx/rx separately may increase complexity. In most scenarios (see https://github.com/projectcalico/cni-plugin/pull/1116), the queue lengths for tx/rx are the same (please correct me if I'm wrong), but I am fine with either approach.

Could you please add a unit test to verify that the tx/rx queue parameters are correctly applied to the created macvlan/ipvlan?

Yes, I'm willing to do this.
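
For reference, a minimal sketch of how such a check could look, assuming root privileges and an existing parent interface named eth0 (the repo's real tests use Ginkgo/Gomega inside a test network namespace; this is only an outline):

package example

import (
    "testing"

    "github.com/vishvananda/netlink"
)

func TestMacvlanQueueCounts(t *testing.T) {
    parent, err := netlink.LinkByName("eth0")
    if err != nil {
        t.Skipf("parent link not available: %v", err)
    }

    // Create a macvlan with explicit tx/rx queue counts.
    mv := &netlink.Macvlan{
        LinkAttrs: netlink.LinkAttrs{
            Name:        "mv-test0",
            ParentIndex: parent.Attrs().Index,
            NumTxQueues: 5,
            NumRxQueues: 5,
        },
        Mode: netlink.MACVLAN_MODE_BRIDGE,
    }
    if err := netlink.LinkAdd(mv); err != nil {
        t.Skipf("cannot create macvlan (needs root): %v", err)
    }
    defer netlink.LinkDel(mv)

    // Read the link back and verify the kernel applied both counts.
    created, err := netlink.LinkByName("mv-test0")
    if err != nil {
        t.Fatalf("lookup failed: %v", err)
    }
    if got := created.Attrs().NumTxQueues; got != 5 {
        t.Errorf("numtxqueues = %d, want 5", got)
    }
    if got := created.Attrs().NumRxQueues; got != 5 {
        t.Errorf("numrxqueues = %d, want 5", got)
    }
}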

s1061123 commented 8 months ago

Configuring tx/rx separately may increase complexity. In most scenarios (see https://github.com/projectcalico/cni-plugin/pull/1116), the queue lengths for tx/rx are the same (please correct me if I'm wrong), but I am fine with either approach.

Please show why you conclude that 'in most scenarios the queue lengths for tx/rx are the same' from the above PR, especially given its description, 'vpp has more than one workers to process packet in our situation'. You may need to know that VPP is not the Linux kernel network stack. In addition, the Linux kernel and netlink provide the two as separate parameters. This is your PR, so please explain why you decided that the tx/rx queue counts are the same in almost all cases and how that is consistent with the Linux kernel design; otherwise the change is not reasonable.

In addition, you may need to show how much performance is improved by multiple rx/tx queues in ipvlan/macvlan, because ipvlan/macvlan is software forwarding, so the performance improvement is not the same as with VPP or a hardware NIC.

In #985, you mention 'it may enhance network performance'. Could you please show that increasing the tx/rx queue count improves network performance in macvlan/ipvlan? I'm not clear why it would, because macvlan/ipvlan is a Linux kernel network stack feature and may not have multiple workers. So you need to explain why increasing the macvlan/ipvlan tx/rx queue count improves network performance, or show how much performance is improved by it.

s1061123 commented 8 months ago

Configuring tx/rx separately may increase complexity. In most scenarios (see https://github.com/projectcalico/cni-plugin/pull/1116), the queue lengths for tx/rx are the same

We are talking about the number of queues, not the queue length.

cyclinder commented 8 months ago

We are talking about the number of queues, not the queue length.

Sorry for my poor wording.

Please show why you conclude that 'in most scenarios the queue lengths for tx/rx are the same' from the above PR.

From the user's perspective, they may not be aware of how the kernel is designed. Typically, the tx and rx queue counts are configured together, so setting only one of them may be confusing 🤔.

So you need to explain why increasing the macvlan/ipvlan tx/rx queue count improves network performance, or show how much performance is improved by it.

Alright, I will run some performance benchmarks on the current changes to see whether they improve performance.

s1061123 commented 8 months ago

Based on the above conversation, I suspect you just filed this PR without any clear reason. Something like: "I found an interesting parameter (numTxQueues and numRxQueues). I don't know why, but these parameters look like they increase performance, so let's file a PR. Of course, I don't know the recommended values either, because I don't know how the Linux kernel uses these parameters." Simply put, it is not a good PR, because the submitter cannot explain what the code changes do. The submitter should explain what the change does and what actually happens, because you wrote the code. The reviewer is NOT responsible for your code; the reviewer is only responsible for reviewing it, and does not explain what your code does. YOU are the only one who can explain it, because YOU wrote the code; you should have a reason for the change, and you must understand what it does. Please be responsible for your code and explain what the change does.

From the user's perspective, they may not be aware of how the kernel is designed. Typically, the tx and rx queue counts are configured together, so setting only one of them may be confusing 🤔.

This comment sounds like improvement feedback for the Linux kernel, because numTxQueues and numRxQueues are defined by the Linux kernel and its netlink API, and you are concluding that they should be consolidated into a single parameter (numQueues?). That should be done in the Linux kernel and the netlink API, not in CNI, because these are defined in the kernel. You believe that almost all users want to configure numTxQueues/numRxQueues to the same number without any particular reason (you suppose they don't have a reason why, yet still want to change these parameters, right?). So don't ask CNI; please ask the Linux kernel community (https://lkml.org/). numTxQueues/numRxQueues are defined in the Linux kernel.

At the very least, I believe that if someone wants to change a Linux kernel parameter, they should understand what the parameter is (and whether it should be increased or decreased).

cyclinder commented 7 months ago

Simply put, it is not a good PR, because the submitter cannot explain what the code changes do. The submitter should explain what the change does and what actually happens, because you wrote the code. The reviewer is NOT responsible for your code; the reviewer is only responsible for reviewing it, and does not explain what your code does.

Thank you for the correction. I don't have a clear understanding of this parameter, and I haven't conducted performance tests to verify whether it improves network performance. I apologize for not handling this well.

I reviewed the design of the netlink API, and I think what you said is correct: numrxqueues and numtxqueues should be configurable separately rather than only together. In some scenarios they may still be set to the same value.
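
As a quick cross-check that the kernel really does track the two counts independently, the per-queue sysfs directories (rx-0, tx-0, ...) can be counted from inside the pod; a small sketch, with the interface name eth0 assumed:

package main

import (
    "fmt"
    "os"
    "strings"
)

func main() {
    // Each rx/tx queue gets its own directory under /sys/class/net/<dev>/queues.
    entries, err := os.ReadDir("/sys/class/net/eth0/queues")
    if err != nil {
        panic(err)
    }
    var rx, tx int
    for _, e := range entries {
        switch {
        case strings.HasPrefix(e.Name(), "rx-"):
            rx++
        case strings.HasPrefix(e.Name(), "tx-"):
            tx++
        }
    }
    fmt.Printf("rx queues: %d, tx queues: %d\n", rx, tx)
}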

Additionally, I rebuilt the macvlan plugin with this PR and set numRxQueues and numTxQueues. The queue count of the Pod's eth0 was changed to 5 (the default is 1). However, after testing performance with iperf3, I found no significant improvement. I now doubt whether this setting makes sense.

root@macvlan1-numqueues-d967ddf56-mlj5x:/# ip --details link show eth0
2: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 42:a2:54:79:58:61 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 minmtu 68 maxmtu 9978 
    macvlan mode bridge bcqueuelen 1000 usedbcqueuelen 1000 addrgenmode eui64 numtxqueues 5 numrxqueues 5 gso_max_size 65536 gso_max_segs 65535 
# default numQueues 1
root@macvlan-subnet-50-645d8b69df-spv8j:/# iperf3 -c 172.50.0.57
Connecting to host 172.50.0.57, port 5201
[  5] local 172.50.0.66 port 50334 connected to 172.50.0.57 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.02 GBytes  8.77 Gbits/sec  730   1.15 MBytes       
[  5]   1.00-2.00   sec  1.06 GBytes  9.11 Gbits/sec  564    950 KBytes       
[  5]   2.00-3.00   sec  1.08 GBytes  9.30 Gbits/sec  325    952 KBytes       
[  5]   3.00-4.00   sec  1.08 GBytes  9.27 Gbits/sec  683    949 KBytes       
[  5]   4.00-5.00   sec  1.09 GBytes  9.41 Gbits/sec    0   1.32 MBytes       
[  5]   5.00-6.00   sec  1.08 GBytes  9.23 Gbits/sec  558    191 KBytes       
[  5]   6.00-7.00   sec  1.08 GBytes  9.31 Gbits/sec  744    981 KBytes       
[  5]   7.00-8.00   sec  1.10 GBytes  9.41 Gbits/sec   72    839 KBytes       
[  5]   8.00-9.00   sec  1.08 GBytes  9.31 Gbits/sec  690   1.42 MBytes       
[  5]   9.00-10.00  sec  1.05 GBytes  9.05 Gbits/sec  443   1003 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  10.7 GBytes  9.22 Gbits/sec  4809             sender
[  5]   0.00-10.04  sec  10.7 GBytes  9.18 Gbits/sec                  receiver

iperf Done.

# change numRxQueues/numTxQueues to 5
root@macvlan1-numqueues-d967ddf56-mlj5x:/# iperf3 -c 172.50.0.69
Connecting to host 172.50.0.69, port 5201
[  5] local 172.50.0.65 port 35034 connected to 172.50.0.69 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.09 GBytes  9.37 Gbits/sec  680   1.27 MBytes       
[  5]   1.00-2.00   sec  1.08 GBytes  9.28 Gbits/sec  1262   1.06 MBytes       
[  5]   2.00-3.00   sec  1.09 GBytes  9.41 Gbits/sec  855    666 KBytes       
[  5]   3.00-4.00   sec  1.08 GBytes  9.25 Gbits/sec  1367   1001 KBytes       
[  5]   4.00-5.00   sec  1.09 GBytes  9.36 Gbits/sec  353   1.23 MBytes       
[  5]   5.00-6.00   sec  1.09 GBytes  9.41 Gbits/sec  576   1.00 MBytes       
[  5]   6.00-7.00   sec  1.09 GBytes  9.41 Gbits/sec  626   1.05 MBytes       
[  5]   7.00-8.00   sec  1.09 GBytes  9.33 Gbits/sec  487    983 KBytes       
[  5]   8.00-9.00   sec  1.09 GBytes  9.37 Gbits/sec  129   1.22 MBytes       
[  5]   9.00-10.00  sec  1.10 GBytes  9.41 Gbits/sec  184    949 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  10.9 GBytes  9.36 Gbits/sec  6519             sender
[  5]   0.00-10.02  sec  10.9 GBytes  9.34 Gbits/sec                  receiver

iperf Done.

I've marked the PR as WIP, and I will further confirm whether this change is meaningful. I apologize again for this poorly prepared PR!

s1061123 commented 7 months ago

I'd recommend closing this PR and issue #985 for now, and filing a new PR when you're ready. Closing a PR and an issue is not a penalty for you, or for us ;)