containernetworking / plugins

Some reference and example networking plugins, maintained by the CNI team.
Apache License 2.0
2.23k stars 789 forks source link

Fix inadvertent txqueuelen being set to zero #1100

Closed gudmundur closed 1 month ago

gudmundur commented 2 months ago

When diagnosing networking issues, I noticed that txqueuelen on veth pairs and tap devices was set to zero. In investigating the issue, I found that most usages of netlink.LinkAttrs in this repository uses the struct literal, which in the netlink docs is advised against unless TxQLen is explicitly set.

Note NewLinkAttrs constructor, it sets default values in structure. For now it sets only TxQLen to -1, so kernel will set default by itself. If you're using simple initialization(LinkAttrs{Name: "foo"}) TxQLen will be set to 0 unless you specify it like LinkAttrs{Name: "foo", TxQLen: 1000}. (link)

When using the ip command directly without specifying txqueuelen, Linux will automatically to 1000:

$ ip tuntap add tap0 mode tap
$ ip link show tap0
4: tap0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 8a:06:b2:30:4d:1e brd ff:ff:ff:ff:ff:ff

This change goes through every single usage of netlink.LinkAttrs{ and replaces it with the netlink.NewLinkAttrs constructor, and then overrides the individual fields in the struct.

I even noticed that in the bridge plugin, this is somewhat called out when using the literal. I've replaced this explicitly in d376c8df5088e62c3a993ce14e5653b7d37407a0 though.

gudmundur commented 2 months ago

Been trying to sort out what's up with the failing sbr test and have been unable to reproduce. I see that it's also failing on another recent pull request (https://github.com/containernetworking/plugins/pull/1099, failing test). Happy to provide a fix if I had a way to reproduce the failure. 😭

squeed commented 2 months ago

Thanks for the PR. Have you seen #1097? Are these two PRs duplicate or complementary?

gudmundur commented 2 months ago

@squeed The second issue #1097 describes is somewhat related, but it misses the mark on setting it to zero meaning that the kernel will pick a default.

This can be illustrated as such:

$ ip link add dev vm1 type veth
$ ip link show vm1
... snip snip snip ...
20: vm1@vm2: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 76:8b:12:5d:e9:c0 brd ff:ff:ff:ff:ff:ff

Note the qlen 1000 on the interface. Now if we run the same command, but set txqueuelen 0 the following happens:

$ ip link add dev vm1 txqueuelen 0 type veth
$ ip link show vm1
24: vm1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default 
    link/ether 6a:19:4e:0b:3a:5e brd ff:ff:ff:ff:ff:ff

Note the missing qlen 1000. If you look at any regular device that is created by ip they will always have a qlen 1000. The netlink library propagates the 0 to the syscall when creating the link. I remember before fixing this, that we would see a handful of these log lines in the kernel logs. In digging a little through the Linux codebase it seems that a value of zero is caught in some places. In Googling this a little more, I also found that the Cilium folks had this issue in the past.

What I did notice on our end when we worked around this issue, was that we saw less tx packet drops. Now I can't remember the specifics on whether that was on veth or tap.

squeed commented 1 month ago

(@LionelJouin found the cause of the flake, he should have a fix soon)

squeed commented 1 month ago

Rebased to pick up SBR fixes.

gudmundur commented 1 month ago

@squeed Took care of the squashing for you. Reworded the commit message for a single commit as well.