flannel-io / flannel

flannel is a network fabric for containers, designed for Kubernetes
Apache License 2.0

86% performance loss on 10Gbit links #1234

Closed · strigazi closed this issue 1 year ago

strigazi commented 4 years ago

When running iperf3 on the VMs, performance is 9.40 Gbps, but when running iperf3 on pods scheduled on those same nodes, performance is 1.31 Gbps. I tried another CNI and saw negligible performance loss.
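(For anyone wanting to reproduce the comparison, a minimal sketch of the two measurements; the node name, pod IP, and the iperf3 image are placeholders, and pinning the pods to specific nodes, e.g. with a nodeSelector, is omitted for brevity.)

```bash
# VM-to-VM baseline: server on node-b, client on node-a
iperf3 -s                           # on node-b
iperf3 -c node-b -t 30              # on node-a; ~9.4 Gbps in the report above

# Pod-to-pod over the flannel overlay (networkstatic/iperf3 is just an example image)
kubectl run iperf3-server --image=networkstatic/iperf3 --restart=Never -- -s
kubectl get pod iperf3-server -o wide               # note the pod IP
kubectl run iperf3-client --image=networkstatic/iperf3 --restart=Never -- -c <server-pod-ip> -t 30
kubectl logs -f iperf3-client                       # ~1.3 Gbps in the report above
```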

Kubernetes is installed by hand. Flannel is installed using https://github.com/coreos/flannel/blob/master/Documentation/kube-flannel.yml, and net.bridge.bridge-nf-call-iptables is set to 1.
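(A quick sketch of verifying that setup; the apply URL is the raw form of the manifest linked above, and the label/namespace match the stock manifest of that era, newer manifests use the kube-flannel namespace.)

```bash
# Bridged pod traffic must be visible to iptables, as noted above
sysctl net.bridge.bridge-nf-call-iptables        # should print "... = 1"

# Deploy flannel from the referenced manifest (raw URL of the file linked above)
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

# Confirm the flannel DaemonSet is running on every node
kubectl -n kube-system get ds,pods -l app=flannel -o wide
```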


leezout commented 4 years ago

Same problem here.

xvzf commented 4 years ago

Can you share more info on your hardware?

leezout commented 4 years ago

> Can you share more info on your hardware?

Server A (data server): 2x Intel Xeon Gold 5218 @ 2.30GHz, 128GB RAM, 10GbE NIC, MTU 9000

Server B (k8s node): 2x Intel Xeon Gold 6128 @ 3.40GHz, 256GB RAM, 2x 10GbE NICs, MTU 9000, bonded in balance round-robin (balance-rr)

I have noticed that if I remove the bonded NIC on my k8s node, I get results close to 10 Gbps, but if I use bonding mode 0, I get around 3 Gbps when I run iperf3.
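(A sketch of how the active bonding mode can be confirmed on the node; bond0 is an assumed device name.)

```bash
# The kernel reports the active mode and slave state here
cat /proc/net/bonding/bond0
#   "Bonding Mode: load balancing (round-robin)"  -> mode 0 / balance-rr

# The same information via iproute2
ip -d link show bond0
```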

xvzf commented 4 years ago

Can you try bonding the interfaces via 802.3ad (LACP) instead? (You can do that pretty easily via iproute2.) I tried to replicate your issue on my test cluster (slower CPUs, but 2x 10G Intel NICs), and there is no sign of bottlenecking.
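(A rough sketch of an 802.3ad bond built with iproute2, as suggested above; the interface names and address are placeholders, and the switch ports have to be configured for LACP as well.)

```bash
# Replace the balance-rr bond with an 802.3ad (LACP) one; enp1s0f0/enp1s0f1 are placeholders
ip link add bond0 type bond mode 802.3ad miimon 100 lacp_rate fast
ip link set enp1s0f0 down
ip link set enp1s0f0 master bond0
ip link set enp1s0f1 down
ip link set enp1s0f1 master bond0
ip link set bond0 up mtu 9000
ip addr add 192.0.2.10/24 dev bond0        # example address
```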

strigazi commented 4 years ago

Thanks @xvzf, are you using vxlan?

xvzf commented 4 years ago

@strigazi Yes, I do.
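(For anyone following along, one way to check which backend flannel is running; the ConfigMap name and namespace here match the stock kube-flannel.yml of that era.)

```bash
# The backend type lives in net-conf.json inside the flannel ConfigMap
kubectl -n kube-system get cm kube-flannel-cfg -o jsonpath='{.data.net-conf\.json}'
# Expect something like: { "Network": "10.244.0.0/16", "Backend": { "Type": "vxlan" } }
```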

strigazi commented 4 years ago

> Can you share more info on your hardware?

I'm in VMs with a Linux bridge.

16 cores, 120000 RAM (one full NUMA node, Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz). One Intel Corporation I350 Gigabit Network Connection (rev 01) per VM, MTU 1500.

@xvzf what numbers do you get with iperf?

xvzf commented 4 years ago

@strigazi Do you have any monitoring on your cluster (more specifically, CPU resources)? My NICs (Intel X520) should be doing VXLAN offloading, though!

I am getting 9.6x Gbps between two containers.
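(A sketch of how the offload claim and the CPU question can be checked on a node; the interface name is a placeholder and the exact feature names vary by driver.)

```bash
# UDP tunnel (VXLAN) segmentation/checksum offloads on the underlay NIC
ethtool -k eth0 | grep -i 'udp_tnl'
#   tx-udp_tnl-segmentation: on
#   tx-udp_tnl-csum-segmentation: on

# Per-CPU softirq load while iperf3 runs, to spot a single saturated core
mpstat -P ALL 1
```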

strigazi commented 4 years ago

> @strigazi Do you have any monitoring on your cluster (more specifically, CPU resources)? My NICs (Intel X520) should be doing VXLAN offloading, though!
>
> I am getting 9.6x Gbps between two containers.

Are you on VMs or bare metal?

xvzf commented 4 years ago

@strigazi Bare metal

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.