Mirantis / virtlet

Kubernetes CRI implementation for running VM workloads
Apache License 2.0

Virtlet VM has low network throughput #882

Open keyingliu opened 5 years ago

We tested the network throughput of Virtlet VMs and found it to be 2-3x lower than that of our OpenStack VMs. With the client and server in the same rack, a Virtlet VM reaches only 3 to 5 Gbits/sec.

[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   356 MBytes  2.99 Gbits/sec    0    269 KBytes
[  4]   1.00-2.00   sec   434 MBytes  3.64 Gbits/sec    0    308 KBytes
[  4]   2.00-3.00   sec   578 MBytes  4.85 Gbits/sec    0    308 KBytes
[  4]   3.00-4.00   sec   578 MBytes  4.85 Gbits/sec    0    277 KBytes
[  4]   4.00-5.00   sec   570 MBytes  4.78 Gbits/sec    0    283 KBytes
[  4]   5.00-6.00   sec   478 MBytes  4.01 Gbits/sec    0    266 KBytes
[  4]   6.00-7.00   sec   430 MBytes  3.60 Gbits/sec    0    272 KBytes
[  4]   7.00-8.00   sec   433 MBytes  3.63 Gbits/sec    0    277 KBytes
[  4]   8.00-9.00   sec   579 MBytes  4.86 Gbits/sec    0    286 KBytes
[  4]   9.00-10.00  sec   580 MBytes  4.87 Gbits/sec    0    286 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  4.90 GBytes  4.21 Gbits/sec    0             sender
[  4]   0.00-10.00  sec  4.90 GBytes  4.21 Gbits/sec                  receiver

Digging further, we found that offload features such as csum and tso are off on the VM's tap interface (check with ethtool -k <interface>). After consulting the libvirt/QEMU source code, we found that Virtlet misses a flag when configuring the tap device at https://github.com/Mirantis/virtlet/blob/master/pkg/nettools/tap_linux.go#L58. The fix is to add one more flag, IFF_VNET_HDR:

-       req.Flags = uint16(syscall.IFF_TAP | syscall.IFF_NO_PI | syscall.IFF_ONE_QUEUE)
+       req.Flags = uint16(syscall.IFF_TAP | syscall.IFF_NO_PI | syscall.IFF_ONE_QUEUE | syscall.IFF_VNET_HDR)
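
For readers unfamiliar with the flag: IFF_VNET_HDR tells the kernel that every packet on the tap file descriptor is prefixed with a virtio-net header, which is where checksum and segmentation offload metadata travels. The sketch below only illustrates how the flag word changes. The constants are copied from linux/if_tun.h so that it builds on any OS, and `tapFlags` is a hypothetical helper, not a function in Virtlet.

```go
package main

import "fmt"

// Flag values as defined in linux/if_tun.h, copied here so the
// sketch does not depend on the Linux-only syscall constants.
const (
	iffTap      = 0x0002 // layer 2 tap device
	iffNoPI     = 0x1000 // no packet information header
	iffOneQueue = 0x2000 // single queue mode
	iffVnetHdr  = 0x4000 // packets carry a virtio-net header
)

// tapFlags is a hypothetical helper that builds the flag word
// passed in the TUNSETIFF request, with or without IFF_VNET_HDR.
func tapFlags(vnetHdr bool) uint16 {
	flags := uint16(iffTap | iffNoPI | iffOneQueue)
	if vnetHdr {
		flags |= iffVnetHdr
	}
	return flags
}

func main() {
	fmt.Printf("without IFF_VNET_HDR: %#x\n", tapFlags(false))
	fmt.Printf("with IFF_VNET_HDR:    %#x\n", tapFlags(true))
}
```

The only difference between the two values is the 0x4000 bit, which matches the one-line change in the diff above.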

With this flag set, QEMU turns TSO on if the host supports it. After the change, the throughput is the same as on our OpenStack VM:

[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  1.01 GBytes  8.66 Gbits/sec    4    621 KBytes
[  4]   1.00-2.00   sec   961 MBytes  8.06 Gbits/sec    9    923 KBytes
[  4]   2.00-3.00   sec  1.09 GBytes  9.34 Gbits/sec    0    943 KBytes
[  4]   3.00-4.00   sec  1.09 GBytes  9.40 Gbits/sec    0    943 KBytes
[  4]   4.00-5.00   sec  1.09 GBytes  9.33 Gbits/sec    0    949 KBytes
[  4]   5.00-6.00   sec  1.05 GBytes  9.01 Gbits/sec    0   1018 KBytes
[  4]   6.00-7.00   sec   901 MBytes  7.56 Gbits/sec    4   1.23 MBytes
[  4]   7.00-8.00   sec   940 MBytes  7.89 Gbits/sec    4   1.46 MBytes
[  4]   8.00-9.00   sec  1.00 GBytes  8.61 Gbits/sec  200   1.15 MBytes
[  4]   9.00-10.00  sec   968 MBytes  8.12 Gbits/sec    0   1.38 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  10.0 GBytes  8.60 Gbits/sec  221             sender
[  4]   0.00-10.00  sec  10.0 GBytes  8.59 Gbits/sec                  receiver
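
As a quick sanity check on the two summaries, the following sketch recomputes the average bandwidth from the transferred volume and derives the speedup. It assumes that the tool reports GBytes in units of 2^30 bytes and Gbits in units of 10^9 bits, which is consistent with the summary lines above. `gbitsPerSec` is a hypothetical helper name.

```go
package main

import "fmt"

// gbitsPerSec converts a transfer given in units of 2^30 bytes
// over a duration in seconds into 10^9 bits per second.
// The unit convention is an assumption about the report format.
func gbitsPerSec(gibibytes, seconds float64) float64 {
	return gibibytes * (1 << 30) * 8 / seconds / 1e9
}

func main() {
	before := gbitsPerSec(4.90, 10) // run without IFF_VNET_HDR
	after := gbitsPerSec(10.0, 10)  // run with IFF_VNET_HDR
	fmt.Printf("before:  %.2f Gbits/sec\n", before)
	fmt.Printf("after:   %.2f Gbits/sec\n", after)
	fmt.Printf("speedup: %.2fx\n", after/before)
}
```

Under these assumptions the recomputed averages agree with the reported 4.21 and 8.59 Gbits/sec, and the improvement is roughly a factor of two.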