cncf / cluster

CNCF Community Cluster
https://cncf.io/cluster

Integrate hardware offload with Kube-OVN #118

Closed oilbeater closed 2 years ago

oilbeater commented 4 years ago

Please fill out the details below to file a request for access to the CNCF Community Infrastructure Lab. Please note that access is targeted to people working on specific open source projects; this is not designed just to get your feet wet. The most important answer is the URL of the project you'll be working with. If you're looking to learn Kubernetes and related technologies, please try out Katacoda.

First and Last Name

Mengxin Liu

Email

mengxin@alauda.io

Company/Organization

Alauda Inc

Job Title

Senior Engineer

Project Title (i.e., a summary of what you want to do, not the name of the open source project you're working with)

Integrate hardware offload with Kube-OVN

Briefly describe the project (i.e., what in detail are you planning to do with these servers?)

Kube-OVN is a CNI implementation based on OVS. We see a huge throughput loss and high CPU usage when processing small packets or when the number of flow rules increases. After some investigation, we found that DPDK and hardware offload can greatly improve OVS performance. In this request we'd like to implement hardware offload first, to see how much we can improve Kube-OVN performance by taking advantage of new hardware. As most CNI implementations still rely on the Linux kernel network stack for container networking, this attempt can also provide new hints about how hardware technologies can further improve container network performance.
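For context, OVS hardware offload of the kind described above is typically enabled via switchdev mode on the NIC plus the `hw-offload` flag in OVS. The following is a minimal sketch only; the interface name and PCI address are illustrative placeholders, and it assumes OVS >= 2.8 with a NIC (e.g. Mellanox ConnectX-4/5) that supports switchdev:

```shell
# Put the NIC's eSwitch into switchdev mode (PCI address is illustrative).
# VFs are usually created first so their representors appear in switchdev mode.
echo 2 > /sys/class/net/enp3s0f0/device/sriov_numvfs
devlink dev eswitch set pci/0000:03:00.0 mode switchdev

# Tell OVS to offload datapath flows to the NIC via TC flower.
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
systemctl restart openvswitch

# Verify which flows were actually offloaded to hardware.
ovs-appctl dpctl/dump-flows type=offloaded
```

Flows that cannot be offloaded silently fall back to the kernel datapath, so `dump-flows type=offloaded` is the usual way to confirm the offload is actually taking effect.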

Is the code that you’re going to run 100% open source? If so, what is the URL or URLs where it is located? What is your association with that project?

Yes, it's 100% open source code that we will be running: https://github.com/alauda/kube-ovn. I am the author of the project.

What kind of machines and how many do you expect to use (see: https://www.packet.com/bare-metal/)?

We need 6 nodes with Mellanox ConnectX-4 NICs to test the implementation and performance.

What OS and networking are you planning to use (see: https://support.packet.com/kb/articles/supported-operating-systems)?

3 CentOS 7.6 nodes and 3 Ubuntu 18.04 nodes, to make sure Kube-OVN works correctly on both operating systems.

Any other relevant details we should know about?

If all works fine, we would like to go further and see how DPDK affects container network performance.

dankohn commented 4 years ago

+1

@vielmetti can comment on availability.

Also, @edwarnicke may be able to offer some DPDK pointers.

vielmetti commented 4 years ago

This page has a review of NIC hardware: https://support.packet.com/kb/articles/networking-faq

Our c2.medium - https://www.packet.com/cloud/servers/c2-medium-epyc/ - might be a good choice. I will double check to be sure that these are all CX4. I am guessing that you would want them all in the same data center? I will check on availability.

vielmetti commented 4 years ago

There's c2.medium availability in DFW2 (Dallas) and SJC1 (San Jose), at least for the short term. How long do you anticipate your project to run?

edwarnicke commented 4 years ago

@oilbeater If you can say a bit more about what you are looking to do with DPDK for HW offload, I might be able to offer some words of advice or point to some other helpful folks :)

taylorwaggoner commented 4 years ago

@oilbeater I've created the project in Packet and have sent you the invitation. Thanks!

oilbeater commented 4 years ago

@vielmetti Thanks for the quick reply. Yes, we'd like all the servers to be in the same data center, and we anticipate the project will take two weeks to run.

oilbeater commented 4 years ago

@edwarnicke We want to try the ovs-dpdk datapath to see if it is compatible with our implementation and to test the performance improvement. We are also curious whether ovs-dpdk can outperform other CNI implementations that rely on the kernel network stack, in throughput or latency, in Kubernetes scenarios.
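Switching OVS to the DPDK (userspace) datapath mentioned above generally involves reserving hugepages, enabling DPDK in ovs-vswitchd, and using `netdev`-type bridges. A minimal sketch, assuming an OVS build with DPDK support; the bridge name, port name, and PCI address are illustrative:

```shell
# Reserve 2 MB hugepages for DPDK (count is illustrative).
echo 2048 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

# Initialize DPDK inside ovs-vswitchd.
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
systemctl restart openvswitch

# DPDK ports only work on bridges using the userspace datapath.
ovs-vsctl add-br br-int -- set bridge br-int datapath_type=netdev

# Attach a physical NIC via its PCI address (illustrative).
ovs-vsctl add-port br-int dpdk-p0 -- \
    set Interface dpdk-p0 type=dpdk options:dpdk-devargs=0000:03:00.0
```

The key compatibility question for a CNI like Kube-OVN is that with DPDK the NIC is detached from the kernel, so containers attach via vhost-user sockets rather than kernel veth pairs.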

vielmetti commented 4 years ago

Looks like a couple of machines are up and running!

edwarnicke commented 4 years ago

@oilbeater Cool, what are you looking to measure?

vielmetti commented 3 years ago

@oilbeater The project currently has 3x c3.medium systems in our Tokyo (NRT1) data center that we need for another project. I'm happy to provide alternatives in that same data center, or, if those machines are no longer in use, to reclaim them. Thanks! Ed

github-actions[bot] commented 2 years ago

Stale issue message

vielmetti commented 2 years ago

Opening this issue briefly for notification, then closing in favor of a new one.

vielmetti commented 2 years ago

@oilbeater @edwarnicke

Please see https://github.com/cncf/cluster/issues/210 - this project should have been completed by now, yet there is still activity in this account.