Mellanox / k8s-rdma-sriov-dev-plugin

Kubernetes Rdma SRIOV device plugin

Configure ib0 for overlay/virtual netdevice #16

Open · j0hnL opened this issue 5 years ago

j0hnL commented 5 years ago

Can you elaborate on the steps necessary to "...configure ib0 or appropriate IPoIB netdevice as the parent netdevice for creating overlay/virtual netdevices."?

Is this supposed to work if you have multiple networks on the host compute nodes? For instance, my Kubernetes cluster runs over Ethernet, and I have IB installed on a few of the compute nodes. Am I able to launch pods and use the IB network between pods if it is not the default? Is it possible to change the CNI to default to InfiniBand when it is present?

paravmellanox commented 5 years ago

@j0hnL, in sriov mode, each container gets its own SR-IOV-based accelerated netdev (not an overlay netdev). This is already auto-configured by this device plugin.
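As a rough illustration (the rdma/vhca resource name is taken from this plugin's README as I understand it; check what your deployment actually advertises), a pod running in sriov mode would request its own device roughly like this:

```yaml
# Sketch only: a pod requesting one SR-IOV virtual HCA from this device plugin.
apiVersion: v1
kind: Pod
metadata:
  name: rdma-sriov-test
spec:
  containers:
  - name: app
    image: my-rdma-app:latest        # hypothetical image name
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]            # typically required for RDMA memory registration
    resources:
      limits:
        rdma/vhca: 1                 # one virtual HCA per container in sriov mode
```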

If you are using the shared mode of this device plugin and intend to use an overlay network CNI such as Calico, Contiv, etc., then specify ib0 as the parent interface in whichever plugin you choose. There is no ib0-specific configuration needed beyond the interface being up and having the right MTU for your network; the default MTU should just work. In both modes, pods on the same or different hosts communicate with each other using IP packets (IPoIB) or overlay/encapsulated packets carried over the IPoIB ib0 interface.
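For illustration only: Flannel (mentioned in the next comment) selects its uplink interface with the flanneld --iface flag, so pointing the overlay at the IPoIB interface could look roughly like the fragment below; Calico and other CNIs have their own equivalent settings.

```yaml
# Sketch: the container section of the kube-flannel DaemonSet, telling flanneld
# to use ib0 for inter-host (encapsulated) traffic. The image tag is only an
# example; everything else follows the stock kube-flannel manifest.
      containers:
      - name: kube-flannel
        image: docker.io/flannel/flannel:v0.24.0
        args:
        - --ip-masq
        - --kube-subnet-mgr
        - --iface=ib0        # use the IPoIB interface as the overlay uplink
```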

I didn't follow your question about making the CNI default to ib0.

j0hnL commented 5 years ago

That's interesting; I guess I misunderstood how this is supposed to work. My hope was to use this plugin with Flannel to share the HCA with pods that use MPI to communicate over native InfiniBand, rather than IPoIB. Is there a measured benefit to using SR-IOV mode versus just configuring your host adapter to use IPoIB? Thanks for the insight.

paravmellanox commented 5 years ago

@j0hnL, what you described should work well in shared mode; some other users are currently happy using it that way too. And no, SR-IOV mode is roughly on par with non-SR-IOV (shared) mode in terms of performance. It really depends on your use case, how you plan to deploy, and how your users intend to use this.
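A comparable sketch for shared mode (again assuming the rdma/hca resource name from the plugin's README, and a hypothetical MPI image):

```yaml
# Sketch: shared (hca) mode -- pods scheduled this way share the host's RDMA device.
apiVersion: v1
kind: Pod
metadata:
  name: mpi-worker
spec:
  containers:
  - name: mpi
    image: my-mpi-image:latest       # hypothetical image name
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]            # allow pinning memory for RDMA
    resources:
      limits:
        rdma/hca: 1                  # shared access to the host HCA
```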

If you are using rdmacm to establish RDMA connections between MPI processes, then shared mode cannot be used; you need an individual RDMA device and its upper IPoIB device per pod. MPI has an environment flag to enable/disable rdmacm. I am not the right person to dig up that flag; you might find it or may already know it. If you are interested in per-RDMA-device statistics in the future, you should also consider moving to SR-IOV mode.
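As a hedged pointer (my own assumption, not something confirmed in this thread): with Open MPI's openib BTL, the connection manager is chosen via the btl_openib_cpc_include/exclude MCA parameters, so excluding rdmacm when running in shared mode might look roughly like this; other MPIs and transports (e.g. UCX) have their own knobs.

```sh
# Sketch, assuming Open MPI with the openib BTL. Verify the parameter names
# on your installation, e.g. with `ompi_info --param btl openib`.
mpirun --mca btl_openib_cpc_exclude rdmacm -np 4 ./my_mpi_app
```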