Mellanox / k8s-rdma-sriov-dev-plugin

Kubernetes Rdma SRIOV device plugin
Apache License 2.0

VF is not allocated after recreating a pod #27

Closed hassiweb closed 4 years ago

hassiweb commented 4 years ago

I'm facing an issue similar to #14. The device plugin worked once, but after I recreated the pod it no longer allocates a VF to the pod, and the pod fails with insufficient rdma/vhca. I then tried disabling and re-enabling SR-IOV and reloading the driver, but neither helped.

Here is the device plugin log:

$ kubectl logs --namespace=kube-system rdma-sriov-dp-ds-9t5gs
2020/03/05 08:07:13 Starting K8s RDMA SRIOV Device Plugin version= 0.2
2020/03/05 08:07:13 Starting FS watcher.
2020/03/05 08:07:13 Starting OS watcher.
2020/03/05 08:07:13 Reading /k8s-rdma-sriov-dev-plugin/config.json
2020/03/05 08:07:13 loaded config:  {"mode":"sriov","pfNetdevices":["enp96s0f0"]}
2020/03/05 08:07:13 sriov device mode
Configuring SRIOV on ndev= enp96s0f0 9
max_vfs =  4
cur_vfs =  4
vf = &{2 virtfn2 true false}
vf = &{0 virtfn0 false false}
Fail to config vfs for ndev = enp96s0f0
Fail to configure sriov; error =  Link not found
2020/03/05 08:07:13 Starting to serve on /var/lib/kubelet/device-plugins/rdma-sriov-dp.sock
2020/03/05 08:07:13 Registered device plugin with Kubelet
exposing devices:  []
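
For anyone hitting the same log, here is a minimal diagnostic sketch in Python. It rests on an assumption that the thread does not confirm: that `Link not found` means one of the VFs has no netdev visible in the host network namespace when the plugin starts. The script only reads the standard Linux SR-IOV sysfs entries (`sriov_totalvfs`, `sriov_numvfs`, `virtfnN/net`). The helper names `vf_netdev_report` and `read_sriov_count` are hypothetical and are not part of the device plugin.

```python
import os


def read_sriov_count(pf, name, sysfs_root="/sys"):
    """Read an integer SR-IOV attribute (e.g. sriov_numvfs) of a PF netdev."""
    path = os.path.join(sysfs_root, "class", "net", pf, "device", name)
    with open(path) as f:
        return int(f.read().strip())


def vf_netdev_report(pf, sysfs_root="/sys"):
    """Map each VF index of the PF to the netdev names found under virtfnN/net.

    An empty list means no netdev for that VF is visible from this
    network namespace (assumed here to be a possible cause of the error).
    """
    dev_dir = os.path.join(sysfs_root, "class", "net", pf, "device")
    report = {}
    for entry in os.listdir(dev_dir):
        if not entry.startswith("virtfn"):
            continue  # skip sriov_numvfs and other non-VF entries
        index = int(entry[len("virtfn"):])
        net_dir = os.path.join(dev_dir, entry, "net")
        names = sorted(os.listdir(net_dir)) if os.path.isdir(net_dir) else []
        report[index] = names
    return report


if __name__ == "__main__":
    pf = "enp96s0f0"  # PF name taken from the config line in the log above
    if os.path.isdir(os.path.join("/sys/class/net", pf, "device")):
        print("total VFs:", read_sriov_count(pf, "sriov_totalvfs"))
        print("current VFs:", read_sriov_count(pf, "sriov_numvfs"))
        for index, names in sorted(vf_netdev_report(pf).items()):
            print("virtfn%d:" % index, names if names else "no netdev visible")
    else:
        print("PF %s not found on this host" % pf)
```

Run it on the node itself, not inside a pod, so the result reflects the host network namespace. If a VF shows no netdev, its interface may still be held by another namespace, which would fit the failure appearing only after the pod was recreated. That is a guess to verify, not a confirmed diagnosis.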

The Kubernetes version is:

Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:07:13Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}

The device plugin should be the latest version. However, I don't know why its digest ID differs from the one on Docker Hub, and the plugin doesn't start when I specify the Docker Hub digest ID in the manifest.

docker.io/rdma/k8s-rdma-sriov-dev-plugin:latest application/vnd.docker.distribution.manifest.list.v2+json sha256:9071e25d277d2c4cdb83443d57fecf0d98fe1d49b8bd873ba3d6eda131d12181 25.9 MiB linux/amd64,linux/ppc64le io.cri-containerd.image=managed

moshe010 commented 4 years ago

Please use [1] for SR-IOV; it now supports RDMA. This project is deprecated, and we will update it soon.

[1] - https://github.com/intel/sriov-network-device-plugin

hassiweb commented 4 years ago

Thank you for the information. I'll try it soon!