Mellanox / k8s-rdma-sriov-dev-plugin

Kubernetes Rdma SRIOV device plugin
Apache License 2.0

Failed to Create QP #29

Open zlwfrank opened 4 years ago

zlwfrank commented 4 years ago

I tried to deploy the RDMA device plugin in HCA mode in my Kubernetes cluster. I followed the instructions, and the device plugin registered successfully. If I run "kubectl describe node [node_name]", I can see the rdma/hca resource. If I run "ibstat" in the pods, the InfiniBand information shows up and the status is active/up.

However, when I tried to run a connection test using "ib_read_bw", it failed with the following error: "Couldn't get device attribute. Unable to create QP. Failed to create QP. Couldn't create IB resource."

I ran the test by running "ib_read_bw" in one pod and "ib_read_bw [target_pod_ip_addr]" in another pod. Could anyone please help with this issue? I appreciate your help.

paravmellanox commented 4 years ago

@zlwfrank The container might not have the IPC_LOCK capability.

Refer to the example here and add the "IPC_LOCK" line at the appropriate place.

spec:
  restartPolicy: OnFailure
  containers:
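
One way to check whether the capability is actually in effect inside a running pod is to read the effective capability mask from /proc. The following is a minimal sketch, not part of the plugin or of perftest; the function name `has_capability` is hypothetical. It assumes a Linux container where /proc/self/status exposes a `CapEff:` line as a hexadecimal mask, and relies on CAP_IPC_LOCK being capability number 14 on Linux.

```python
CAP_IPC_LOCK = 14  # bit index of CAP_IPC_LOCK in Linux capability masks


def has_capability(status_text, cap_bit=CAP_IPC_LOCK):
    """Return True if cap_bit is set in the CapEff mask of a /proc/<pid>/status dump.

    has_capability is a hypothetical helper for illustration only.
    """
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)  # the mask is printed as hex
            return bool(mask & (1 << cap_bit))
    raise ValueError("no CapEff line found in status text")
```

Inside the pod, this could be called as `has_capability(open("/proc/self/status").read())`. A result of False would suggest that the capability from the pod spec did not reach the container; a result of True would point to a different cause for the QP failure.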

zlwfrank commented 4 years ago

@paravmellanox Thanks for the reply. Actually, I was using the provided sample .yaml file, and the IPC_LOCK capability had already been added.

This is the file I used:

apiVersion: v1
kind: Pod
metadata:
  name: ib-test-pod-1
spec:
  restartPolicy: OnFailure
  containers:

yh-xu commented 3 years ago

@zlwfrank Have you resolved this problem? I get the same "Failed to create QP" symptom when running ib_read_bw inside a container, and I have no idea how to deal with it.