Mellanox / k8s-rdma-sriov-dev-plugin

Kubernetes Rdma SRIOV device plugin
Apache License 2.0

some question about HCA mode #11

Closed tingweiwu closed 5 years ago

tingweiwu commented 6 years ago

1. What is the meaning of pfNetdevices? The rdma-hca-node-config.yml doesn't have this field.

2. When I deploy the plugin as follows:

kubectl create -f example/hca/rdma-hca-node-config.yaml
kubectl create -f example/device-plugin.yaml
kubectl create -f example/hca/test-hca-pod.yaml
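
Once the device plugin DaemonSet is up, the shared HCA resource it registers can be verified with a standard kubectl query (the node name is a placeholder):

kubectl describe node <node-name> | grep rdma/hca    # should list rdma/hca under Capacity and Allocatable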

I see the following logs from the test pod:

[root@Mellanox]# kubectl logs mofed-test-pod
/dev/infiniband:
total 0
crw-------. 1 root root 231,  64 Sep  3 12:37 issm0
crw-rw-rw-. 1 root root  10,  57 Sep  3 12:37 rdma_cm
crw-rw-rw-. 1 root root 231, 224 Sep  3 12:37 ucm0
crw-------. 1 root root 231,   0 Sep  3 12:37 umad0
crw-rw-rw-. 1 root root 231, 192 Sep  3 12:37 uverbs0

/sys/class/net:
total 0
-rw-r--r--. 1 root root 4096 Sep  3 12:37 bonding_masters
lrwxrwxrwx. 1 root root    0 Sep  3 12:37 eth0 -> ../../devices/virtual/net/eth0
lrwxrwxrwx. 1 root root    0 Sep  3 12:37 lo -> ../../devices/virtual/net/lo
lrwxrwxrwx. 1 root root    0 Sep  3 12:37 tunl0 -> ../../devices/virtual/net/tunl0

Is this the expected output of the test pod?

Here is my node info:

Capacity:
 cpu:                56
 ephemeral-storage:  569868560Ki
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             527877808Ki
 nvidia.com/gpu:     8
 pods:               110
 rdma/hca:           1k
Allocatable:
 cpu:                56
 ephemeral-storage:  525190864027
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             527775408Ki
 nvidia.com/gpu:     8
 pods:               110
 rdma/hca:           1k

3. On my host node I can use InfiniBand via ib0. Besides setting rdma/hca: 1, how can I use InfiniBand inside a pod? I don't see ib0 in the logs of test-hca-pod.

I would appreciate any suggestions.

paravmellanox commented 5 years ago

@tingweiwu sorry for the late response. I was off for some time last week and was occupied.

Your configuration in 2 and 3 looks good. Regarding the 1st question, HCA mode doesn't have the concept of pfNetdevices, because in HCA mode the device is shared among containers. In shared mode the ib0 device is not available to the container. You should use an overlay CNI plugin such as Contiv or Calico (see https://github.com/containernetworking/cni); in that setup the parent device should be ib0, so that the virtual overlay network devices are created on top of the ib0 physical device.
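
A minimal sketch of what this looks like with Calico (which the follow-up below uses), assuming the node's IPoIB interface is named ib0; IP_AUTODETECTION_METHOD is Calico's own node-address auto-detection setting, not part of this device plugin:

# env of the calico-node container in the Calico DaemonSet (illustrative excerpt)
- name: IP_AUTODETECTION_METHOD
  value: "interface=ib0"   # take the node address from the IPoIB interface so the
                           # pod overlay network is carried over the InfiniBand fabric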

tingweiwu commented 5 years ago

@paravmellanox
Thanks a lot for your reply. For the 1st question, I think the guidance document confused me, which is why I asked. Now I got it.

One more question: I use Calico as my K8s cluster CNI plugin, and after deploying the rdma-hca device plugin it seems to work well, as I can run ib_devinfo and other IB commands in the pod successfully.

/examples#  ibstat
CA 'mlx5_0'
    CA type: MT4115
    Number of ports: 1
    Firmware version: 12.23.1000
    Hardware version: 0
    Node GUID: 0x506b4b030035efee
    System image GUID: 0x506b4b030035efee
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 250
        LMC: 0
        SM lid: 1
        Capability mask: 0x2650e848
        Port GUID: 0x506b4b030035efee
        Link layer: InfiniBand

Now I am running an MPI application which uses RDMA, and I get the error Call to ibv_create_qp failed. I have searched online but haven't found an explanation. Do you know the possible reason for this?

worker-0:17:76 [0] INFO NCCL_SINGLE_RING_THRESHOLD=262144
worker-0:17:76 [0] INFO Ring 00 :    0   1
worker-0:17:76 [0] INFO 1 -> 0 via NET/IB/0/GDRDMA
worker-1:17:76 [0] INFO 0 -> 1 via NET/IB/0/GDRDMA

worker-0:17:76 [0] misc/ibvwrap.cu:275 WARN Call to ibv_create_qp failed
worker-0:17:76 [0] INFO transport/net_ib.cu:354 -> 2
worker-0:17:76 [0] INFO transport/net_ib.cu:432 -> 2
worker-0:17:76 [0] INFO include/net.h:32 -> 2 [Net]
worker-0:17:76 [0] INFO transport/net.cu:266 -> 2
worker-0:17:76 [0] INFO init.cu:475 -> 2
worker-0:17:76 [0] INFO init.cu:536 -> 2

worker-1:17:76 [0] misc/ibvwrap.cu:275 WARN Call to ibv_create_qp failed
worker-1:17:76 [0] INFO transport/net_ib.cu:354 -> 2
worker-1:17:76 [0] INFO transport/net_ib.cu:432 -> 2
worker-1:17:76 [0] INFO include/net.h:32 -> 2 [Net]
worker-1:17:76 [0] INFO transport/net.cu:266 -> 2
worker-1:17:76 [0] INFO init.cu:475 -> 2
worker-1:17:76 [0] INFO init.cu:536 -> 2

The same MPI application runs successfully when I run it in Docker with host networking.
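
When the same workload works with host networking in Docker but fails in a pod, one thing worth comparing is the locked-memory limit (RLIMIT_MEMLOCK) in each environment, since verbs calls such as ibv_create_qp need to pin memory; the pod name below is a placeholder:

# inside the Kubernetes pod (placeholder pod name)
kubectl exec <worker-pod> -- sh -c 'ulimit -l'
# in the working Docker/host-network environment
ulimit -l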

tingweiwu commented 5 years ago

@paravmellanox The Call to ibv_create_qp failed error was solved by setting ulimit -l unlimited in the pod spec.
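
For reference, a minimal sketch of one way to express this in a pod spec, assuming the image ships a shell; the container name, image, and mpirun command are placeholders. Whether the ulimit call can raise the limit depends on the hard memlock limit set by the container runtime; the IPC_LOCK capability suggested in the next comment additionally lets the process lock memory for RDMA:

containers:
- name: mpi-worker                    # placeholder name
  image: <your-rdma-capable-image>    # placeholder image
  securityContext:
    capabilities:
      add: [ "IPC_LOCK" ]             # allow locking memory for RDMA buffers
  # raise the locked-memory limit before launching the job (placeholder command)
  command: [ "sh", "-c", "ulimit -l unlimited && exec mpirun ..." ]
  resources:
    limits:
      rdma/hca: 1                     # shared HCA resource from this device plugin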

paravmellanox commented 5 years ago

@tingweiwu I will fix the documentation and update you.

paravmellanox commented 5 years ago

@tingweiwu you should add a securityContext like the one below to the containers in your pod spec file:

securityContext:
  capabilities:
    add: [ "IPC_LOCK" ]

Please refer to this sample. https://github.com/Mellanox/k8s-rdma-sriov-dev-plugin/blob/master/example/sriov/test-sriov-pod.yaml

xieydd commented 5 years ago

@tingweiwu I am hitting the same problem; the job just hangs, right?