Mellanox / k8s-rdma-sriov-dev-plugin

Kubernetes Rdma SRIOV device plugin
Apache License 2.0

an error "No such device" is reported, when using hca mode with RoCE adapter #33

Closed goversion closed 3 years ago

goversion commented 3 years ago

Environment:

What happened: Details are as follows:

Two pods, named test-hca1 and test-hca2, were run for testing.

  1. Execute lspci in pod/test-hca1; the RoCE device can be found.

    $ lspci -v | grep Mella
    5e:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
    5e:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
  2. Then execute rdma_server in pod/test-hca1 and rdma_client in pod/test-hca2; the client reports the error "No such device".

    $ kubectl exec -it test-hca1 -- rdma_server
    rdma_server: start
    $ kubectl exec -it test-hca2 -- rdma_client -s 10.244.0.8
    rdma_client: start
    rdma_create_ep: No such device
    rdma_client: end -1
  3. Next, execute show_gids in pod/test-hca1; no GID of the RoCE device is found.
    
    $ kubectl exec -it test-hca1 bash
    $ show_gids
    DEV     PORT    INDEX   GID                                     IPv4            VER     DEV
    ---     ----    -----   ---                                     ------------    ---     ---
    n_gids_found=0
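
For anyone reproducing this without the MOFED show_gids script in the image, the same check can be sketched in Python. This is only a minimal sketch, assuming the standard Linux sysfs layout `/sys/class/infiniband/<dev>/ports/<port>/gids/<index>`; the function name `list_gids` is hypothetical, not part of any tool. An empty result inside the pod would match the `n_gids_found=0` output above.

```python
import os


def list_gids(sysfs_root="/sys/class/infiniband"):
    """Return (device, port, index, gid) tuples for every non-zero GID entry.

    Assumes the usual sysfs layout: <root>/<dev>/ports/<port>/gids/<index>.
    """
    zero_gid = "0000:0000:0000:0000:0000:0000:0000:0000"
    found = []
    if not os.path.isdir(sysfs_root):
        return found  # no RDMA devices are visible in this namespace
    for dev in sorted(os.listdir(sysfs_root)):
        ports_dir = os.path.join(sysfs_root, dev, "ports")
        if not os.path.isdir(ports_dir):
            continue
        for port in sorted(os.listdir(ports_dir)):
            gids_dir = os.path.join(ports_dir, port, "gids")
            if not os.path.isdir(gids_dir):
                continue
            # GID table entries are files named by their numeric index.
            indexes = [i for i in os.listdir(gids_dir) if i.isdigit()]
            for index in sorted(indexes, key=int):
                try:
                    with open(os.path.join(gids_dir, index)) as f:
                        gid = f.read().strip()
                except OSError:
                    continue  # unpopulated entries may fail to read
                if gid and gid != zero_gid:
                    found.append((dev, port, int(index), gid))
    return found


if __name__ == "__main__":
    entries = list_gids()
    for dev, port, index, gid in entries:
        print(f"{dev}\t{port}\t{index}\t{gid}")
    print(f"n_gids_found={len(entries)}")
```

Running it both on the host and inside the pod may help tell whether the GID table is empty everywhere or only inside the container.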

The YAML file used in the above test is as follows:

    apiVersion: v1
    kind: Pod
    metadata:
      name: test-hca1
    spec:
      containers:

The Dockerfile of the Docker image used in the above test is: https://github.com/Mellanox/mofed_dockerfiles/blob/master/Dockerfile.centos7.2.mofed-4.4