Mellanox / k8s-rdma-shared-dev-plugin


rdma-shared-device-plugin device isolation #93

Open cairong-ai opened 10 months ago

cairong-ai commented 10 months ago

I have 4 InfiniBand (IB) network cards on my server, and I have shared one of them with the following configuration:

rdmaSharedDevicePlugin:
  deploy: true
  resources:
    - name: rdma_shared_device_a
      ifNames: [ibs1]

However, when I check inside the Pod that has requested the rdma_shared_device_a resource, I can still see all 4 IB network cards. It seems like device isolation is not being achieved. What should I do?
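
(For context, a Pod consumes this resource roughly as in the sketch below; the pod name, container name, and image are placeholders rather than values from this issue, and IPC_LOCK is commonly added so RDMA applications can register memory.)

apiVersion: v1
kind: Pod
metadata:
  name: rdma-test-pod                    # placeholder name
spec:
  containers:
    - name: rdma-app                     # placeholder name
      image: my-rdma-test-image          # placeholder image
      securityContext:
        capabilities:
          add: ["IPC_LOCK"]              # commonly required for RDMA memory registration
      resources:
        limits:
          rdma/rdma_shared_device_a: 1   # resource name served by the plugin (prefix "rdma", see the plugin log below)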

adrianchiris commented 10 months ago

Hi,

"I can still see all 4 IB network cards" - what do you mean? Can you add an example?

Generally, rdma-shared-device-plugin mounts the RDMA ULP char devices (under /dev/infiniband) into the container for the specified device only.

The mlx5_* RDMA devices under /sys/class/infiniband are visible within the container, but only the relevant device should have its upper-layer-protocol (ULP) char devices exposed to the container.
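
(To check which RDMA device backs the selected netdev on the host, the iproute2 rdma tool or the ibdev2netdev helper from MLNX_OFED can be used; this is only a sketch, and the device/interface names on your node may differ.)

$ rdma link show      # lists each RDMA device/port together with the netdev it is bound to
$ ibdev2netdev        # prints mappings of the form "mlx5_X port N ==> <netdev> (Up)"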

cairong-ai commented 10 months ago
$ kubectl logs -f rdma-shared-dp-ds-hp2z8 -n network-operator
Defaulted container "rdma-shared-dp" out of: rdma-shared-dp, ofed-driver-validation (init)
2024/01/07 06:01:53 Starting K8s RDMA Shared Device Plugin version= master
2024/01/07 06:01:53 resource manager reading configs
2024/01/07 06:01:53 Reading /k8s-rdma-shared-dev-plugin/config.json
Using Kubelet Plugin Registry Mode
2024/01/07 06:01:53 loaded config: [{ResourceName:rdma_shared_device_a ResourcePrefix: RdmaHcaMax:10 Devices:[] Selectors:{Vendors:[] DeviceIDs:[] Drivers:[] IfNames:[ibs1] LinkTypes:[]}} {ResourceName:rdma_shared_device_b ResourcePrefix: RdmaHcaMax:10 Devices:[] Selectors:{Vendors:[] DeviceIDs:[] Drivers:[] IfNames:[ibs8] LinkTypes:[]}}]
2024/01/07 06:01:53 no periodic update interval is set, use default interval 60 seconds
2024/01/07 06:01:53 Discovering host devices
2024/01/07 06:01:53 discovering host network devices
2024/01/07 06:01:53 DiscoverHostDevices(): device found: 0000:14:00.0   02              Mellanox Technolo...    MT28908 Family [ConnectX-6]
2024/01/07 06:01:53 DiscoverHostDevices(): device found: 0000:30:00.0   02              Mellanox Technolo...    MT28908 Family [ConnectX-6]
2024/01/07 06:01:53 DiscoverHostDevices(): device found: 0000:68:00.0   02              Intel Corporation       Ethernet Controller X710 for 10GbE SFP+
2024/01/07 06:01:53 DiscoverHostDevices(): device found: 0000:68:00.1   02              Intel Corporation       Ethernet Controller X710 for 10GbE SFP+
2024/01/07 06:01:53 DiscoverHostDevices(): device found: 0000:b9:00.0   02              Mellanox Technolo...    MT28908 Family [ConnectX-6]
2024/01/07 06:01:53 DiscoverHostDevices(): device found: 0000:d2:00.0   02              Mellanox Technolo...    MT28908 Family [ConnectX-6]
2024/01/07 06:01:53 Initializing resource servers
2024/01/07 06:01:53 Resource: &{ResourceName:rdma_shared_device_a ResourcePrefix:rdma RdmaHcaMax:10 Devices:[] Selectors:{Vendors:[] DeviceIDs:[] Drivers:[] IfNames:[ibs1] LinkTypes:[]}}
2024/01/07 06:01:53 error creating new device: "missing RDMA device spec for device 0000:68:00.0, RDMA device \"issm\" not found"
2024/01/07 06:01:53 error creating new device: "missing RDMA device spec for device 0000:68:00.1, RDMA device \"issm\" not found"
2024/01/07 06:01:54 Resource: &{ResourceName:rdma_shared_device_b ResourcePrefix:rdma RdmaHcaMax:10 Devices:[] Selectors:{Vendors:[] DeviceIDs:[] Drivers:[] IfNames:[ibs8] LinkTypes:[]}}
2024/01/07 06:01:54 error creating new device: "missing RDMA device spec for device 0000:68:00.0, RDMA device \"issm\" not found"
2024/01/07 06:01:54 error creating new device: "missing RDMA device spec for device 0000:68:00.1, RDMA device \"issm\" not found"
2024/01/07 06:01:54 Starting all servers...
2024/01/07 06:01:54 starting rdma/rdma_shared_device_a device plugin endpoint at: rdma_shared_device_a.sock
2024/01/07 06:01:54 rdma/rdma_shared_device_a device plugin endpoint started serving
2024/01/07 06:01:54 starting rdma/rdma_shared_device_b device plugin endpoint at: rdma_shared_device_b.sock
2024/01/07 06:01:54 rdma/rdma_shared_device_b device plugin endpoint started serving
2024/01/07 06:01:54 All servers started.
2024/01/07 06:01:54 Listening for term signals

The above is the log of the rdma-shared-dev-plugin Pod. rdma_shared_device_a selects only one IB card (ibs1), yet in the Pod that was allocated the rdma_shared_device_a resource, all four IB cards are visible under /sys/class/infiniband:

# ll /sys/class/infiniband
total 0
drwxr-xr-x  2 root root 0 Jan  7 07:36 ./
drwxr-xr-x 83 root root 0 Jan  7 07:36 ../
lrwxrwxrwx  1 root root 0 Jan  7 07:36 mlx5_0 -> ../../devices/pci0000:09/0000:09:02.0/0000:0a:00.0/0000:0b:08.0/0000:12:00.0/0000:13:00.0/0000:14:00.0/infiniband/mlx5_0/
lrwxrwxrwx  1 root root 0 Jan  7 07:36 mlx5_1 -> ../../devices/pci0000:1a/0000:1a:02.0/0000:1b:00.0/0000:1c:08.0/0000:2e:00.0/0000:2f:00.0/0000:30:00.0/infiniband/mlx5_1/
lrwxrwxrwx  1 root root 0 Jan  7 07:36 mlx5_2 -> ../../devices/pci0000:b0/0000:b0:02.0/0000:b1:00.0/0000:b2:04.0/0000:b7:00.0/0000:b8:10.0/0000:b9:00.0/infiniband/mlx5_2/
lrwxrwxrwx  1 root root 0 Jan  7 07:36 mlx5_3 -> ../../devices/pci0000:c9/0000:c9:02.0/0000:ca:00.0/0000:cb:04.0/0000:d0:00.0/0000:d1:10.0/0000:d2:00.0/infiniband/mlx5_3/
adrianchiris commented 10 months ago

In the Pod, what do you see under /dev/infiniband/?

cairong-ai commented 10 months ago
# ll /dev/infiniband/
total 0
drwxr-xr-x 2 root root      120 Jan  7 07:35 ./
drwxr-xr-x 6 root root      480 Jan  7 07:35 ../
crw------- 1 root root 231,  65 Jan  7 07:35 issm1
crw-rw-rw- 1 root root  10,  58 Jan  7 07:35 rdma_cm
crw------- 1 root root 231,   1 Jan  7 07:35 umad1
crw-rw-rw- 1 root root 231, 193 Jan  7 07:35 uverbs1

The contents of /dev/infiniband/ in the Pod are as shown above.

adrianchiris commented 10 months ago

According to your feedback, rdma-shared-device-plugin behaves as expected.

The reason you see all mlx5_* devices under /sys/class/infiniband is that the kernel does not namespace them. However, only one device is actually accessible from the container, because only the char devices belonging to the ibs1 device are mounted under /dev/infiniband.
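
(A quick way to confirm this from inside the Pod is sketched below; it assumes mlx5_1 is the HCA behind ibs1, matching the mounted uverbs1 node, so adjust the device names to your setup.)

# ibv_devinfo -d mlx5_1    # assumed to back ibs1: its uverbs char device is mounted, so this should print device attributes
# ibv_devinfo -d mlx5_0    # /dev/infiniband/uverbs0 is not present in the container, so opening this device should fail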

cairong-ai commented 10 months ago

Thanks. Is there any way to make sure that a Pod can only see the allowed IB cards? @adrianchiris