NVIDIA / k8s-dra-driver

Dynamic Resource Allocation (DRA) for NVIDIA GPUs in Kubernetes
Apache License 2.0

NVLink support #174

Open ritazh opened 1 month ago

ritazh commented 1 month ago

Are there any plans to add support for NVLink (e.g., GB200 NVL72)? If so, can you share a rough example of what a typical device class and ResourceClaimTemplate might look like? Thanks!

ritazh commented 1 month ago

@klueska do you have any thoughts around this?

For example, running nvidia-smi topo -m or nvidia-smi nvlink --status could expose the NVLink connections and topology information, so that the scheduler can pick a node connected via NVLink over one that is not.
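
To make the idea concrete, here is a minimal sketch of pulling NVLink-connected GPU pairs out of a topology matrix. It is not part of this driver: the function name nvlink_pairs and the sample text are hypothetical, and the sample only imitates the general shape of nvidia-smi topo -m output, where a cell of the form NV<n> marks an NVLink connection. Real output varies by driver version and may contain extra columns, legend text, or terminal control characters that would need to be cleaned up first.

```python
import re

# Hypothetical sample in the general shape of `nvidia-smi topo -m` output.
# This is illustrative text, not captured from a real machine.
SAMPLE_TOPO = """\
        GPU0    GPU1    GPU2    GPU3    CPU Affinity
GPU0     X      NV12    SYS     SYS     0-31
GPU1    NV12     X      SYS     SYS     0-31
GPU2    SYS     SYS      X      NV12    32-63
GPU3    SYS     SYS     NV12     X      32-63
"""


def _gpu_index(name):
    """Numeric part of a name like 'GPU3'."""
    return int(name[3:])


def nvlink_pairs(topo_text):
    """Return sorted (gpu_a, gpu_b) pairs whose matrix cell is NV<n>."""
    lines = [ln for ln in topo_text.splitlines() if ln.strip()]
    if not lines:
        return []
    # Header tokens that name GPUs, in column order (assumed to come first).
    header = [tok for tok in lines[0].split() if re.fullmatch(r"GPU\d+", tok)]
    pairs = set()
    for line in lines[1:]:
        cells = line.split()
        if not cells or cells[0] not in header:
            continue  # skip legend lines and non-GPU rows
        row = cells[0]
        for col, cell in zip(header, cells[1:]):
            if re.fullmatch(r"NV\d+", cell):
                # Store each pair once, lower GPU index first.
                pairs.add(tuple(sorted((row, col), key=_gpu_index)))
    return sorted(pairs, key=lambda p: (_gpu_index(p[0]), _gpu_index(p[1])))


if __name__ == "__main__":
    for a, b in nvlink_pairs(SAMPLE_TOPO):
        print(f"{a} <-> {b} via NVLink")
```

This only covers GPU-to-GPU links visible within a single node. How such information would be published as device attributes for the scheduler to use is the open question in this issue.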