Open ritazh opened 1 month ago
@klueska do you have any thoughts around this?
e.g. running nvidia-smi topo -m
or nvidia-smi nvlink --status
on each node could expose the NVLink connections and the topology information, so that the scheduler could pick a node that is connected via NVLink over one that is not.
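To make the idea concrete, here is a rough sketch of how the NVLink pairs could be pulled out of the topology matrix. It is only an illustration, not how any existing driver does it: the `SAMPLE` text is a hand-written stand-in shaped like `nvidia-smi topo -m` output, the helper name `nvlink_pairs` is hypothetical, and it assumes the GPU columns come first in the header and that NVLink cells look like `NV<count>`. Real output varies by driver version and system.

```python
import re

# Hypothetical sample shaped like `nvidia-smi topo -m` output (assumption:
# real output differs per system and may include NIC and NUMA columns).
SAMPLE = """\
        GPU0    GPU1    GPU2    CPU Affinity
GPU0     X      NV4     SYS     0-15
GPU1    NV4      X      SYS     0-15
GPU2    SYS     SYS      X      16-31
"""


def nvlink_pairs(topo_text):
    """Return {(gpu_a, gpu_b): link_count} for GPU pairs joined by NVLink."""
    lines = [ln for ln in topo_text.splitlines() if ln.strip()]
    if not lines:
        return {}
    # Assumption: GPU columns are the leading columns of the header row.
    gpus = [tok for tok in lines[0].split() if re.fullmatch(r"GPU\d+", tok)]
    pairs = {}
    for line in lines[1:]:
        cells = line.split()
        if not cells or cells[0] not in gpus:
            continue  # skip legend lines and non-GPU rows
        row_idx = gpus.index(cells[0])
        for col_idx, cell in enumerate(cells[1:len(gpus) + 1]):
            match = re.fullmatch(r"NV(\d+)", cell)
            # The matrix is symmetric, so record each pair only once.
            if match and row_idx < col_idx:
                pairs[(gpus[row_idx], gpus[col_idx])] = int(match.group(1))
    return pairs


if __name__ == "__main__":
    print(nvlink_pairs(SAMPLE))
```

Something along these lines could feed per-node topology attributes that the scheduler then filters or scores on, though how that would be surfaced is exactly the open question here.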
Are there any plans for adding support for NVLink (e.g. GB200 NVL72)? If so, can you share a rough example of what a typical device class and ResourceClaimTemplate might look like? Thanks!