NVIDIA / nccl

Optimized primitives for collective multi-GPU communication

Does FabricManager support isolation at GPU card granularity in an MNNVL environment? #1475

Open hailiyidishui opened 1 month ago

hailiyidishui commented 1 month ago

@AddyLaddy Does FM support partitioning one compute node for multi-tenant use? For example, I have 4 GPUs in one compute node and want to partition them into 2 NVLink groups, such that GPUs in one group cannot communicate with GPUs in the other group over NVLink.

AddyLaddy commented 1 month ago

I don't believe so on current NVSwitch-based HW. However, the MNNVL product (GB200 NVL72) has yet to be released, and I'm not sure what partitioning options it will provide.