Open CatalinLucian opened 2 months ago
The number of rings NCCL uses depends on your hardware topology and on how many rings it needs to reach peak bandwidth. Each ring is run by a GPU SM and can use a different path within the node to maximize hardware utilization.
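If you want to experiment with a single ring, one option is to cap the ring count through environment variables before NCCL initializes. The sketch below is only an illustration: `nccl_ring_env` is a hypothetical helper, not part of NCCL. It sets both `NCCL_MAX_NRINGS` (the name used by older NCCL 2.x releases) and `NCCL_MAX_NCHANNELS` (the name used since NCCL 2.5), on the assumption that the installed version honors one of them, and turns on `NCCL_DEBUG=INFO` so the rings actually selected appear in the log.

```python
import os


def nccl_ring_env(max_rings, base_env=None):
    """Return a copy of the environment that caps NCCL's ring count.

    Hypothetical helper for illustration; which variable takes effect
    depends on the installed NCCL version.
    """
    if int(max_rings) < 1:
        raise ValueError("max_rings must be at least 1")
    env = dict(os.environ if base_env is None else base_env)
    value = str(int(max_rings))
    env["NCCL_MAX_NRINGS"] = value      # older NCCL 2.x name
    env["NCCL_MAX_NCHANNELS"] = value   # name used since NCCL 2.5
    env["NCCL_DEBUG"] = "INFO"          # log the topology NCCL selects
    return env


# Build an environment that requests a single ring.
single_ring_env = nccl_ring_env(1)

# Pass it to whatever launches your NCCL program, for example:
#   subprocess.run(["./my_nccl_app"], env=single_ring_env)
# where ./my_nccl_app is a placeholder for your own binary.
```

Since each ring can take a different path within the node, forcing a single ring will likely lower the achievable bandwidth on topologies where NCCL would otherwise pick more.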
Hello,
After a few experiments, it seems that NCCL uses a double-ring topology for data transfer. Is double ring the default, or is there an option to switch to a single-ring topology? I am investigating different topology configurations and data transfer orders.
Regards, Catalin