NVIDIA / nccl

Optimized primitives for collective multi-GPU communication

How can I modify the source code to change the send data size to 16 KB in IB verbs? #1395

Open shanleo2024 opened 3 months ago

shanleo2024 commented 3 months ago

Hi, currently, with the Simple protocol, allreduce sends 1 MB of data at a time through IB verbs. I want to change the send size to 16 KB. I tried the following approaches but could not get the expected performance:

(1) Setting NCCL_BUFFSIZE to 65535. IB verbs now does send 16 KB at a time, but performance is poor. I suppose the small buffer makes the kernel copy performance worse.

(2) Setting NCCL_IB_QPS_PER_CONNECTION to 64. Performance was still worse.

(3) Changing NCCL_STEPS from 8 to 512. This divides the 4 MB Simple protocol buffer into steps of 8192 bytes each, and allreduce sends 2 steps at a time. Performance was still worse.
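To make the arithmetic behind these attempts explicit, here is a minimal sketch. It assumes, as described in this post, that the buffer is divided evenly into NCCL_STEPS slots and that 2 slots are sent together. The helper name `bytes_per_send` is hypothetical and not part of NCCL, and the NCCL_BUFFSIZE value is rounded to 64 KiB for illustration.

```python
KIB = 1024
MIB = 1024 * KIB

def bytes_per_send(buff_size, steps, steps_per_send=2):
    """Bytes handed to IB verbs per send.

    Assumption (from the post, not verified against NCCL source): the buffer
    is split evenly into `steps` slots and `steps_per_send` slots go out
    together.
    """
    step_size = buff_size // steps
    return step_size * steps_per_send

# Default: 4 MiB buffer, 8 steps.
default_send = bytes_per_send(4 * MIB, 8)

# Attempt (1): shrink the buffer to about 64 KiB, keep 8 steps.
small_buff_send = bytes_per_send(64 * KIB, 8)

# Attempt (3): keep the 4 MiB buffer, raise the step count to 512.
many_steps_send = bytes_per_send(4 * MIB, 512)

print(default_send, small_buff_send, many_steps_send)
```

Under these assumptions, attempts (1) and (3) both reach a 16 KiB send, but each does so by shrinking the per-step size as well, which may be why the copy side slows down.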

I want to know how to change the size of the data sent by IB verbs to 16 KB while keeping the Simple protocol buffer at 4 MB. In other words, when sending data through IB verbs, how can 1 MB of data be split into 64 sends of 16 KB each? I tried to modify the source code but failed. Could you give me some advice? Thank you.
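The split being asked for can be sketched independently of NCCL. This is only an illustration of the offset arithmetic, not NCCL's or libibverbs' actual code. The function `split_send` is a hypothetical name, and in a real change each (offset, length) pair would presumably become its own IB verbs work request.

```python
def split_send(total_bytes, max_chunk):
    """Yield (offset, length) pairs covering total_bytes in pieces of at
    most max_chunk bytes. Hypothetical helper for illustration only."""
    offset = 0
    while offset < total_bytes:
        # The last piece may be shorter if total_bytes is not a multiple
        # of max_chunk.
        length = min(max_chunk, total_bytes - offset)
        yield offset, length
        offset += length

# One 1 MiB send split into 16 KiB pieces.
chunks = list(split_send(1024 * 1024, 16 * 1024))
print(len(chunks), chunks[0], chunks[-1])
```

The key design point is that the 4 MB buffer and its step size stay unchanged. Only the unit posted to the network is made smaller.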