NVIDIA / nccl

Optimized primitives for collective multi-GPU communication

Why are not all SMs active when NCCL kernel and compute kernel overlap? #1432

Open yu-depend opened 2 months ago

yu-depend commented 2 months ago

When I run an NCCL kernel by itself, 15% of the SMs are active. When I run a compute kernel by itself, 100% of the SMs are active. But when I run the compute kernel and the NCCL kernel in parallel so that they overlap, only 85% of the SMs are active. How can this be explained?
