Open annaa-ka opened 9 months ago
Hi! Can you please answer some questions?
We are trying to optimize latency for the Tree algorithm on large messages (4 GB) by changing NCCL_BUFFSIZE and the chunk sizes.
- I looked at the latency here. Is it calculated for a 256 kB ChunkSize and a 4 MB BuffSize?
- What does baseLat here refer to?
- Also, here we see how the time is calculated: https://github.com/NVIDIA/nccl/blob/master/src/graph/tuning.cc#L396. BW is busBW * ratio, but the nccl-tests PERFORMANCE.md claims that busBW is similar for all algorithms. Yet here we see a different bw for each algorithm. What is the idea behind this?
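
To make the last question concrete, here is a minimal sketch of how I understand the two quantities, not the actual NCCL code. It assumes the time estimate is a plain latency-plus-bandwidth model (latency in microseconds, bandwidth in GB/s) and uses the AllReduce bus bandwidth definition from the nccl-tests PERFORMANCE.md. The function names and the numbers in the usage lines are hypothetical, chosen only for illustration.

```python
def estimated_time_us(n_bytes, latency_us, bw_gb_per_s, lat_count=1):
    """Assumed model: time = latency * count + bytes / bandwidth, in microseconds."""
    # 1 GB/s moves 1000 bytes per microsecond, hence the factor of 1000.
    return latency_us * lat_count + n_bytes / (1000.0 * bw_gb_per_s)


def allreduce_bus_bw(alg_bw_gb_per_s, n_ranks):
    """nccl-tests AllReduce definition: busBw = algBw * 2 * (n - 1) / n."""
    return alg_bw_gb_per_s * 2 * (n_ranks - 1) / n_ranks


# Illustrative values only (not measured): a 4 GB message,
# 10 us latency and 100 GB/s predicted bandwidth.
t = estimated_time_us(4_000_000_000, 10.0, 100.0)
bus = allreduce_bus_bw(100.0, 8)
print(t, bus)
```

With a 4 GB message the bandwidth term dominates the latency term in this model, which is why the per-algorithm bw value matters so much for us.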
Hello @annaa-ka
I'm also facing an issue with NCCL_BUFFSIZE
when transmitting large messages.
Is there any progress or insight you can share?