NVIDIA / nccl-tests

NCCL Tests
BSD 3-Clause "New" or "Revised" License

question for NCCL write data size #266

Open gabbychen opened 1 week ago

gabbychen commented 1 week ago

Hi,

    When I use NCCL for sending/receiving data,
    I find that the data size written to memory is double the received data size.
    Is there a reason for the duplicated write?
kiskra-nvidia commented 1 week ago

Memory buffers not being registered perhaps? https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/bufferreg.html

gabbychen commented 6 days ago

Hi, I used nccl-tests for profiling. What I found is that the data size written to memory is double the received data size I set in nccl-tests. For example, when I set the total data size to 1 GB for AllGather profiling with 4 GPUs, the write data size is 1.5 GB in-place (2 x 0.75 GB of received data) and 1.75 GB out-of-place. With the same total data size on 2 GPUs, the write data size is 1 GB in-place (2 x 0.5 GB of received data) and 1.5 GB out-of-place. I assume this is caused by some internal mechanism.
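The reported figures are consistent with a simple model: each rank's received data ((N-1)/N of the total AllGather size) is written twice, and out-of-place runs additionally copy the rank's own chunk into the output buffer. This is just a sketch fitted to the numbers in this thread, not something confirmed by NCCL documentation:

```python
def modeled_write_gb(total_gb, n_gpus, in_place):
    """Model of per-rank memory writes for AllGather, fitted to the
    figures reported in this thread (an assumption, not NCCL docs)."""
    received = total_gb * (n_gpus - 1) / n_gpus  # data received from peers
    writes = 2 * received                        # each received byte written twice
    if not in_place:
        writes += total_gb / n_gpus              # own chunk copied into the output
    return writes

# Reproduces the measurements reported above:
assert modeled_write_gb(1.0, 4, in_place=True) == 1.5    # 2 x 0.75 GB
assert modeled_write_gb(1.0, 4, in_place=False) == 1.75
assert modeled_write_gb(1.0, 2, in_place=True) == 1.0    # 2 x 0.5 GB
assert modeled_write_gb(1.0, 2, in_place=False) == 1.5
```

If the model is right, the factor of two would come from received data being staged through NCCL's internal buffers before landing in the destination, which is exactly what buffer registration is meant to avoid.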

I wonder whether this is caused by an extra buffer copy during communication or by something else. (1) Is it possible to remove the extra buffer copy in favor of a direct copy, to get better communication performance? (2) Would removing the extra buffer copy cause other problems?

kiskra-nvidia commented 6 days ago

Please try the -R 1 option when using nccl-tests: https://github.com/NVIDIA/nccl-tests/blob/8dfeab9eb9bdfdf13503e71e1f33e7f8a208b540/src/common.cu#L876
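For reference, `-R 1` enables user buffer registration in nccl-tests. A sketch of how the 4-GPU AllGather case above might be rerun with registration enabled (the binary path and size flags are illustrative, not taken from this thread):

```shell
# Rerun the 1 GB, 4-GPU AllGather case with buffer registration enabled.
# -R 1 registers the test's send/recv buffers with NCCL, which can let NCCL
# write directly into the user buffer instead of staging through its
# internal buffers (the suspected source of the doubled writes).
./build/all_gather_perf -b 1G -e 1G -g 4 -R 1
```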

gabbychen commented 5 days ago

Thanks, I will try it.