All the data loaders set `pin_memory=True`, which works only if the data is stored in CPU memory.
If the entire dataset were stored on the GPU, `pin_memory=True` would make the code crash.
Since all our data is stored on the CPU, the `max_deg` and `deg` variables are also on the CPU. As a result, the NCCL gather operations crash, because NCCL requires its input tensors to be on the GPU.
This PR fixes this problem.
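
The diff itself is not reproduced here, so the following is only a minimal sketch of the kind of change described above, not the PR's actual code. It assumes the degree tensor comes out of a CPU-backed loader and is moved to the target device before any collective call. The helper name `gather_max_degree` and the toy `deg` values are hypothetical.

```python
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, TensorDataset


def gather_max_degree(deg: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Hypothetical helper: compute the maximum degree across all ranks.

    The tensor is moved to `device` first, because the NCCL backend only
    accepts CUDA tensors in its collectives.
    """
    max_deg = deg.max().to(device)
    if dist.is_available() and dist.is_initialized():
        # One receive buffer per rank, allocated on the same device.
        gathered = [torch.zeros_like(max_deg) for _ in range(dist.get_world_size())]
        dist.all_gather(gathered, max_deg)
        max_deg = torch.stack(gathered).max()
    return max_deg


# Toy usage: the data lives on the CPU, so pinning is valid when CUDA exists.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
deg = torch.tensor([1, 3, 2])  # assumed toy node degrees, created on the CPU
loader = DataLoader(
    TensorDataset(deg),
    batch_size=3,
    pin_memory=torch.cuda.is_available(),
)
(batch,) = next(iter(loader))
max_deg = gather_max_degree(batch, device)
```

Outside a distributed run the helper skips the collective and simply returns the local maximum on the chosen device, so the same code path can be exercised on a single CPU-only machine.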