Hi sjeaugey,
I'm reading the latest NCCL code, and I see that the different cases in transport/net.cc have been classified to make the code simpler. But it has left me more confused, especially about the "bank" in the code. Can you help explain why we can do things like the below?
struct ncclSendMem* sendMem = (struct ncclSendMem*) ((((map)->offsets.sendMem >> 29) == 0) ? __null : (map)->mems[((map)->offsets.sendMem >> 30)].gpuPtr + ((map)->offsets.sendMem & 0x1fffffff));
It seems we add an offset to "gpuPtr", which was allocated or reserved through CUDA APIs.
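To check my understanding, here is a small standalone sketch of how I read the bit layout in that expression. The names Bank, Decoded, decodeOffset and resolve are my own and are not NCCL identifiers, and the four-entry mems array is only my assumption from the 2-bit index. As far as I can tell, the top two bits select the bank, the low 29 bits are a byte offset inside it, and the pointer is null when bits 29 to 31 are all zero.

```cpp
#include <cstdint>

// Hypothetical stand-in for one entry of map->mems (not the NCCL struct).
struct Bank { char* gpuPtr; };

// Hypothetical view of one packed 32-bit offset, as I read the expression.
struct Decoded {
  bool used;        // any of bits 29..31 set -> the entry is not null
  uint32_t bank;    // bits 30..31 select one of (up to) four banks
  uint32_t offset;  // bits 0..28 are the byte offset inside that bank
};

inline Decoded decodeOffset(uint32_t packed) {
  return Decoded{ (packed >> 29) != 0, packed >> 30, packed & 0x1fffffffu };
}

// Same arithmetic as the expanded expression: base pointer of the selected
// bank plus the offset, or nullptr when the entry is unused.
inline char* resolve(const Bank (&mems)[4], uint32_t packed) {
  Decoded d = decodeOffset(packed);
  return d.used ? mems[d.bank].gpuPtr + d.offset : nullptr;
}
```

If this reading is right, each "bank" would be one larger allocation, and sendMem would just be a sub-region carved out of it at a fixed offset. Is that the intent?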
Best Regards, -Edda