chaoyanghe closed this issue 3 years ago
Could you add more details? Which part of the code corresponds to the DP here?
No worries. This is a framework-level issue, so I will move it to the FedML project.
DP in FL is equivalent to parallel training on multiple GPUs. I think the issue happens when we assign multiple processes to a single GPU. FedML will collaborate with the PyTorch team to systematically optimize this contention issue at the CUDA primitive level and also check whether some of the Python implementation should be adjusted.
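For anyone trying to check whether their setup hits the multiple-processes-per-GPU case mentioned above, here is a minimal sketch. It is not FedML code: `assign_gpus` and `oversubscribed_gpus` are hypothetical helper names, and the round-robin mapping is only an assumption about how ranks might be placed on devices.

```python
from collections import Counter


def assign_gpus(num_processes, num_gpus):
    """Hypothetical helper: round-robin mapping from process rank to GPU index."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return {rank: rank % num_gpus for rank in range(num_processes)}


def oversubscribed_gpus(mapping):
    """Hypothetical helper: return {gpu_index: process_count} for shared GPUs."""
    counts = Counter(mapping.values())
    return {gpu: n for gpu, n in counts.items() if n > 1}


# Example: 6 processes on 4 GPUs means two GPUs each host two processes.
mapping = assign_gpus(num_processes=6, num_gpus=4)
shared = oversubscribed_gpus(mapping)
print(shared)
```

If `shared` is non-empty, some GPUs host more than one process, which is the situation where the contention described above may show up.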
...