Data Parallelism (DP) during training does not meet expectation (only 3 processes are working, should be FOUR), I will optimize it after ICML deadline.

FedML-AI / FedNLP

FedNLP: An Industry and Research Integrated Platform for Federated Learning in Natural Language Processing, Backed by FedML, Inc. The Previous Research Version is Accepted to NAACL 2022

Data Parallelism (DP) during training does not meet expectation (only 3 processes are working, should be FOUR), I will optimize it after ICML deadline. #6

chaoyanghe closed this issue 3 years ago

chaoyanghe commented 3 years ago

...

yuchenlin commented 3 years ago

Could you add more details? Which part of the code corresponds to DP here?

chaoyanghe commented 3 years ago

No worries, this is a framework-level issue. I will move it to the FedML project.

DP in FL is equivalent to parallel training across multiple GPUs. I think the issue happens when we assign multiple processes to a single GPU. FedML will collaborate with the PyTorch team to systematically optimize this contention issue at the CUDA primitive level and also check whether some of the Python implementation should be adjusted.
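
To illustrate the suspected cause, here is a minimal sketch of how several worker processes can end up sharing one GPU. It is not FedNLP or FedML code: the helper names `assign_processes_to_gpus` and `contended_gpus`, the round-robin mapping, and the process and GPU counts are all assumptions chosen for illustration.

```python
from collections import defaultdict


def assign_processes_to_gpus(num_processes, num_gpus):
    """Hypothetical round-robin mapping: process rank -> GPU index."""
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    return {rank: rank % num_gpus for rank in range(num_processes)}


def contended_gpus(mapping):
    """Return {gpu: [ranks]} for every GPU shared by more than one process."""
    by_gpu = defaultdict(list)
    for rank, gpu in mapping.items():
        by_gpu[gpu].append(rank)
    return {gpu: ranks for gpu, ranks in by_gpu.items() if len(ranks) > 1}


if __name__ == "__main__":
    # Assumed setup: 4 worker processes but only 2 GPUs available.
    mapping = assign_processes_to_gpus(num_processes=4, num_gpus=2)
    print("rank -> gpu:", mapping)
    print("shared GPUs:", contended_gpus(mapping))
```

With as many GPUs as processes, `contended_gpus` returns an empty dict. With fewer GPUs, every entry it reports is a device where multiple processes compete for the same CUDA resources, which is the situation I suspect here.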