I am running a distributed setup with four GPUs, one client per GPU. During training, the load on each GPU differs enormously, and two of the GPUs even ran out of memory. I also noticed that training on the GPUs hitting overflow is extremely slow, with GPU utilization close to zero.
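To compare the clients directly, I log per-GPU memory each step with something like the sketch below (a minimal example assuming a PyTorch-based setup; `log_gpu_memory` is just a hypothetical helper I added for debugging, not part of the framework):

```python
import torch

def log_gpu_memory(step: int) -> None:
    """Print current and peak allocated memory for every visible GPU."""
    for device_id in range(torch.cuda.device_count()):
        current = torch.cuda.memory_allocated(device_id) / 1024**2
        peak = torch.cuda.max_memory_allocated(device_id) / 1024**2
        print(f"step {step} | cuda:{device_id} | "
              f"current {current:.0f} MiB | peak {peak:.0f} MiB")
```

The numbers confirm the imbalance: two devices climb steadily toward their memory limit while the other two stay far below it.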