FedML-AI / FedML

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
Apache License 2.0

Asynchronous Federated Learning #342

Open xuannn1998 opened 2 years ago

xuannn1998 commented 2 years ago

Hi there,

Are there any implementations related to asynchronous FL, in particular the approach from the state-of-the-art paper Asynchronous Federated Optimization (FedAsync)?
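For context, the core server step in that paper mixes a returning client's model into the global model with a weight that decays as the client's snapshot grows stale. A minimal numpy sketch of that update rule (illustrative names, not FedML code; arrays stand in for real model tensors):

```python
import numpy as np

def staleness_weight(alpha: float, staleness: int, a: float = 0.5) -> float:
    """Polynomial staleness function: the mixing weight shrinks
    as the gap between the client's model version and the current
    server version grows."""
    return alpha * (staleness + 1) ** (-a)

def server_update(global_model: np.ndarray,
                  client_model: np.ndarray,
                  alpha: float,
                  staleness: int) -> np.ndarray:
    """One asynchronous server step: blend the returning client's
    model into the global model, down-weighted by its staleness."""
    a_t = staleness_weight(alpha, staleness)
    return (1.0 - a_t) * global_model + a_t * client_model
```

With `alpha=0.5`, a fresh client (staleness 0) moves the global model halfway toward its result, while a client that is 3 versions behind only moves it a quarter of the way.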

FedML-AI-admin commented 2 years ago

@xuannn1998 Good question. We will release our LightSecAgg (MLSys 2022) soon.

https://proceedings.mlsys.org/paper/2022/file/d2ddea18f00665ce8623e36bd4e3c7c5-Paper.pdf

xuannn1998 commented 2 years ago

The current FedAvg algorithm trains clients sequentially (simulated in a single process). To achieve asynchronous FL, is it possible to change the training phase to run in parallel?

chaoyanghe commented 2 years ago

@xuannn1998 Please check this example, we do support simulation with sequential distributed training: python/examples/simulation/mpi_torch_fedavg_seq/README.md

As for asynchronous FL, it is better to run it in a real-world system. Do you want async FL supported in simulation, or in a real-world cross-silo deployment?
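To illustrate what a parallel/asynchronous simulation would look like: each client pulls the current global model, trains, and pushes its result back whenever it finishes, and the server applies each result immediately with a staleness-weighted mix in the FedAsync style. A toy threaded sketch (all names are hypothetical; this is not FedML's API, and "local training" is faked by adding a constant):

```python
import random
import threading
import time

class AsyncServer:
    """Toy asynchronous FL server: applies each client result as soon
    as it arrives, instead of waiting for a full synchronous round."""

    def __init__(self, model: float, alpha: float = 0.5):
        self.model = model      # scalar stands in for model weights
        self.version = 0        # bumped on every applied update
        self.alpha = alpha
        self.lock = threading.Lock()

    def pull(self):
        """Client fetches the current model and its version."""
        with self.lock:
            return self.model, self.version

    def push(self, client_model: float, client_version: int):
        """Mix in a client result, down-weighted by staleness."""
        with self.lock:
            staleness = self.version - client_version
            a_t = self.alpha * (staleness + 1) ** -0.5
            self.model = (1 - a_t) * self.model + a_t * client_model
            self.version += 1

def client(server: AsyncServer, delta: float):
    model, version = server.pull()
    time.sleep(random.uniform(0.0, 0.05))  # uneven client compute times
    server.push(model + delta, version)    # fake "local training"

server = AsyncServer(model=0.0)
threads = [threading.Thread(target=client, args=(server, 1.0))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The lock keeps each server update atomic; because slow clients push models computed from old versions, their contributions are automatically discounted by the staleness term.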

fedml-dimitris commented 10 months ago

@xuannn1998 We will revisit this soon.