FedML-AI / FedML

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
Apache License 2.0
4.17k stars 784 forks

FedOpt for cross-silo #1818

Open kamelferrahi opened 9 months ago

kamelferrahi commented 9 months ago

I would like to ask whether FedOpt is only available in simulation mode. Is there any way to use it in cross-silo mode with MQTT, as with FedAvg? Thanks.

fedml-dimitris commented 9 months ago

Hello @kamelferrahi, have you tried using FedOpt: https://github.com/FedML-AI/FedML/blob/master/python/examples/federate/cross_silo/mpi_fedavg_mnist_lr_example/config/fedml_config.yaml#L15

in a configuration file like the one below: https://github.com/FedML-AI/FedML/blob/master/python/examples/federate/cross_silo/mqtt_s3_fedavg_mnist_lr_example/step_by_step/config/fedml_config.yaml

and it did not work?

kamelferrahi commented 9 months ago

Hi @fedml-dimitris, yes. Using FedOpt in simulation mode with the MPI backend, as in https://github.com/FedML-AI/FedML/tree/master/python/examples/federate/prebuilt_jobs/fedgraphnn/moleculenet_graph_clf/config_fedopt/simulation_gcn, works flawlessly. However, using it in a cross-silo environment, as in https://github.com/FedML-AI/FedML/blob/master/python/examples/federate/cross_silo/mqtt_s3_fedavg_mnist_lr_example/step_by_step/config/fedml_config.yaml, by replacing FedAvg with FedOpt causes an exception.
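As a rough sketch of that substitution, the snippet below models the config as a plain Python dict and swaps the optimizer name. It does not load the linked files or call FedML. The key names (`common_args`, `train_args`, `federated_optimizer`, `server_optimizer`, `server_lr`, `comm_args`, `backend`) and all values are assumptions based on typical FedML example configs, and `with_fedopt` is a hypothetical helper used only for illustration.

```python
import copy

# Hypothetical stand-in for a parsed fedml_config.yaml.
# Key names and values are assumptions, not copied from the linked files.
fedavg_config = {
    "common_args": {"training_type": "cross_silo"},
    "train_args": {"federated_optimizer": "FedAvg"},
    "comm_args": {"backend": "MQTT_S3"},
}


def with_fedopt(config, server_optimizer="sgd", server_lr=1.0):
    """Return a copy of `config` with FedAvg swapped for FedOpt.

    Hypothetical helper: the original dict is left untouched.
    """
    updated = copy.deepcopy(config)
    train_args = updated["train_args"]
    train_args["federated_optimizer"] = "FedOpt"
    # FedOpt runs an optimizer on the server side, so it usually needs
    # server-side settings as well (assumed key names, placeholder values).
    train_args["server_optimizer"] = server_optimizer
    train_args["server_lr"] = server_lr
    return updated


fedopt_config = with_fedopt(fedavg_config)
print(fedopt_config["train_args"])
```

Only the `train_args` entries change in this sketch. The training type and communication backend stay the same, which is the setup that raises the exception in cross-silo mode.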

Does FedOpt work only in the simulation mode?