Closed jaintj95 closed 4 years ago
Hey @jaintj95
As far as I understood, in Part 07 of the PySyft tutorials (in contrast to the previous tutorial) we use a separate optimizer for each worker, simply by building a list of optimizers over the same model.parameters() (to train one single model on each worker).
Although it is not done in the example, I think it is perfectly possible to implement a different optimizer for each worker if wanted.
But apart from the practical possibility of doing it, does it make sense theoretically? The data on each worker are potentially non-IID and come from different distributions, which normally (the Multi-Task Learning setting aside) should be combined to describe one generalized distribution. If you alter the optimization for each worker and then update the global model using Federated Averaging (FedSGD or plain gradient averaging wouldn't work anymore, because then the change in the optimizer would have no effect), wouldn't you potentially bias the update towards one worker (e.g. if the learning rate for one is bigger than for the others)?
Actually I'm not very experienced in this topic, I'm just curious why this could be helpful. :)
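To make the setup above concrete, here is a minimal sketch in plain PyTorch (not the actual PySyft API): one optimizer per worker, all pointing at the same model.parameters(). The worker names and learning rates are purely illustrative assumptions.

```python
import torch
import torch.nn as nn

model = nn.Linear(2, 1)

# One optimizer per worker over the same model.parameters(),
# as described above; names and learning rates are made up.
optimizers = {
    "alice": torch.optim.SGD(model.parameters(), lr=0.1),
    "bob": torch.optim.SGD(model.parameters(), lr=0.01),
}

def local_step(worker, data, target):
    """Run one training step using the given worker's own optimizer."""
    opt = optimizers[worker]
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(data), target)
    loss.backward()
    opt.step()  # applies that worker's own learning rate
    return loss.item()

for worker in optimizers:
    local_step(worker, torch.randn(4, 2), torch.randn(4, 1))
```

Note that because every optimizer updates the same parameter tensors, a worker with a larger learning rate moves the shared model further per step, which is exactly the bias concern raised above.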
This issue has been marked stale because it has been open 30 days with no activity. Leave a comment or remove the stale label to unmark it. Otherwise, this will be closed in 7 days.
Is your feature request related to a problem? Please describe.
Right now the federated optimizer API assumes that we are only going to use the same optimizer across different workers.
Describe the solution you'd like
Is there a case for using different optimizers across different workers? What if the model owner wants to implement different learning rates or gradient descent algorithms? Instead of instantiating different Federated Optimizers in that case, maybe we could extend the API to support tuples of (worker, optimizer)?
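A rough sketch of what such an API extension could look like, built from (worker, optimizer) pairs as proposed. This is a hypothetical illustration in plain PyTorch, not the real PySyft interface: the class name, the optimizer-factory argument, and the FedAvg step are all assumptions.

```python
import copy
import torch
import torch.nn as nn

class PerWorkerFedOptimizer:
    """Hypothetical federated optimizer: each worker gets its own model
    copy and its own optimizer; average() merges copies back (FedAvg)."""

    def __init__(self, model, worker_opts):
        # worker_opts: iterable of (worker_name, optimizer_factory) pairs
        self.model = model
        self.copies = {}
        self.opts = {}
        for worker, make_opt in worker_opts:
            local = copy.deepcopy(model)
            self.copies[worker] = local
            self.opts[worker] = make_opt(local.parameters())

    def local_step(self, worker, loss_fn):
        """One local update on that worker's copy with its own optimizer."""
        opt = self.opts[worker]
        opt.zero_grad()
        loss_fn(self.copies[worker]).backward()
        opt.step()

    def average(self):
        """FedAvg: overwrite global parameters with the mean of the copies."""
        with torch.no_grad():
            for name, p in self.model.named_parameters():
                stacked = torch.stack(
                    [dict(m.named_parameters())[name]
                     for m in self.copies.values()]
                )
                p.copy_(stacked.mean(dim=0))
```

Averaging model parameters (rather than gradients) is what keeps the per-worker optimizer choice meaningful, as noted in the discussion above: each worker's optimizer shapes its local trajectory before the merge.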