IBM / federated-learning-lib

A library for federated learning (a distributed machine learning process) in an enterprise environment.

curious about how perc_quorum works #49

Open Enrique-Marmol opened 3 years ago

Enrique-Marmol commented 3 years ago

Hi everyone,

I would like to know how perc_quorum works. If I set it to 0.75 and I have 4 parties, the process will run as long as at least 3 parties complete their training within the established time. My question is what happens if one party does not manage to finish in time: does it continue its training and, once it finishes, wait until the end of the round? Or does it start again from the beginning?

Thank you in advance.

chalianwar commented 3 years ago

Hi,

Thanks for checking out our FL library!

In your example, perc_quorum when set to 0.75 indeed means that training will continue as long as three out of four parties (75%) reply back.
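As a rough illustration of the arithmetic (not the library's actual code), here is a hypothetical helper, quorum_met, that checks whether the fraction of replying parties reaches perc_quorum. Comparing the ratio directly is an assumption made here to avoid float rounding that something like math.ceil(0.7 * 10) would hit.

```python
def quorum_met(num_replies: int, num_parties: int, perc_quorum: float) -> bool:
    """Hypothetical helper: True if the fraction of parties that replied
    reaches perc_quorum. Not part of the library's API."""
    if num_parties <= 0:
        raise ValueError("num_parties must be positive")
    if not 0.0 < perc_quorum <= 1.0:
        raise ValueError("perc_quorum must be in (0, 1]")
    # Compare the reply ratio directly instead of computing a rounded count.
    return num_replies / num_parties >= perc_quorum


# With 4 parties and perc_quorum = 0.75, at least 3 replies are needed.
for replies in range(5):
    print(replies, quorum_met(replies, 4, 0.75))
```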

In the current version, the Aggregator moves training on to the next round as soon as the quorum is met. The max_timeout parameter specifies how long the Aggregator should wait for the quorum to be met. The drawback of this approach is that a party which replies significantly later than the other parties is ignored if the quorum was met before its reply was received.
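To make the current behavior concrete, here is a simplified, single-round simulation under stated assumptions: close_round_early and the reply_times mapping (party id to reply time in seconds after the round starts) are hypothetical and do not come from the library. The round closes the moment the quorum is reached, so a slower party is dropped even if it would have replied before max_timeout.

```python
from typing import Dict, List, Optional


def close_round_early(reply_times: Dict[str, float],
                      perc_quorum: float,
                      max_timeout: float) -> Optional[List[str]]:
    """Hypothetical sketch of the current policy: the round ends as soon as
    the quorum is met. Returns the parties whose replies are used, or None
    if the quorum is not met within max_timeout."""
    num_parties = len(reply_times)
    accepted: List[str] = []
    # Process replies in arrival order; anything after the timeout is lost.
    for party, t in sorted(reply_times.items(), key=lambda kv: kv[1]):
        if t > max_timeout:
            break
        accepted.append(party)
        if len(accepted) / num_parties >= perc_quorum:
            return accepted  # quorum met: move on to the next round now
    return None  # quorum not met before max_timeout


# p4 replies well within the timeout but is ignored because p1..p3 met the quorum first.
replies = {"p1": 2.0, "p2": 3.5, "p3": 4.0, "p4": 9.0}
print(close_round_early(replies, perc_quorum=0.75, max_timeout=60.0))
```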

We are working on changing this behavior. In the next release, the Aggregator will wait for all parties to reply, for up to the configured max_timeout. If some parties do not reply within that period, the Aggregator will check whether the quorum is met to decide whether to continue or to finish the training process.
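For comparison, the announced behavior can be sketched the same way. Again, wait_for_all and reply_times are hypothetical names used only for illustration. Every reply that arrives within max_timeout is kept, and the quorum is checked only after all parties have replied or the timeout has expired. None stands for "quorum not met, training finishes."

```python
from typing import Dict, List, Optional


def wait_for_all(reply_times: Dict[str, float],
                 perc_quorum: float,
                 max_timeout: float) -> Optional[List[str]]:
    """Hypothetical sketch of the announced policy: collect every reply that
    arrives within max_timeout, then check the quorum once at the end.
    Returns the parties used for the round, or None to stop training."""
    num_parties = len(reply_times)
    # Keep every reply received before the timeout, in arrival order.
    arrived = [party for party, t in sorted(reply_times.items(), key=lambda kv: kv[1])
               if t <= max_timeout]
    # The quorum is evaluated only after waiting, not as soon as it is reached.
    if len(arrived) / num_parties >= perc_quorum:
        return arrived
    return None


# p4 is now included because it replies before max_timeout expires.
replies = {"p1": 2.0, "p2": 3.5, "p3": 4.0, "p4": 9.0}
print(wait_for_all(replies, perc_quorum=0.75, max_timeout=60.0))
```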

Stay tuned for the next release!