Simulate collaborative ML scenarios, experiment with multi-partner learning approaches, and measure the respective contributions of different datasets to model performance.
With the current implementations of the various mpls, the measured computational time does not reflect a true federated scenario, for two main reasons:

- the partners cannot train in parallel;
- there is no communication between the "central server" and the partners.
I think it is in fact possible to compute an approximation of the federated training time of a scenario, by taking into account the parallelism of the local trainings and the communication time cost.
The time of a global batch would be the maximum, over partners, of the sequential local-training time each partner spends on that global batch:

$t_{\text{global batch } i} = \max_{p} \sum_{\text{local batch } j \,\in\, \text{global batch } i \text{ for partner } p} t_{\text{local batch } j}$
The communication time would be an arbitrary "communication time" multiplied by the number of communications, which is algorithm-dependent. For FedAvg, it would be equal to the number of global batches (minibatches) during training, plus the communications needed for initialization/testing (which can maybe be neglected?).
With this model, the amount of data exchanged is not taken into account; I don't know whether that is mandatory.
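The approximation described above could be sketched roughly as follows. This is only an illustration of the formula, not part of the mplc API; the function name and the input layout (per-partner, per-global-batch sequential training times, already summed over local batches) are assumptions for the example.

```python
def estimate_federated_time(partner_batch_times, comm_time_per_round):
    """Approximate the wall-clock time of a federated training run.

    partner_batch_times: one list per partner, each containing, for every
        global batch, the sequential local-training time that partner spends
        on it (i.e. the sum over its local batches).
    comm_time_per_round: arbitrary communication cost per aggregation round.
    """
    n_global_batches = len(partner_batch_times[0])

    # Computation: partners train in parallel, so each global batch costs
    # the max over partners of their sequential local-training time.
    compute_time = sum(
        max(times[i] for times in partner_batch_times)
        for i in range(n_global_batches)
    )

    # Communication: for FedAvg, one aggregation round per global batch
    # (initialization/test communications neglected, as suggested above).
    communication_time = comm_time_per_round * n_global_batches

    return compute_time + communication_time
```

For example, with two partners whose per-global-batch times are `[1, 2]` and `[3, 1]` and a communication cost of `0.5` per round, the estimate is `max(1, 3) + max(2, 1) + 2 * 0.5 = 6.0`.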