google-research / federated

A collection of Google research projects related to Federated Learning and Federated Analytics.
Apache License 2.0

federated_trainer.py slowness #34

Closed: amitport closed this issue 3 years ago

amitport commented 3 years ago

I'm currently running optimization/main/federated_trainer.py with EMNIST, nightly TF and TFF, and CUDA on an RTX 2080 Ti, and each round takes about a minute.

I'm not sure if this qualifies as a performance issue, but if I'm not mistaken, performance used to be much better on the GPU (about 10 seconds per round).

Exact execution parameters (baseline FedAvg):
--task=emnist_cr --clients_per_round=10 --client_datasets_random_seed=1 --client_epochs_per_round=1 --total_rounds=1500 --client_batch_size=20 --emnist_cr_model=cnn --client_optimizer=sgd --client_learning_rate=0.1 --server_optimizer=sgd --server_learning_rate=1 --server_sgd_momentum=0.0
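
As a first sanity check (just a sketch, run separately from federated_trainer.py), confirming that TensorFlow actually sees the GPU before attributing the slowness to TFF:

```python
# Sketch of a quick sanity check, run outside federated_trainer.py,
# to verify that TensorFlow actually sees the GPU.
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
print('Visible GPUs:', gpus)
if not gpus:
    print('No GPU visible to TensorFlow; rounds will run on CPU and be much slower.')
```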

zcharles8 commented 3 years ago

Hi @amitport. Just to clarify: does it seem like there's been a performance regression from 10 seconds per round to 1 minute per round? If so, this would likely pertain primarily to TFF, as there have been no real updates to the code in question in the last week.

amitport commented 3 years ago

Hi, I actually haven't run federated_trainer.py since around the 0.18 release. In any case, it was definitely faster, even when running locally on a GeForce GTX 960M.
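
To pin down exactly which builds are being compared, a small sketch that just reports the installed versions (this assumes both packages expose the conventional `__version__` attribute):

```python
# Sketch: report the installed TF / TFF versions being compared.
# Assumes both packages expose the conventional __version__ attribute.
import tensorflow as tf
import tensorflow_federated as tff

print('tensorflow:', tf.__version__)
print('tensorflow_federated:', tff.__version__)
```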

zcharles8 commented 3 years ago

@amitport Just to check: you are saying that training with the 0.18 release of TFF was faster than training with the current tensorflow-federated-nightly? If so, this might be better filed as a bug on TFF. I can't immediately find anything in optimization/main/federated_trainer.py that has changed recently and would cause a performance degradation.

amitport commented 3 years ago

Thanks @zcharles8! In any case, I'm closing this for now since I saw high variance in the performance of different experiments, but I don't yet have enough data to file an actionable issue.
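
For reference, a sketch of the kind of per-round timing that would make that variance concrete; `iterative_process` and `sample_client_data` are hypothetical placeholders for whatever the training loop already builds, not names from this repo:

```python
# Sketch: collect wall-clock time per training round and summarize the spread.
# `iterative_process` and `sample_client_data` are hypothetical placeholders
# standing in for the objects federated_trainer.py constructs.
import statistics
import time

round_times = []
state = iterative_process.initialize()
for round_num in range(1, 51):
    start = time.perf_counter()
    state, _ = iterative_process.next(state, sample_client_data(round_num))
    round_times.append(time.perf_counter() - start)

print(f'{statistics.mean(round_times):.1f}s mean, '
      f'{statistics.stdev(round_times):.1f}s stdev per round')
```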