FedML-AI / FedNLP

FedNLP: An Industry and Research Integrated Platform for Federated Learning in Natural Language Processing, Backed by FedML, Inc. The Previous Research Version is Accepted to NAACL 2022

Accuracy and loss didn't improve for FedAvg on 20news #19

Open haofuml opened 2 years ago

haofuml commented 2 years ago

I am trying to reproduce the FedAvg results on the 20news data, but FedAvg does not seem to work on this task. Compared with the centralized run, the eval accuracy and loss of FedAvg showed no improvement after many rounds (eval acc 0.063, eval loss 2.969). The results and experiment settings can be checked here: https://wandb.ai/haofuml/fednlp_bl?workspace=user-haofuml
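For context, the reported numbers can be compared against a chance-level baseline. The sketch below assumes 20 roughly balanced classes (20news has 20 newsgroups) and a model that predicts a uniform distribution; `chance_level` is a hypothetical helper, not part of FedNLP.

```python
import math

def chance_level(num_classes):
    """Accuracy and cross-entropy loss of a uniform guess over balanced classes."""
    accuracy = 1.0 / num_classes
    loss = math.log(num_classes)  # -log(1 / num_classes)
    return accuracy, loss

# Assumption: 20 balanced classes, as in 20news.
acc, loss = chance_level(20)
print(f"chance accuracy = {acc:.3f}, chance loss = {loss:.3f}")  # ln(20) is about 2.996
```

An eval accuracy of 0.063 and an eval loss of 2.969 sit close to these values, which is consistent with a model that has learned almost nothing, though it does not say why.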

chaoyanghe commented 2 years ago

@haofuml I see. I will check this issue this weekend.

shubham-malaviya commented 2 years ago

Hi, any progress on this? Thanks!

chaoyanghe commented 2 years ago

@shubham-malaviya Set the optimizer to "fedOPT"; it should work.
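As background on how the two server updates differ, here is a minimal sketch in plain Python. It is not FedNLP's implementation: the function names, the flat list-of-floats model, and the `server_lr` and `momentum` parameters are assumptions made for illustration. FedAvg replaces the global model with the sample-weighted average of the client models, while FedOpt treats the gap between the global model and that average as a pseudo-gradient and feeds it to a server-side optimizer (SGD with momentum here).

```python
def weighted_average(client_weights, client_sizes):
    """Sample-weighted average of client parameter vectors."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

def fedavg_step(global_w, client_weights, client_sizes):
    """FedAvg: the new global model is the weighted client average."""
    return weighted_average(client_weights, client_sizes)

def fedopt_sgd_step(global_w, client_weights, client_sizes,
                    server_lr=1.0, momentum=0.0, velocity=None):
    """FedOpt with server-side SGD: step along the pseudo-gradient."""
    avg = weighted_average(client_weights, client_sizes)
    pseudo_grad = [g - a for g, a in zip(global_w, avg)]
    if velocity is None:
        velocity = [0.0] * len(global_w)
    velocity = [momentum * v + d for v, d in zip(velocity, pseudo_grad)]
    new_w = [g - server_lr * v for g, v in zip(global_w, velocity)]
    return new_w, velocity

# Toy round with two hypothetical clients holding 1 and 3 samples.
global_w = [0.0, 0.0]
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]

avg_w = fedavg_step(global_w, clients, sizes)
opt_w, _ = fedopt_sgd_step(global_w, clients, sizes, server_lr=1.0)
half_w, _ = fedopt_sgd_step(global_w, clients, sizes, server_lr=0.5)
```

With `server_lr=1.0` and no momentum, the FedOpt step reduces to FedAvg, so in this sketch the two differ only through the server optimizer's settings. Whether that accounts for the behavior reported in this issue is not established here.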

IreneTenison commented 2 years ago

I'm facing the same issue. Is there a known reason why FedAvg shows no improvement? Do you have a workaround? @chaoyanghe

zjc664656505 commented 2 years ago

Same here: FedAvg shows no improvement in accuracy or loss. A possible reason is that the parameter updates from the clients to the server get stuck somewhere.

zjc664656505 commented 2 years ago

3335 2022-03-13,22:21:21.453 - {FedAVGAggregator.py (45)} - add_local_trained_result(): add_model. index = 1
3335 2022-03-13,22:21:21.453 - {FedAvgServerManager.py (52)} - handle_message_receive_model_from_client(): b_all_received = False
3335 2022-03-13,22:21:23.862 - {FedAVGAggregator.py (45)} - add_local_trained_result(): add_model. index = 2
3335 2022-03-13,22:21:23.870 - {FedAvgServerManager.py (52)} - handle_message_receive_model_from_client(): b_all_received = False
3335 2022-03-13,22:21:28.058 - {FedAVGAggregator.py (45)} - add_local_trained_result(): add_model. index = 3
3335 2022-03-13,22:21:28.064 - {FedAvgServerManager.py (52)} - handle_message_receive_model_from_client(): b_all_received = False
3335 2022-03-13,22:21:28.365 - {FedAVGAggregator.py (45)} - add_local_trained_result(): add_model. index = 4
3335 2022-03-13,22:21:28.372 - {FedAvgServerManager.py (52)} - handle_message_receive_model_from_client(): b_all_received = True
3335 2022-03-13,22:21:39.657 - {FedAVGAggregator.py (70)} - aggregate(): len of self.model_dict[idx] = 5
3335 2022-03-13,22:21:39.960 - {FedAVGAggregator.py (87)} - aggregate(): aggregate time cost: 11
3335 2022-03-13,22:21:39.961 - {tc_transformer_trainer.py (137)} - eval_model(): len(test_dl) = 942, n_batches = 942
indexes of clients: [37 26 78 91 49]
3335 2022-03-13,22:21:49.148 - {tc_transformer_trainer.py (180)} - eval_model(): best_accuracy = 0.009692
3335 2022-03-13,22:21:49.148 - {tc_transformer_trainer.py (188)} - eval_model(): {'mcc': -0.012776380246762163, 'tp': 0, 'tn': 0, 'fp': 0, 'fn': 0, 'acc': 0.009426447158789167, 'eval_loss': 3.0077073222259556}
3335 2022-03-13,22:21:49.149 - {FedAVGAggregator.py (97)} - client_sampling(): client_indexes = [37 26 78 91 49]

zjc664656505 commented 2 years ago

This is what the log looks like. It seems the model on the server never receives the updated parameters from the clients. I think this is why the results never change.
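One way to test this hypothesis is to snapshot the global parameters before aggregation and compare them with the parameters afterwards. The sketch below is a generic check, not FedNLP code: the helper names and the parameter names are hypothetical, and parameters are modeled as plain lists of floats rather than tensors.

```python
def max_abs_change(before, after):
    """Largest absolute per-element difference for each parameter name."""
    changes = {}
    for name, old_values in before.items():
        new_values = after[name]
        changes[name] = max(abs(n - o) for o, n in zip(old_values, new_values))
    return changes

def global_model_changed(before, after, tol=1e-12):
    """True if any parameter moved by more than tol."""
    return any(delta > tol for delta in max_abs_change(before, after).values())

# Hypothetical snapshots of the global model around one aggregation round.
before = {"classifier.weight": [0.10, -0.20], "classifier.bias": [0.0]}
stuck = {name: list(values) for name, values in before.items()}
updated = {"classifier.weight": [0.12, -0.20], "classifier.bias": [0.0]}

print(global_model_changed(before, stuck))
print(global_model_changed(before, updated))
```

The "before" snapshot has to be a real copy (for example via `copy.deepcopy`); if it aliases the live parameters, an in-place update would make the two snapshots look identical. If the check reports no change after a round, the aggregated weights are probably not reaching the model that is evaluated; if it does report a change, the cause is more likely in the optimization itself.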