FedML-AI / FedNLP

FedNLP: An Industry and Research Integrated Platform for Federated Learning in Natural Language Processing, Backed by FedML, Inc. The Previous Research Version is Accepted to NAACL 2022

Hanging after last round of training #23

Closed bangawayoo closed 2 years ago

bangawayoo commented 2 years ago

Hi, thanks for the great work.

When running sh run_text_classification.sh FedOPT "niid_label_clients=100_alpha=100.0" 1e-3 0.1 1 4, the process does not terminate automatically after the last round of training regardless of the number of communication rounds.

The log stops after displaying the last eval metric:

18521 2021-12-29,21:14:53.265 - {tc_transformer_trainer.py (180)} - eval_model(): best_accuracy = 0.000000
18521 2021-12-29,21:14:53.266 - {tc_transformer_trainer.py (188)} - eval_model(): {'mcc': 0.0, 'tp': 0, 'tn': 0, 'fp': 0, 'fn': 0, 'acc': 0.0, 'evalloss': 3.01809245740279}

Commenting out post_complete_message_to_sweep_process(self.args) in ClientManager and ServerManager does let the program exit, so the FIFO appears to be the cause of the hang. Will commenting out the function cause any problems?
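
For context, here is a minimal sketch of why a FIFO write can hang. It assumes, without having confirmed it against the FedML source, that post_complete_message_to_sweep_process opens a named pipe for writing. On POSIX systems a blocking write-only open of a FIFO waits until some process opens the other end for reading, so if no sweep process is listening, the call never returns. The helper name try_notify and the pipe path are hypothetical, and this is not the library's implementation.

```python
import errno
import os
import tempfile


def try_notify(pipe_path: str, message: str) -> bool:
    """Hypothetical helper: write to a FIFO only if a reader is attached.

    Returns True if the message was written, False if nobody is reading.
    """
    if not os.path.exists(pipe_path):
        os.mkfifo(pipe_path)
    try:
        # With O_NONBLOCK, a write-only open of a FIFO that has no reader
        # fails with ENXIO instead of blocking forever.
        fd = os.open(pipe_path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno == errno.ENXIO:
            return False  # no sweep process is listening; skip the message
        raise
    with os.fdopen(fd, "w") as pipe:
        pipe.write(message)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sweep_fifo")  # assumed path, for illustration
        delivered = try_notify(path, "training is finished!\n")
        print("delivered" if delivered else "no reader attached, skipped")
```

If that assumption holds, commenting out the call should only drop the completion message intended for a hyperparameter sweep script, which matters only when such a script is actually reading the pipe.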

Possibly related to an issue from FedML.

MrigankRaman commented 2 years ago

Hi! We at FedML have launched a new platform for FedNLP where this issue should not occur. Could you please check whether you see the same issue there? Here is the new FedNLP platform: https://github.com/FedML-AI/FedML/tree/master/python/app/fednlp