FedNLP: An Industry- and Research-Integrated Platform for Federated Learning in Natural Language Processing, Backed by FedML, Inc. The previous research version was accepted to NAACL 2022.
Some suggestions:
I suggest keeping each function within one screen length (fewer than 100 lines). In some companies this is a hard rule. Even in the NLP domain, projects like Hugging Face follow it well, and their code is readable to me.
I don't see any special training tricks, so the training loop can fit within one screen length.
When we want to reuse a piece of functionality, it is better to extract it into a function. For example: 1) the early-stopping code appears twice in the training loop, but the content is nearly the same; 2) defining the trainable parameters (the beginning of train()) can be shrunk into a function.
Code for many Transformer variants is merged through different branches. I suggest considering only the models used in FL; as a benchmark, two models will be enough.
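As a minimal sketch of the extraction suggestion above: the early-stopping check that repeats twice in the training loop could live in one small class. All names here (EarlyStopper, step) are illustrative, not FedNLP's actual API.

```python
# Hypothetical helper: the duplicated early-stopping logic becomes one
# reusable class, so the training loop only calls stopper.step(metric).

class EarlyStopper:
    """Tracks the best metric seen and signals when to stop training."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience      # rounds without improvement to tolerate
        self.min_delta = min_delta    # minimum improvement that counts
        self.best = float("-inf")
        self.bad_rounds = 0

    def step(self, metric: float) -> bool:
        """Record one evaluation result; return True when training should stop."""
        if metric > self.best + self.min_delta:
            self.best = metric
            self.bad_rounds = 0
        else:
            self.bad_rounds += 1
        return self.bad_rounds >= self.patience
```

Inside the loop, both occurrences then collapse to `if stopper.step(eval_metric): break`, which keeps the loop itself within a screen length.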
Some issues:
This class also performs data loading inside the training and evaluation loops, which should be handled outside the trainer class in FL; otherwise performance may degrade and hidden issues may arise. Under the FedML framework, the design pattern is to finish each client's data loading before starting the training loop.
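A minimal sketch of the design pattern described above, with entirely hypothetical names (load_client_data, train_one_round are not FedML's real API): each client's data is fully loaded before the training loop starts, so the trainer never touches the data-loading code path.

```python
# Sketch: separate the data-loading phase from the training phase.

def load_client_data(client_ids):
    """Load every client's dataset up front (stub loader for illustration)."""
    return {cid: [(f"sample-{cid}-{i}", i % 2) for i in range(4)]
            for cid in client_ids}

def train_one_round(client_data, selected_clients):
    """The training loop only consumes pre-loaded data; no I/O happens here."""
    updates = {}
    for cid in selected_clients:
        batch = client_data[cid]      # already in memory, loaded before training
        updates[cid] = len(batch)     # stand-in for computing a local update
    return updates

client_data = load_client_data([0, 1, 2])        # done once, before training
updates = train_one_round(client_data, [0, 2])   # loop stays free of loading
```

Keeping loading out of the trainer also makes the per-round cost predictable, since no client can stall a round on disk or network I/O mid-training.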
I am trying to simplify the code.
(May update once I find more.)