FedML-AI / FedNLP

FedNLP: An Industry and Research Integrated Platform for Federated Learning in Natural Language Processing, backed by FedML, Inc. The previous research version was accepted to NAACL 2022.

Pretraining vs. Fine-tuning #7

Closed. chaoyanghe closed this issue 3 years ago.

chaoyanghe commented 3 years ago

I got the training results using the default hyper-parameters in our current GitHub repo. Since we start from pretrained weights, the accuracy is already as high as 79% in the first round, and it reaches the ceiling (around 83%) after only a few rounds. Running many rounds of federated training is therefore unnecessary (our current default is 500; I terminated the training at round 100). Check the results here:

https://wandb.ai/automl/fednlp/runs/3oc7a3jc/logs?workspace=user-chaoyanghe-com
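For reference, here is a minimal sketch of stopping federated training early once the aggregated accuracy plateaus, rather than always running the default 500 rounds. This is not FedNLP's actual training loop; `run_round` and `evaluate_global_model` are hypothetical placeholders for the real per-round training and evaluation routines.

```python
def run_federated_training(run_round, evaluate_global_model,
                           max_rounds=500, patience=10, min_delta=1e-3):
    """Hypothetical round-level early stopping for federated training.

    run_round(round_idx) performs one round of local training + aggregation;
    evaluate_global_model(round_idx) returns the global eval accuracy.
    """
    best_acc, rounds_without_gain = 0.0, 0
    for round_idx in range(max_rounds):
        run_round(round_idx)
        acc = evaluate_global_model(round_idx)
        if acc > best_acc + min_delta:
            best_acc, rounds_without_gain = acc, 0
        else:
            rounds_without_gain += 1
        if rounds_without_gain >= patience:
            # Accuracy has plateaued (e.g. around 83% in the run linked above).
            print(f"Stopping at round {round_idx}, best accuracy {best_acc:.3f}")
            break
    return best_acc
```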

Given that NLP is dominated by transformer-based pretraining, in our FedNLP we only need to do federated fine-tuning, right? Actually, I have a research idea about federated pretraining for the cross-silo setting; let's discuss it more after ICML.
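One way to picture the fine-tuning-only setup is the FedAvg sketch below. It assumes a Hugging Face checkpoint (`distilbert-base-uncased`) and a user-supplied `local_train` routine, and is an illustration of the idea rather than FedNLP's actual code.

```python
import copy

import torch
from transformers import AutoModelForSequenceClassification


def fedavg(state_dicts):
    """Uniformly average client model parameters (simplified FedAvg step)."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        avg[key] = stacked.mean(dim=0).to(avg[key].dtype)
    return avg


def federated_finetune(client_loaders, local_train, num_rounds=100):
    """Fine-tune a pretrained transformer with FedAvg.

    client_loaders: one DataLoader per client.
    local_train(model, loader): placeholder that runs a few local epochs.
    """
    # Every client starts from the same pretrained weights, which is why the
    # first-round accuracy is already high and only a few rounds are needed.
    global_model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)
    for _ in range(num_rounds):
        client_states = []
        for loader in client_loaders:
            local_model = copy.deepcopy(global_model)
            local_train(local_model, loader)
            client_states.append(local_model.state_dict())
        global_model.load_state_dict(fedavg(client_states))
    return global_model
```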

yuchenlin commented 3 years ago

> Given that NLP is dominated by transformer-based pretraining, in our FedNLP we only need to do federated fine-tuning, right?

Yes.