FedNLP: An Industry and Research Integrated Platform for Federated Learning in Natural Language Processing, Backed by FedML, Inc. The Previous Research Version is Accepted to NAACL 2022
Client-side learning-rate warm-up and scheduler may be an issue for FL #5
Combining a client-side learning-rate warm-up and scheduler with a distributed optimizer (FedAvg, FedOpt, etc.) looks unreasonable under current optimization theory, so we need to confirm its effectiveness experimentally. My suggestion is to delete it if experiments show it does not improve accuracy much, because ML researchers may otherwise think this algorithmic combination is wrong, and we should avoid creating that confusion.
https://github.com/FedML-AI/FedNLP/blob/bd6dbb98e334637d69ad61e65f8d5ae75bf8d1cb/model/fed_transformers/classification/classification_model.py#L487
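To make the concern concrete, here is a minimal, dependency-free sketch (with hypothetical function names, not the actual FedNLP code) of what happens when a linear warm-up schedule is recreated inside every federated round, as the linked client code does: the warm-up counter restarts from step 0 each round, so the client repeatedly trains at near-zero learning rates, a regime that FedAvg/FedOpt convergence analyses do not model.

```python
def warmup_lr(step, base_lr=0.01, warmup_steps=10):
    """Linear warm-up: LR ramps from 0 to base_lr over warmup_steps."""
    return base_lr * min(1.0, step / warmup_steps)

def simulate_rounds(num_rounds=3, local_steps=5, base_lr=0.01, warmup_steps=10):
    """Record the LR seen at each local step across federated rounds.

    Mimics a client that rebuilds its scheduler every round, so the
    warm-up counter resets to 0 at the start of each round.
    """
    lrs = []
    for _ in range(num_rounds):
        for step in range(local_steps):  # scheduler state restarts here
            lrs.append(warmup_lr(step, base_lr, warmup_steps))
    return lrs

lr_trace = simulate_rounds()
# With local_steps < warmup_steps, every round trains entirely inside the
# warm-up ramp and the LR never reaches base_lr.
```

Running this shows the LR dropping back to 0 at the start of every round instead of following one global schedule, which is why the simpler alternative of a constant client LR may be preferable unless experiments show the schedule actually helps.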