Hello, thank you for this great work.
I noticed that you apply an SGD+momentum optimizer for pre-training, while other optimizers are used for fine-tuning. Have you tried other optimizers such as Adam, AdamW, or LARS for pre-training? And do those choices lead to worse pre-training performance?
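For concreteness, the SGD+momentum update I am referring to is the standard one below (a minimal sketch with made-up values, not your actual training code; `lr` and `mu` are placeholder hyperparameters):

```python
def sgd_momentum_step(w, g, v, lr=0.1, mu=0.9):
    """One SGD+momentum step: v <- mu*v + g, then w <- w - lr*v."""
    v_new = [mu * vi + gi for vi, gi in zip(v, g)]
    w_new = [wi - lr * vi for wi, vi in zip(w, v_new)]
    return w_new, v_new

# Toy example: one parameter, one gradient step.
w, v = [1.0], [0.0]
w, v = sgd_momentum_step(w, g=[0.5], v=v)
# w = 1.0 - 0.1 * 0.5 = 0.95
```

Adam/AdamW instead keep per-parameter second-moment estimates, and LARS rescales the update layer-wise, so I am curious whether any of those adaptive schemes were tried and rejected for pre-training.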
Thank you very much.