argonne-lcf / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Compare the performance of optimizers such as Sophia, Lamb, and AdamW, and identify appropriate hyper-parameter settings #38

Open venkat-1 opened 2 months ago

venkat-1 commented 2 months ago

The objective of this issue is to compare the performance of various optimizers, particularly for large-batch training, and to identify appropriate hyper-parameter settings for each.
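
A minimal sketch of how such a comparison harness might be structured is below. It is an illustration under assumptions, not part of the repository: a toy model and synthetic data stand in for a real Megatron-DeepSpeed run, only AdamW is wired up via `torch.optim`, and the Lamb and Sophia entries are left as placeholders to be connected to whichever implementations the experiments actually use.

```python
# Sketch: reuse one training loop across optimizers so loss curves and
# hyper-parameter sweeps are directly comparable. Toy model and data only.
import torch
import torch.nn as nn


def make_optimizer(name, params, lr):
    """Factory so the same loop can be rerun with a different optimizer."""
    if name == "adamw":
        return torch.optim.AdamW(params, lr=lr, betas=(0.9, 0.95), weight_decay=0.1)
    # Placeholders: plug in the Lamb / Sophia implementations under test here
    # (e.g. a fused Lamb kernel or the authors' Sophia implementation).
    raise ValueError(f"optimizer not wired up in this sketch: {name}")


def run_trial(name, lr, steps=200, batch_size=64):
    """Train a tiny regression model and return the final loss for comparison."""
    torch.manual_seed(0)  # identical init and data order for every optimizer
    model = nn.Sequential(nn.Linear(32, 64), nn.GELU(), nn.Linear(64, 1))
    opt = make_optimizer(name, model.parameters(), lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        x = torch.randn(batch_size, 32)
        y = x.sum(dim=1, keepdim=True)  # synthetic target
        loss = loss_fn(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()


if __name__ == "__main__":
    # Sweep a few learning rates per optimizer; extend the grid (batch size,
    # weight decay, warmup) for the actual large-batch experiments.
    for name in ["adamw"]:
        for lr in [1e-4, 3e-4, 1e-3]:
            print(f"{name} lr={lr}: final loss {run_trial(name, lr):.4f}")
```

In a real run the same pattern would apply, but the optimizer choice and its hyper-parameters would come from the Megatron-DeepSpeed launch arguments or DeepSpeed config rather than an in-script factory.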