Langboat / Mengzi

Mengzi Pretrained Models
Apache License 2.0
534 stars, 63 forks

How was the pretraining schedule configured? #19

Closed NinedayWang closed 3 years ago

NinedayWang commented 3 years ago

When training the base and large models, how were the learning rate, warmup, and related settings configured?

Ag2S1 commented 3 years ago

For the training hyperparameters, we followed the LAMB paper, "Large Batch Optimization for Deep Learning: Training BERT in 76 minutes".
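
As a rough illustration only: the sketch below shows the general shape of a linear-warmup schedule followed by linear decay, which is common in BERT-style pretraining. The function name `lr_at_step` and every number in it are hypothetical placeholders, not the settings used for Mengzi; the actual values should be taken from the paper referenced above.

```python
def lr_at_step(step: int, total_steps: int, warmup_steps: int, peak_lr: float) -> float:
    """Learning rate at a given step: linear warmup, then linear decay to zero.

    All arguments are illustrative; none are confirmed Mengzi settings.
    """
    if not 0 < warmup_steps < total_steps:
        raise ValueError("need 0 < warmup_steps < total_steps")
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay phase: fall linearly from peak_lr to 0 at total_steps.
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    # Placeholder values chosen only to show the shape of the curve.
    for s in (0, 50, 100, 550, 1000):
        print(s, lr_at_step(s, total_steps=1000, warmup_steps=100, peak_lr=1e-3))
```

A function like this can be passed step by step to whichever optimizer is in use; the optimizer itself (LAMB or otherwise) is independent of the schedule shape.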