NinedayWang closed this issue 3 years ago
When training the base and large models, how were the learning rate, warmup, and related hyperparameters set for each?
For the training hyperparameters, we followed the LAMB paper: Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
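For readers unfamiliar with what "learning rate and warmup" controls, here is a minimal sketch of a warmup schedule in plain Python. It is only an illustration: the function name `lr_at_step`, the linear decay shape, and every number in the usage lines are assumptions made for this example. They are not the settings used for the base or large models, and they are not taken from the paper, which should be consulted for the actual values.

```python
def lr_at_step(step, base_lr, warmup_steps, total_steps):
    """Hypothetical schedule: linear warmup to base_lr, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: ramp linearly from base_lr down to 0 at total_steps.
    decay_steps = max(total_steps - warmup_steps, 1)  # avoid division by zero
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / decay_steps


# Placeholder values, chosen only to show the shape of the schedule.
for s in (0, 5, 10, 60, 110):
    print(s, lr_at_step(s, base_lr=1.0, warmup_steps=10, total_steps=110))
```

A schedule like this is usually passed to the optimizer once per step; the base and large models may well use different `base_lr` and `warmup_steps`, which is what the question above is asking about.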