facebookresearch / MobileLLM

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.

Optimal Learning Rate #8

Open XinDongol opened 1 month ago

XinDongol commented 1 month ago

In the paper, the optimal learning rate is 2e-3, but in pretrain.sh the learning rate is set to 5e-4. Could you please advise which learning rate is best for training MobileLLM models?