Details for hyperparameter for pre-training.

X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family

https://www.modelscope.cn/studios/damo/mPLUG-Owl

MIT License

Details for hyperparameter for pre-training. #140

Closed: JonghwanMun closed this issue 1 year ago

JonghwanMun commented 1 year ago

In Table 4 of the appendix, detailed hyper-parameters for pre-training are given.

However, Section 4 of the paper states that pre-training uses a learning rate of 0.0001 (1e-4), a weight decay of 0.1, and 2k warm-up steps.

Which values are correct?

MAGAer13 commented 1 year ago

The values in Section 4 are correct. We will fix Table 4 in the appendix.
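
For readers who want the Section 4 values in one place, here is a minimal sketch of them as a config plus a warm-up helper. The names `PretrainConfig` and `warmup_lr` are hypothetical and not taken from the mPLUG-Owl code. The thread gives only the peak learning rate, weight decay, and warm-up length; the linear warm-up shape and the constant rate afterwards are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PretrainConfig:
    """Pre-training values quoted from Section 4 in this thread."""
    learning_rate: float = 1e-4   # peak learning rate (0.0001)
    weight_decay: float = 0.1
    warmup_steps: int = 2000      # "2k warm-up steps"


def warmup_lr(step: int, cfg: PretrainConfig) -> float:
    """Return the learning rate at a given optimizer step.

    Assumption: linear warm-up from 0 to the peak rate, then constant.
    The schedule used after warm-up is not stated in this thread.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    scale = min(1.0, step / cfg.warmup_steps)
    return cfg.learning_rate * scale


cfg = PretrainConfig()
for step in (0, 500, 1000, 2000, 5000):
    print(step, warmup_lr(step, cfg))
```

The weight decay value is carried in the config only; how it is applied depends on the optimizer, which is not named in this thread.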