JonghwanMun closed this issue 1 year ago.
Table 4 of the appendix gives detailed hyper-parameters for pre-training.
However, Section 4 of the paper states that pre-training uses a learning rate of 0.0001 (1e-4), a weight decay of 0.1, and 2k warm-up steps.
Which values are correct?
The values in Section 4 are correct. We will fix the discrepancy.
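For anyone reproducing pre-training, here is a minimal sketch that collects the confirmed Section 4 values and computes the learning rate during warm-up. The thread does not say which optimizer or post-warm-up schedule is used, so this sketch assumes a linear warm-up to the peak rate followed by a constant rate. The function name `learning_rate` and the constant names are hypothetical.

```python
# Pre-training values confirmed in the thread (Section 4 of the paper).
PEAK_LR = 1e-4        # learning rate 0.0001
WEIGHT_DECAY = 0.1    # weight decay
WARMUP_STEPS = 2_000  # "2k" warm-up steps


def learning_rate(step: int,
                  peak_lr: float = PEAK_LR,
                  warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate at a 0-indexed optimizer step.

    ASSUMPTION: linear warm-up from peak_lr / warmup_steps up to peak_lr,
    then held constant. The real post-warm-up schedule is not stated here.
    """
    if step < warmup_steps:
        # Ramp linearly so the last warm-up step reaches peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Hypothetical constant schedule after warm-up.
    return peak_lr
```

Under this assumption, step 999 (halfway through warm-up) yields 5e-5, and every step from 1999 onward yields the peak rate of 1e-4. If the paper decays the rate after warm-up (for example with a cosine or linear schedule), only the final branch would change.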