Hi,
I have a question about the learning rate schedule. In your _stepdecay function the gamma values are defined as [args.warmup_ratio, 1.0, 0.1, 0.01, 1.0, 0.1, 0.01].
Why do you increase the learning rate again in stage 4? When I trained the model following your schedule, the loss increased once the learning rate became larger. Is that normal? Is it a trick, or am I doing something wrong?
Thanks