Should I train for the full 2x or 3x schedule at a constant lr (1e-4), pick the best-performing checkpoint, resume from it, and edit the lr stored in that checkpoint to apply the lr drop?
Can you provide more specific details about it?
Additionally, I noticed that in the 4-scale 3x training, the learning rate is dropped twice. Could you please provide more details about the training settings, such as the learning rate schedule or other relevant information?
Thank you so much for your help.