Closed jorie-peng closed 4 years ago
Hi @jorie-peng. It depends on what you need. The greater num_iter is, the finer the resolution of the lr-loss curve you get (but range_test also takes longer to run).
In your case, when num_iter is increased from 27 to 270, the lr-loss curve is sampled about 10x more finely. That's why the range of lr candidates is shortened.
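To make the "10x finer" point concrete, here is a minimal sketch of an exponential lr sweep. It assumes the lrs are spaced evenly on a log scale between two bounds; the function name and the bounds (1e-6 and 1.0) are hypothetical and chosen only for illustration, so this is not necessarily how range_test computes its schedule internally.

```python
import math


def exponential_lr_schedule(start_lr, end_lr, num_iter):
    """Return num_iter learning rates spaced evenly on a log scale."""
    if num_iter < 2:
        raise ValueError("num_iter must be at least 2")
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_iter - 1)) for i in range(num_iter)]


# Hypothetical sweep bounds, not taken from the thread.
start_lr, end_lr = 1e-6, 1.0

for num_iter in (27, 270):
    lrs = exponential_lr_schedule(start_lr, end_lr, num_iter)
    # Multiplicative gap between two consecutive learning rates.
    step_factor = lrs[1] / lrs[0]
    print(f"num_iter={num_iter}: each step multiplies lr by {step_factor:.4f}")
```

With more iterations the gap between consecutive lrs shrinks, so the point where the loss stops decreasing can be located more precisely.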
As for picking a proper lr, there is no single rule that determines which lr is correct or best just from running range_test. We usually have to apply a hyper-parameter search algorithm to optimize it, e.g. Bayesian optimization, Hyperband...
However, we can still find candidate lr values that are likely to train the model well. Since we know the loss decreases noticeably over some range, we can speculate that an lr in that range makes the model easier to train. Here are a few of the most commonly used approaches:
Find the lr with the minimum loss (lr_min) of the lr-loss curve, then lr_min/10 is the one to use
You can check out this comment if you are interested in this topic.
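The lr_min/10 heuristic can be sketched as follows. This assumes the recorded learning rates and losses from the range test are available as two plain lists of equal length; the helper name and the sample values are made up for illustration and do not come from a real run.

```python
def suggest_lr_from_minimum(lrs, losses, divisor=10.0):
    """Return (lr_min, lr_min / divisor), where lr_min is the lr at the lowest loss."""
    if not lrs or len(lrs) != len(losses):
        raise ValueError("lrs and losses must be non-empty and of equal length")
    # Index of the smallest recorded loss.
    idx_min = min(range(len(losses)), key=losses.__getitem__)
    lr_min = lrs[idx_min]
    return lr_min, lr_min / divisor


# Made-up lr-loss curve: loss falls, bottoms out, then diverges.
lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [2.30, 2.25, 1.80, 1.10, 1.45, 4.00]

lr_min, suggested = suggest_lr_from_minimum(lrs, losses)
print(f"lr at minimum loss: {lr_min}, suggested lr: {suggested}")
```

Dividing by 10 backs off from the minimum, where the loss is about to diverge, into the region where it is still decreasing steadily.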
Thanks for your quick answer! Finding a proper lr is really hard work. I will try your suggestion. Thanks again.
Hello, thanks for contributing to this project. I think the README should explain how to decide the min and max values of the LR from the loss-lr curve.
@rose-jinyang, in the README it is briefly explained how the original author of the paper advises doing the LR selection. Quoting from the README:
the author advises the point at which the loss starts descending and the point at which the loss stops descending or becomes ragged for start_lr and end_lr respectively. In the plot below, start_lr = 0.0002 and end_lr = 0.2.
Here start_lr and end_lr correspond to the min and max values of the learning rate for a cyclical scheduler.
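As a rough illustration of how those two values are used, here is a minimal pure-Python sketch of the triangular cyclical policy, where the lr ramps linearly from start_lr to end_lr and back. The values 0.0002 and 0.2 come from the README quote above; step_size (iterations per half cycle) and the function name are hypothetical.

```python
import math


def triangular_lr(iteration, start_lr, end_lr, step_size):
    """Learning rate at a given iteration under a triangular cyclical policy."""
    # Which cycle this iteration belongs to (1-based).
    cycle = math.floor(1 + iteration / (2 * step_size))
    # Distance from the peak of the current cycle, scaled to [0, 1].
    x = abs(iteration / step_size - 2 * cycle + 1)
    return start_lr + (end_lr - start_lr) * max(0.0, 1.0 - x)


start_lr, end_lr = 0.0002, 0.2  # values quoted from the README
step_size = 100  # hypothetical: iterations per half cycle

for it in (0, 50, 100, 150, 200):
    print(it, triangular_lr(it, start_lr, end_lr, step_size))
```

In an actual training loop you would normally use the scheduler provided by your framework rather than a hand-written function like this one.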
Thanks for your quick reply.
Hi davidtvs, when using the lr finder, I find that the loss curve differs a little depending on num_iter. For example, with 6700 images and a batch size of 252: with num_iter=27 the loss decreases noticeably from 1e-4 to 1e-3, with num_iter=270 it decreases noticeably from 1e-4 to 5e-4, and with num_iter=540 it decreases noticeably from 1e-4 to 8e-4. So I am not sure which curve is correct. @davidtvs Thanks! It is a really good tool.