davidtvs / pytorch-lr-finder

A learning rate range test implementation in PyTorch
MIT License

how to define num_iter? #50

Closed jorie-peng closed 4 years ago

jorie-peng commented 4 years ago

hi davidtvs, when using the lr finder, I find the loss curve looks a little different with different values of num_iter. For example, with 6700 images and batch size 252: with num_iter=27 the loss decreases noticeably from 1e-4 to 1e-3, with num_iter=270 it decreases noticeably from 1e-4 to 5e-4, and with num_iter=540 from 1e-4 to 8e-4. So I am not sure which loss curve is correct? @davidtvs Thanks! It is a really good tool.

NaleRaphael commented 4 years ago

Hi @jorie-peng. It depends on your needs. The greater the num_iter you give, the finer the resolution of the lr-loss curve you get (but it also takes longer to run range_test). In your case, increasing num_iter from 27 to 270 makes the lr-loss curve 10x finer, and that's why the range of lr candidates is narrower.
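To make the resolution point concrete, here is a minimal sketch (not the library's actual code) of the exponential schedule that the range test sweeps through, assuming the default exponential step mode. The function name exp_lr_schedule is just illustrative:

```python
def exp_lr_schedule(start_lr, end_lr, num_iter):
    """Learning rates grow exponentially from start_lr to end_lr."""
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_iter - 1)) for i in range(num_iter)]

# With more iterations the same lr range is sampled more densely,
# so the loss curve gains resolution but the test takes longer to run.
coarse = exp_lr_schedule(1e-7, 10, 27)    # 27 lr samples across [1e-7, 10]
fine = exp_lr_schedule(1e-7, 10, 270)     # 270 lr samples across the same range
```

Both sweeps cover the same lr range; only the sampling density (and hence the shape detail of the loss curve) changes.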

As for picking a proper lr, there is no single rule that can determine which lr is correct or best just by running range_test. We usually have to apply hyper-parameter search algorithms to optimize it, e.g. Bayesian optimization, Hyperband...

However, we can still find a plausible lr that should let the model train well. Since we know the loss decreases noticeably over some range, we can expect that an lr in that range will make training easier. Here are the two most commonly used approaches:

  1. pick the lr at the steepest point of the lr-loss curve
  2. find the lr at the global minimum of the lr-loss curve (lr_min), then use lr_min / 10
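The two heuristics above can be sketched in pure Python as follows. This is only illustrative, not the library's implementation; suggest_lrs is a hypothetical helper, and the steepest point is computed as the most negative slope of loss versus log(lr), since the lr axis is logarithmic:

```python
import math

def suggest_lrs(lrs, losses):
    # Heuristic 1: lr at the steepest descent of the loss curve,
    # i.e. the most negative slope of loss vs. log(lr).
    slopes = [
        (losses[i + 1] - losses[i]) / (math.log(lrs[i + 1]) - math.log(lrs[i]))
        for i in range(len(lrs) - 1)
    ]
    steepest = lrs[slopes.index(min(slopes))]

    # Heuristic 2: lr at the global loss minimum, divided by 10.
    lr_min = lrs[losses.index(min(losses))]
    return steepest, lr_min / 10.0

# Toy lr-loss curve: loss falls, bottoms out, then blows up.
lrs = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [1.0, 0.8, 0.3, 0.5, 2.0]
steepest, min_over_10 = suggest_lrs(lrs, losses)
```

In a real run you would take lrs and losses from the range test history rather than hand-written lists.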

You can check out this comment if you are interested in this topic.

jorie-peng commented 4 years ago

Thanks for your quick answer! Finding a proper lr is really hard work. I will try your suggestion. Thanks again.

rose-jinyang commented 3 years ago

Hello, thanks for contributing to this project. I think the README should explain how to decide the min & max values of the LR from the loss-lr curve.

davidtvs commented 3 years ago

@rose-jinyang, in the README it is briefly explained how the original author of the paper advises doing the LR selection. Quoting from the README:

the author advises the point at which the loss starts descending and the point at which the loss stops descending or becomes ragged for start_lr and end_lr respectively. In the plot below, start_lr = 0.0002 and end_lr=0.2.

Here start_lr and end_lr correspond to the min and max values of the learning rate for a cyclical scheduler.
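For illustration, here is a minimal sketch of the triangular cyclical policy those two values would feed, using the start_lr = 0.0002 and end_lr = 0.2 example from the README. The function is a hand-rolled stand-in, not the scheduler you would actually use in training:

```python
def triangular_lr(iteration, step_size, base_lr, max_lr):
    """Triangular cyclical LR: the lr oscillates linearly between
    base_lr and max_lr, completing one cycle every 2 * step_size iterations."""
    cycle = 1 + iteration // (2 * step_size)
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)

# base_lr / max_lr taken from the README example above.
lr_at_start = triangular_lr(0, 100, 0.0002, 0.2)    # bottom of the cycle
lr_mid_cycle = triangular_lr(100, 100, 0.0002, 0.2)  # peak of the cycle
```

In practice you would pass the same two values to PyTorch's built-in scheduler, e.g. torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=0.0002, max_lr=0.2).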

rose-jinyang commented 3 years ago

Thanks for your quick reply.