`get_step_index(cur_iter)`

First we build a list of `steps` that includes `MAX_ITER`: `[0, 480000, 640000, 720000]`. Then we walk through `steps` and return the index of the last step boundary that `cur_iter` has passed:

```python
def get_step_index(cur_iter):
    # Build steps that includes MAX_ITER: [0, 480000, 640000, 720000]
    steps = cfg.SOLVER.STEPS + [cfg.SOLVER.MAX_ITER]
    for ind, step in enumerate(steps):
        if cur_iter < step:
            break
    return ind - 1
```
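As a sanity check, here is a minimal standalone sketch of that logic (the `cfg` values are hard-coded here purely for illustration):

```python
STEPS = [0, 480000, 640000]
MAX_ITER = 720000

def get_step_index(cur_iter):
    steps = STEPS + [MAX_ITER]
    for ind, step in enumerate(steps):
        if cur_iter < step:
            break
    return ind - 1

print(get_step_index(0))       # 0
print(get_step_index(479999))  # 0
print(get_step_index(480000))  # 1
print(get_step_index(719999))  # 2
```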
`lr_func_steps_with_decay(cur_iter)`

```python
def lr_func_steps_with_decay(cur_iter):
    """For cfg.SOLVER.LR_POLICY = 'steps_with_decay'
    Change the learning rate at specified iterations based on the formula
    lr = base_lr * gamma ** lr_step_count.

    Example:
    cfg.SOLVER.MAX_ITER: 90
    cfg.SOLVER.STEPS:    [0, 60, 80]
    cfg.SOLVER.BASE_LR:  0.02
    cfg.SOLVER.GAMMA:    0.1
    for cur_iter in [0, 59]   use 0.02   = 0.02 * 0.1 ** 0
                 in [60, 79]  use 0.002  = 0.02 * 0.1 ** 1
                 in [80, inf] use 0.0002 = 0.02 * 0.1 ** 2
    """
    ind = get_step_index(cur_iter)
    return cfg.SOLVER.BASE_LR * cfg.SOLVER.GAMMA ** ind
```
Porting their example to our values:

```
cfg.SOLVER.MAX_ITER: 720000
cfg.SOLVER.STEPS:    [0, 480000, 640000]
cfg.SOLVER.BASE_LR:  0.00125
cfg.SOLVER.GAMMA:    0.1
for cur_iter in [0, 479999]      use 0.00125   = 0.00125 * 0.1 ** 0
             in [480000, 639999] use 0.000125  = 0.00125 * 0.1 ** 1
             in [640000, inf]    use 0.0000125 = 0.00125 * 0.1 ** 2
```
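To double-check those boundaries, we can evaluate the formula at the edges, reusing the standalone `get_step_index` sketch above (again with hard-coded stand-ins for the `cfg` values, just for illustration):

```python
BASE_LR = 0.00125
GAMMA = 0.1

for it in [0, 479999, 480000, 639999, 640000, 719999]:
    print(it, BASE_LR * GAMMA ** get_step_index(it))
# Prints approximately:
# 0       0.00125
# 479999  0.00125
# 480000  0.000125
# 639999  0.000125
# 640000  1.25e-05
# 719999  1.25e-05
```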
`get_lr_at_iter(it)`

After getting the scheduled learning rate, we make it smaller during the first 500 iterations in order to "warm up" the learning. If we're past the warm-up period (`it >= cfg.SOLVER.WARM_UP_ITERS`, i.e. 500) we just return `lr` unchanged. Otherwise we compute:

```python
alpha = it / cfg.SOLVER.WARM_UP_ITERS                            # 0.0    = 0 / 500
warmup_factor = cfg.SOLVER.WARM_UP_FACTOR * (1 - alpha) + alpha  # 0.3333 = 0.3333 * (1 - 0) + 0
```

and then scale `lr` by the warm-up factor:

```python
lr *= warmup_factor  # 0.0004166 = 0.00125 * 0.3333
```
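Putting the two pieces together, a minimal sketch of the full schedule (my reconstruction, using the same hard-coded stand-ins for the `cfg` values as above, not Detectron's exact source):

```python
WARM_UP_ITERS = 500
WARM_UP_FACTOR = 1.0 / 3.0

def get_lr_at_iter(it):
    # Scheduled LR from the step policy above.
    lr = BASE_LR * GAMMA ** get_step_index(it)
    if it < WARM_UP_ITERS:
        # Linearly ramp from WARM_UP_FACTOR * lr up to lr over the warm-up period.
        alpha = it / WARM_UP_ITERS
        warmup_factor = WARM_UP_FACTOR * (1 - alpha) + alpha
        lr *= warmup_factor
    return lr

print(get_lr_at_iter(0))    # ~0.0004166 (0.00125 * 0.3333)
print(get_lr_at_iter(250))  # ~0.0008333 (halfway through warm-up)
print(get_lr_at_iter(500))  # 0.00125    (warm-up finished)
```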
We're using the correct LR already. We aren't correcting for momentum, but that's not what's causing the issues for me at the moment. They don't correct for momentum until there is a large shift in LR.
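For reference, the momentum correction they apply on a large LR shift amounts to rescaling the stored momentum buffers by the LR ratio. A rough sketch of the idea (not Detectron's actual Caffe2 operator code, and the threshold value here is an assumption):

```python
def correct_momentum(momentum, old_lr, new_lr, threshold=1.1):
    """Rescale SGD momentum when the LR changes sharply, so the
    effective update magnitude stays consistent across the drop."""
    ratio = new_lr / old_lr
    if max(ratio, 1.0 / ratio) > threshold:
        momentum *= ratio
    return momentum
```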
I'd like to match our learning rate to Detectron's.
In the config they define: