JoshVarty / pytorch-retinanet

Reproducing the Detectron implementation of RetinaNet

Learning Rate #5

Closed JoshVarty closed 5 years ago

JoshVarty commented 5 years ago

I'd like to match our learning rate schedule to Detectron's.

In the config they define:

SOLVER:
  WEIGHT_DECAY: 0.0001
  LR_POLICY: steps_with_decay
  BASE_LR: 0.00125
  GAMMA: 0.1
  MAX_ITER: 720000
  STEPS: [0, 480000, 640000]

The relevant call chain in Detectron is:

main()
    train_model()
      get_lr_at_iter(it)
            lr_func_steps_with_decay(cur_iter)
                  get_step_index(cur_iter)
      UpdateWorkspaceLr(it)
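
For reference, here's a minimal sketch (mine, not from either repo) of how the same step schedule could be expressed on the PyTorch side with torch.optim.lr_scheduler.MultiStepLR, assuming the scheduler is stepped once per iteration rather than per epoch; the model, momentum value and loop are purely illustrative:

import torch

# Toy model/optimizer purely to illustrate the schedule.
model = torch.nn.Linear(10, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.00125,
                            momentum=0.9, weight_decay=0.0001)

# Detectron's STEPS list starts with 0; MultiStepLR only wants the decay points.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[480000, 640000], gamma=0.1)

for it in range(720000):    # MAX_ITER
    # forward/backward for the real model would go here
    optimizer.step()
    scheduler.step()        # stepped per iteration, not per epoch

Note this only covers the step decay; the 500-iteration warmup described below would still have to be layered on top (e.g. by scaling the LR manually or wrapping the whole schedule in a LambdaLR).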
JoshVarty commented 5 years ago

get_step_index(cur_iter)

Source

  1. Create a steps list that includes MAX_ITER
    • [0, 480000, 640000, 720000]
  2. Loop over steps
    for ind, step in enumerate(steps): 
        if cur_iter < step:
            break
    return ind - 1
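
Putting that together as a standalone sketch (our cfg values hard-coded for illustration):

STEPS = [0, 480000, 640000]     # cfg.SOLVER.STEPS
MAX_ITER = 720000               # cfg.SOLVER.MAX_ITER

def get_step_index(cur_iter):
    """Return the index of the LR step that cur_iter falls in."""
    steps = STEPS + [MAX_ITER]  # [0, 480000, 640000, 720000]
    for ind, step in enumerate(steps):
        if cur_iter < step:
            break
    return ind - 1

assert get_step_index(0) == 0
assert get_step_index(479999) == 0
assert get_step_index(480000) == 1
assert get_step_index(640000) == 2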
JoshVarty commented 5 years ago

lr_func_steps_with_decay(cur_iter)

Source

def lr_func_steps_with_decay(cur_iter):
    """For cfg.SOLVER.LR_POLICY = 'steps_with_decay'

    Change the learning rate specified iterations based on the formula
    lr = base_lr * gamma ** lr_step_count.

    Example:
    cfg.SOLVER.MAX_ITER: 90
    cfg.SOLVER.STEPS:    [0,    60,    80]
    cfg.SOLVER.BASE_LR:  0.02
    cfg.SOLVER.GAMMA:    0.1
    for cur_iter in [0, 59]   use 0.02 = 0.02 * 0.1 ** 0
                 in [60, 79]  use 0.002 = 0.02 * 0.1 ** 1
                 in [80, inf] use 0.0002 = 0.02 * 0.1 ** 2
    """
    ind = get_step_index(cur_iter)
    return cfg.SOLVER.BASE_LR * cfg.SOLVER.GAMMA ** ind

Porting their example to our values:

Example:
cfg.SOLVER.MAX_ITER: 720000
cfg.SOLVER.STEPS:    [0, 480000, 640000]
cfg.SOLVER.BASE_LR:  0.00125
cfg.SOLVER.GAMMA:    0.1
for cur_iter in [0, 479999]       use 0.00125 = 0.00125 * 0.1 ** 0
             in [480000, 639999]  use 0.000125 = 0.00125 * 0.1 ** 1
             in [640000, inf]     use 0.0000125 = 0.00125 * 0.1 ** 2
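
A quick numeric check of those ported values (self-contained; the index here is computed the same way get_step_index above ends up computing it):

BASE_LR, GAMMA = 0.00125, 0.1    # cfg.SOLVER.BASE_LR / cfg.SOLVER.GAMMA
STEPS = [0, 480000, 640000]      # cfg.SOLVER.STEPS

def lr_at(cur_iter):
    # Index of the last step boundary that cur_iter has reached.
    ind = max(i for i, s in enumerate(STEPS) if cur_iter >= s)
    return BASE_LR * GAMMA ** ind

assert abs(lr_at(0)      - 0.00125)   < 1e-9
assert abs(lr_at(479999) - 0.00125)   < 1e-9
assert abs(lr_at(480000) - 0.000125)  < 1e-9
assert abs(lr_at(640000) - 0.0000125) < 1e-9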
JoshVarty commented 5 years ago

get_lr_at_iter(it)

After getting the scheduled learning rate, we scale it down during the first 500 iterations in order to "warm up" training.

Source

  1. Get the learning rate according to our schedule.
  2. If we're done with warmup (i.e. it >= 500), just return lr
  3. Otherwise, we're going to use a linear warmup method.
  4. alpha = it / cfg.SOLVER.WARM_UP_ITERS
    • 0.0 = 0 / 500
  5. warmup_factor = cfg.SOLVER.WARM_UP_FACTOR * (1 - alpha) + alpha
    • 0.3333 = 0.3333 * (1 - 0) + 0 (WARM_UP_FACTOR defaults to 1/3)
  6. Adjust lr by the warm up factor. lr *= warmup_factor
    • 0.0004166 = 0.00125 * 0.3333
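
As a standalone sketch of that warmup step, assuming Detectron's defaults of WARM_UP_ITERS = 500, WARM_UP_FACTOR = 1/3 and the linear method (warmed_up_lr and scheduled_lr are hypothetical names; scheduled_lr stands in for the output of lr_func_steps_with_decay):

WARM_UP_ITERS = 500        # cfg.SOLVER.WARM_UP_ITERS
WARM_UP_FACTOR = 1.0 / 3   # cfg.SOLVER.WARM_UP_FACTOR

def warmed_up_lr(it, scheduled_lr):
    """Scale scheduled_lr down linearly during the first WARM_UP_ITERS."""
    if it >= WARM_UP_ITERS:
        return scheduled_lr
    alpha = it / WARM_UP_ITERS
    warmup_factor = WARM_UP_FACTOR * (1 - alpha) + alpha
    return scheduled_lr * warmup_factor

# warmed_up_lr(0,   0.00125) ≈ 0.000417   (the 0.0004166 above)
# warmed_up_lr(250, 0.00125) ≈ 0.000833
# warmed_up_lr(500, 0.00125) == 0.00125   (warmup finished)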
JoshVarty commented 5 years ago

We're already using the correct LR. We aren't correcting for momentum, but that's not what's causing my issues at the moment. Detectron only corrects for momentum when there is a large shift in LR.