cs231n / cs231n.github.io

Public facing notes page
MIT License

Assignment 2: bad initialization in adam method in optim.py #179

Open v-iashin opened 6 years ago

v-iashin commented 6 years ago

Currently, it initializes as follows (note the `t` parameter):

```python
if config is None: config = {}
config.setdefault('learning_rate', 1e-3)
config.setdefault('beta1', 0.9)
config.setdefault('beta2', 0.999)
config.setdefault('epsilon', 1e-8)
config.setdefault('m', np.zeros_like(x))
config.setdefault('v', np.zeros_like(x))
config.setdefault('t', 1)
next_x = None
```
  1. The algorithm should be initialized with t=0 (https://arxiv.org/abs/1412.6980);
  2. So, the initialization used in the assignment (where t=1) requires updating t *after* calculating the bias-corrected moments, because the first pass should use t=1 rather than t=2, doesn't it?
  3. However, inserting `config['t'] += 1` after calculating the bias-corrected moments leads to an incorrect next_w, with a next_w error of 0.00152184517579 during the checks.
  4. Once I put `config['t'] += 1` before calculating the bias-corrected moments, meaning that the first pass effectively uses t=2, the next_w error is okay.
  5. So, my suggestion is to change the line

     `config.setdefault('t', 1)`

     to

     `config.setdefault('t', 0)`

     (Spring 2017)
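For reference, here is a minimal sketch of a full update step that is consistent with initializing `t` at 0 and incrementing it before the bias corrections, so the first call uses t=1 as in the paper. The body of the update is my own reconstruction following the paper's Algorithm 1, not the assignment's solution code:

```python
import numpy as np

def adam(x, dx, config=None):
    # Sketch of an Adam step with t initialized to 0.
    # config['t'] is incremented *before* the bias corrections,
    # so the first update uses t = 1 (Kingma & Ba, 2015).
    if config is None:
        config = {}
    config.setdefault('learning_rate', 1e-3)
    config.setdefault('beta1', 0.9)
    config.setdefault('beta2', 0.999)
    config.setdefault('epsilon', 1e-8)
    config.setdefault('m', np.zeros_like(x))
    config.setdefault('v', np.zeros_like(x))
    config.setdefault('t', 0)  # start at 0, as suggested above

    config['t'] += 1  # t = 1 on the first call
    beta1, beta2 = config['beta1'], config['beta2']
    # Exponential moving averages of the gradient and its square.
    config['m'] = beta1 * config['m'] + (1 - beta1) * dx
    config['v'] = beta2 * config['v'] + (1 - beta2) * dx**2
    # Bias-corrected moment estimates.
    m_hat = config['m'] / (1 - beta1**config['t'])
    v_hat = config['v'] / (1 - beta2**config['t'])
    next_x = x - config['learning_rate'] * m_hat / (np.sqrt(v_hat) + config['epsilon'])
    return next_x, config
```

With this ordering, the first step reduces to roughly `x - learning_rate * sign(dx)`, since the bias corrections exactly cancel the `(1 - beta)` factors at t=1.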