yash-bhat opened this issue 4 years ago
`lr` denotes the learning rate, that is correct.
`min_delta` is used by the curriculum to determine whether a curriculum step should be performed.
The odd learning-rate output is due to the way Chainer computes the effective learning rate of the Adam optimizer. The value does not actually drop; it starts small and rises until it reaches the provided learning rate.
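This warm-up behaviour comes from Adam's bias correction: Chainer reports `alpha * sqrt(1 - beta2^t) / (1 - beta1^t)` as the current learning rate. A minimal sketch (the function name is mine; the formula mirrors Chainer's `Adam.lr` property) reproduces the values printed in the log below:

```python
import math

def chainer_adam_lr(alpha, beta1, beta2, t):
    """Effective Adam learning rate at iteration t, as Chainer reports it:
    alpha * sqrt(1 - beta2^t) / (1 - beta1^t)."""
    fix1 = 1.0 - beta1 ** t
    fix2 = 1.0 - beta2 ** t
    return alpha * math.sqrt(fix2) / fix1

# With alpha=1e-4 and Chainer's default beta1=0.9, beta2=0.999, the
# reported lr matches the training log: ~3.08566e-05 at iteration 100
# and ~4.25853e-05 at iteration 200, climbing toward 1e-4.
print(chainer_adam_lr(1e-4, 0.9, 0.999, 100))
print(chainer_adam_lr(1e-4, 0.9, 0.999, 200))
```

So the small `lr` values early in training are expected, and the column will keep increasing toward the configured 1e-4.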
Hello @Bartzi!
In my training, I have set lr = 1e-4 and min_delta = 1e-8. Am I correct in assuming these are the learning rate and the decay, respectively?
Also, I print the values out at the start of the training and they seem fine, but later on the learning rate quickly drops.
```
min_delta, decay rate: 1e-08 lr: 0.0001
/usr/local/lib/python3.6/dist-packages/chainer/training/updaters/multiprocess_parallel_updater.py:155: UserWarning: optimizer.eps is changed to 1e-08 by MultiprocessParallelUpdater for new batch size.
  format(optimizer.eps))
epoch  iteration  main/loss  main/accuracy  lr           fast_validation/main/loss  fast_validation/main/accuracy  validation/main/loss  validation/main/accuracy
1      100        2.49428    0              3.08566e-05  2.36821                    0
3      200        1.94748    0              4.25853e-05  2.23569                    0
     total [#########.........................................] 19.93%
this epoch [#################################################.] 98.60%
       249 iter, 3 epoch / 20 epochs
   0.48742 iters/sec. Estimated time to finish: 0:34:12.368322.
```
Could you explain how these parameters are related and what might be changing my learning rate so drastically?