Bartzi / see

Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"
GNU General Public License v3.0

Relation between min_delta and LR #97

Open yash-bhat opened 4 years ago

yash-bhat commented 4 years ago

Hello @Bartzi!

In my training, I have set lr = 1e-4 and min_delta = 1e-8. Am I correct in assuming these are the learning rate and the decay, respectively?

Also, I print the values out at the start of training and they look fine, but later on the learning rate seems to drop sharply:

```
min_delta, decay rate: 1e-08
lr: 0.0001
/usr/local/lib/python3.6/dist-packages/chainer/training/updaters/multiprocess_parallel_updater.py:155: UserWarning: optimizer.eps is changed to 1e-08 by MultiprocessParallelUpdater for new batch size.
  format(optimizer.eps))
epoch  iteration  main/loss  main/accuracy  lr           fast_validation/main/loss  fast_validation/main/accuracy  validation/main/loss  validation/main/accuracy
1      100        2.49428    0              3.08566e-05  2.36821                    0
3      200        1.94748    0              4.25853e-05  2.23569                    0
     total [#########.........................................] 19.93%
this epoch [#################################################.] 98.60%
       249 iter, 3 epoch / 20 epochs
       0.48742 iters/sec. Estimated time to finish: 0:34:12.368322.
```

Could you explain the relation between the two, and what might be affecting my learning rate so drastically?

Bartzi commented 4 years ago

lr denotes the learning rate, that is correct. min_delta is not a decay rate; it is used by the curriculum to determine whether a curriculum step should be performed.
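
To make the distinction concrete: min_delta acts as an improvement threshold for the curriculum, not as a decay factor. A hypothetical sketch of such a plateau check (the function and argument names below are illustrative, not the actual SEE code):

```python
def should_enlarge_curriculum(recent_losses, min_delta=1e-8, patience=5):
    """Hypothetical plateau check: trigger the next curriculum step only
    when the loss has stopped improving by more than ``min_delta``.

    ``recent_losses`` is a list of recent (fast) validation losses.
    """
    if len(recent_losses) < patience + 1:
        return False
    best_before = min(recent_losses[:-patience])
    best_recent = min(recent_losses[-patience:])
    # improvement smaller than min_delta -> treat training as plateaued
    return best_before - best_recent < min_delta
```

The key point is that min_delta only gates curriculum steps; it never touches the optimizer's learning rate.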

The odd-looking output of the learning rate is due to the way Chainer computes the lr of the Adam optimizer. The value does not actually drop; it starts small and increases until it reaches the configured learning rate.
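
For reference, Chainer reports Adam's lr as a quantity derived from the bias-correction terms, roughly alpha * sqrt(1 - beta2^t) / (1 - beta1^t), which starts near zero and grows towards alpha. A small stand-alone sketch of that formula (not the actual Chainer source) reproduces the values from the log above:

```python
import math

def adam_effective_lr(alpha, t, beta1=0.9, beta2=0.999):
    """Effective Adam learning rate as reported by Chainer's ``lr`` property:
    it starts close to 0 and ramps up towards ``alpha`` as t grows."""
    fix1 = 1.0 - beta1 ** t
    fix2 = 1.0 - beta2 ** t
    return alpha * math.sqrt(fix2) / fix1

for t in (100, 200, 1000, 10000):
    print(t, adam_effective_lr(1e-4, t))
# t=100    -> ~3.09e-05  (matches the lr column at iteration 100)
# t=200    -> ~4.26e-05  (matches iteration 200)
# t=10000  -> ~1.00e-04  (approaches the configured learning rate)
```

With alpha = 1e-4 this gives about 3.09e-05 at iteration 100 and 4.26e-05 at iteration 200, matching the printed lr column, and it converges to 1e-4 as training continues.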