djshen opened this issue 4 years ago
We don't know whether this causes any significant difference, but you can check through experiments as part of the project.
On 2020-03-22 11:13, djshen wrote:
Currently, the learning rate decay happens after each iteration and the update rule is
lr = config.lr/(1 + args.lr_decay*step)
So the learning rates of steps 0 and 1 will both be config.lr. Is this the expected behavior, or is the following correct?
lr = config.lr/(1 + args.lr_decay*(step+1))
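To make the off-by-one concrete: under a post-batch update, the learning rate actually used at each step under the two rules can be tabulated in a few lines of plain Python (the values 0.1 for lr_init and lr_decay are just illustrative, not simpleNN defaults):

```python
lr_init = 0.1
lr_decay = 0.1

# Learning rate actually used at steps 0..3 when lr is updated *after* each batch.
current_rule = [lr_init]   # lr = lr_init / (1 + lr_decay * step)
proposed_rule = [lr_init]  # lr = lr_init / (1 + lr_decay * (step + 1))
for step in range(3):
    current_rule.append(lr_init / (1 + lr_decay * step))
    proposed_rule.append(lr_init / (1 + lr_decay * (step + 1)))

print(current_rule)   # steps 0 and 1 both use lr_init
print(proposed_rule)  # decay already takes effect at step 1
```

With the current rule the first two steps share the initial learning rate, while the proposed rule starts decaying immediately; every later value of the current rule lags the proposed rule by one step.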
If I use tf.keras.optimizers.schedules.InverseTimeDecay in my code, I need to modify either simpleNN or TensorFlow to get exactly the same results.
The following is a simple example.
    import tensorflow as tf

    lr_init = 0.1
    lr_decay = 0.1
    lr_keras = tf.keras.optimizers.schedules.InverseTimeDecay(
        initial_learning_rate=lr_init,
        decay_steps=1,
        decay_rate=lr_decay)

    lr_simplenn = lr_init
    for step in range(11):
        # Keras computes the learning rate "before" a batch
        lr_keras_value = float(lr_keras(step))  # convert the scalar tensor to a Python float
        print('Step {:2d}: train one batch with lr_keras {:6f} and lr_simplenn {:6f}'.format(
            step, lr_keras_value, lr_simplenn))
        # simpleNN updates the learning rate "after" a batch
        lr_simplenn = lr_init / (1 + lr_decay * step)
The output is
Step 0: train one batch with lr_keras 0.100000 and lr_simplenn 0.100000
Step 1: train one batch with lr_keras 0.090909 and lr_simplenn 0.100000
Step 2: train one batch with lr_keras 0.083333 and lr_simplenn 0.090909
Step 3: train one batch with lr_keras 0.076923 and lr_simplenn 0.083333
Step 4: train one batch with lr_keras 0.071429 and lr_simplenn 0.076923
Step 5: train one batch with lr_keras 0.066667 and lr_simplenn 0.071429
Step 6: train one batch with lr_keras 0.062500 and lr_simplenn 0.066667
Step 7: train one batch with lr_keras 0.058824 and lr_simplenn 0.062500
Step 8: train one batch with lr_keras 0.055556 and lr_simplenn 0.058824
Step 9: train one batch with lr_keras 0.052632 and lr_simplenn 0.055556
Step 10: train one batch with lr_keras 0.050000 and lr_simplenn 0.052632
If I change step to (step + 1) in the last line, the output will be
Step 0: train one batch with lr_keras 0.100000 and lr_simplenn 0.100000
Step 1: train one batch with lr_keras 0.090909 and lr_simplenn 0.090909
Step 2: train one batch with lr_keras 0.083333 and lr_simplenn 0.083333
Step 3: train one batch with lr_keras 0.076923 and lr_simplenn 0.076923
Step 4: train one batch with lr_keras 0.071429 and lr_simplenn 0.071429
Step 5: train one batch with lr_keras 0.066667 and lr_simplenn 0.066667
Step 6: train one batch with lr_keras 0.062500 and lr_simplenn 0.062500
Step 7: train one batch with lr_keras 0.058824 and lr_simplenn 0.058824
Step 8: train one batch with lr_keras 0.055556 and lr_simplenn 0.055556
Step 9: train one batch with lr_keras 0.052632 and lr_simplenn 0.052632
Step 10: train one batch with lr_keras 0.050000 and lr_simplenn 0.050000
With this modification, I get exactly the same loss values between simpleNN and its Keras counterpart, in which I replaced almost everything in simpleNN with tf.keras.
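The equivalence can also be checked without TensorFlow, assuming the documented InverseTimeDecay formula lr(step) = initial_lr / (1 + decay_rate * step / decay_steps); this is a sketch of the schedules only, not simpleNN's actual training code:

```python
lr_init = 0.1
lr_decay = 0.1

def inverse_time_decay(step, initial_lr=lr_init, decay_rate=lr_decay, decay_steps=1):
    # Documented formula of tf.keras.optimizers.schedules.InverseTimeDecay.
    return initial_lr / (1 + decay_rate * step / decay_steps)

lr_simplenn = lr_init
for step in range(11):
    # With the (step + 1) fix, the lr used at every step matches Keras's.
    assert abs(inverse_time_decay(step) - lr_simplenn) < 1e-12
    lr_simplenn = lr_init / (1 + lr_decay * (step + 1))  # post-batch update
print('schedules match for steps 0-10')
```

The loop asserts, step by step, that the post-batch (step + 1) update reproduces the pre-batch Keras schedule, which is why the loss values then coincide.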