chen-yifu closed this issue 2 years ago
For the learning rate (and many other hyperparameters) we used the default choices, which are accessible as part of the T5 models.
Within this file, you can see the learning-rate schedule it defines:
# Parameters for learning_rate_schedule_noam:
# ==============================================================================
learning_rate_schedule_noam.linear_decay_fraction = 0.1
learning_rate_schedule_noam.multiplier = 1.0
learning_rate_schedule_noam.offset = 0
learning_rate_schedule_noam.warmup_steps = 10000
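For reference, the parameters above correspond to a Noam-style schedule: an inverse-square-root decay with a constant plateau during warmup and a linear decay over the final fraction of training. The sketch below is not the project's exact code; it is a minimal reimplementation assuming the semantics of mesh-tensorflow's `learning_rate_schedule_noam` (the function these gin parameters configure), with `total_train_steps` as a hypothetical value you would set for your own run:

```python
import math

def noam_lr(step, total_train_steps,
            warmup_steps=10000, linear_decay_fraction=0.1,
            multiplier=1.0, offset=0):
    """Sketch of the Noam schedule configured above (assumption:
    matches mesh-tensorflow's learning_rate_schedule_noam).

    lr = multiplier / sqrt(max(step, warmup_steps)),
    then scaled down linearly over the last `linear_decay_fraction`
    of training.
    """
    train_steps = float(total_train_steps) - offset
    step_num = float(step) - offset
    # Constant 1/sqrt(warmup_steps) during warmup, then rsqrt decay.
    lr = multiplier / math.sqrt(max(step_num, warmup_steps))
    # Linear decay to zero over the final fraction of training.
    if linear_decay_fraction > 0:
        lr *= min(1.0, (train_steps - step_num) /
                  (train_steps * linear_decay_fraction))
    return lr

# With the defaults above and a hypothetical 1M-step run:
# during warmup the rate plateaus at 1/sqrt(10000) = 0.01,
# then decays as 1/sqrt(step).
print(noam_lr(5000, 1_000_000))    # plateau value
print(noam_lr(40000, 1_000_000))   # rsqrt-decay region
```

Note that with these defaults the effective learning rate is not a single constant like 0.001 or 0.0001; it changes over training, so a fixed value is only comparable to one point on this curve.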
Hi, in a previous discussion (#16) it was said that a learning rate of 0.001 was used. When I tried both 0.001 and 0.0001, the latter gave a lower loss. Does this mean I should use a learning rate of 0.0001 instead? Thank you! Charles