chen-yifu closed this issue 2 years ago
For the learning rate (and many other hyperparameters) we used the default choices, which are accessible as part of the T5 models.
Within this file, you can see the learning-rate schedule it defines:
# Parameters for learning_rate_schedule_noam:
# ==============================================================================
learning_rate_schedule_noam.linear_decay_fraction = 0.1
learning_rate_schedule_noam.multiplier = 1.0
learning_rate_schedule_noam.offset = 0
learning_rate_schedule_noam.warmup_steps = 10000
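For reference, the parameters above correspond to a Noam-style schedule: an inverse-square-root decay with a constant plateau during warmup and a linear decay over the final fraction of training. The sketch below is not the project's exact code; it is a minimal reimplementation assuming the semantics of mesh-tensorflow's `learning_rate_schedule_noam` (the function these gin parameters configure), with `total_train_steps` as a hypothetical value you would set for your own run:

```python
import math

def noam_lr(step, total_train_steps,
            warmup_steps=10000, linear_decay_fraction=0.1,
            multiplier=1.0, offset=0):
    """Sketch of the Noam schedule configured above (assumption:
    matches mesh-tensorflow's learning_rate_schedule_noam).

    lr = multiplier / sqrt(max(step, warmup_steps)),
    then scaled down linearly over the last `linear_decay_fraction`
    of training.
    """
    train_steps = float(total_train_steps) - offset
    step_num = float(step) - offset
    # Constant 1/sqrt(warmup_steps) during warmup, then rsqrt decay.
    lr = multiplier / math.sqrt(max(step_num, warmup_steps))
    # Linear decay to zero over the final fraction of training.
    if linear_decay_fraction > 0:
        lr *= min(1.0, (train_steps - step_num) /
                  (train_steps * linear_decay_fraction))
    return lr

# With the defaults above and a hypothetical 1M-step run:
# during warmup the rate plateaus at 1/sqrt(10000) = 0.01,
# then decays as 1/sqrt(step).
print(noam_lr(5000, 1_000_000))    # plateau value
print(noam_lr(40000, 1_000_000))   # rsqrt-decay region
```

Note that with these defaults the effective learning rate is not a single constant like 0.001 or 0.0001; it changes over training, so a fixed value is only comparable to one point on this curve.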
Hi, in a previous discussion (#16) it was said that a learning rate of 0.001 was used. When I tried both 0.001 and 0.0001, the latter gave a lower loss. Does this mean I should use a learning rate of 0.0001 instead? Thank you! Charles