allenai/unifiedqa

UnifiedQA: Crossing Format Boundaries With a Single QA System
https://arxiv.org/abs/2005.00700

What learning rate should be used to fine-tune T5-large and T5-3B? #40

Closed chen-yifu closed 2 years ago

chen-yifu commented 2 years ago

Hi, from a previous discussion (#16), it was said that a learning rate of 0.001 was used. When I tried both 0.001 and 0.0001, the latter gave a lower loss. Does this mean I should use a learning rate of 0.0001 instead? Thank you! Charles
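
For context, here is roughly how I am setting the learning rate, a minimal sketch assuming the Hugging Face checkpoint allenai/unifiedqa-t5-large and a fixed-LR Adafactor optimizer (the model name, toy example, and optimizer settings are illustrative, not my exact training setup):

from transformers import T5ForConditionalGeneration, T5Tokenizer
from transformers.optimization import Adafactor

model_name = "allenai/unifiedqa-t5-large"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Fixed learning rate; this is the value in question (1e-3 vs. 1e-4).
optimizer = Adafactor(
    model.parameters(),
    lr=1e-3,
    scale_parameter=False,  # turn off Adafactor's internal LR scaling
    relative_step=False,    # use the fixed lr above, not a relative-step schedule
    warmup_init=False,
)

# One illustrative training step on a single UnifiedQA-style example.
inputs = tokenizer(
    "which is heavier? \\n (a) a ton of bricks (b) a ton of feathers (c) equal",
    return_tensors="pt",
)
labels = tokenizer("equal", return_tensors="pt").input_ids
loss = model(input_ids=inputs.input_ids, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()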

danyaljj commented 2 years ago

For the LR (and many other hyperparameters) we used the default choices, which ship with the T5 models.

Within this file, you can see the schedule it defines for the learning rate:

# Parameters for learning_rate_schedule_noam:
# ==============================================================================
learning_rate_schedule_noam.linear_decay_fraction = 0.1
learning_rate_schedule_noam.multiplier = 1.0
learning_rate_schedule_noam.offset = 0
learning_rate_schedule_noam.warmup_steps = 10000
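
In plain terms, that is an inverse-square-root schedule with a warmup floor, plus a linear decay over the last fraction of training. Here is a rough Python sketch of what those parameters mean, based on my reading of the defaults above (not the exact T5/mesh-tensorflow implementation):

# A rough sketch of the noam schedule, using the parameter names from the gin config.
def learning_rate_schedule_noam(step, total_train_steps,
                                warmup_steps=10000,
                                linear_decay_fraction=0.1,
                                multiplier=1.0,
                                offset=0):
    """Inverse-sqrt decay with a warmup floor and a final linear decay to zero."""
    step = max(step - offset, 0)
    train_steps = total_train_steps - offset
    # Held at warmup_steps ** -0.5 during warmup, then decays as step ** -0.5.
    lr = multiplier * max(step, warmup_steps) ** -0.5
    # Linearly decay to zero over the final `linear_decay_fraction` of training.
    if linear_decay_fraction > 0:
        lr *= min(1.0, (train_steps - step) / (train_steps * linear_decay_fraction))
    return max(lr, 0.0)

# Example with the defaults and 100k total steps:
print(learning_rate_schedule_noam(5000, 100000))   # 0.01  (still in warmup)
print(learning_rate_schedule_noam(40000, 100000))  # 0.005 (inverse-sqrt region)

In this sketch, with warmup_steps = 10000 the rate sits at 10000 ** -0.5 = 0.01 during warmup and decays from there.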