google-research / bigbird

Transformers for Longer Sequences
https://arxiv.org/abs/2007.14062
Apache License 2.0

Learning rate mentioned in paper vs run_summarization.py #22

Open s4sarath opened 3 years ago

s4sarath commented 3 years ago

Hi,

The learning rate mentioned in the paper for summarization is around 3e-5, but in run_summarization.py the flag default is 0.32. In the roberta_base.sh script, the learning rate is never changed from that default.
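For now, a possible workaround is to override the default on the command line. This is only a sketch: the flag name `--learning_rate` and the invocation path are assumptions based on how the script's flags are described, not confirmed from the repo.

```shell
# Hypothetical override of the flag default (flag name --learning_rate
# and script path are assumed, not confirmed from the repo):
python3 bigbird/summarization/run_summarization.py \
  --learning_rate=3e-5
```

If the flag exists under that name, this would at least match the paper's reported value while the defaults are being clarified.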

Could anyone please clarify which value is correct, since the learning rate is crucial for fine-tuning models like these?

Thanks