Adapters on large-datasets in GLUE could not get the same results

google-research / adapter-bert

Apache License 2.0


Adapters on large-datasets in GLUE could not get the same results #9

Open dorost1234 opened 3 years ago

dorost1234 commented 3 years ago

Hi, I am trying adapters on BERT-base and evaluating on GLUE. On the smaller datasets like MRPC, RTE, and CoLA I see good results, but on the larger GLUE datasets like MNLI, QNLI, and SST-2 I am really struggling, and the results are well below BERT-base.

I have a deadline soon and need to compare fairly with your method, so I would very much appreciate your feedback on this. Do you have any suggestions that could help the results on the large-scale datasets?

Thanks

neilhoulsby commented 3 years ago

What hyperparameters are you using? Did you follow the sweep in the paper?

"We sweep learning rates in {3 · 10^-5, 3 · 10^-4, 3 · 10^-3}, and number of epochs in {3, 20}"
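
For reference, the quoted sweep works out to a grid of six runs per task (three learning rates times two epoch counts). Below is a minimal Python sketch that enumerates that grid. The dictionary keys `learning_rate` and `num_train_epochs` and the function name `sweep_configs` are illustrative assumptions, not taken from this thread, and the training launch itself is left out.

```python
from itertools import product

# Values quoted from the paper's sweep.
LEARNING_RATES = [3e-5, 3e-4, 3e-3]
NUM_EPOCHS = [3, 20]


def sweep_configs():
    """Return one config dict per (learning rate, epochs) pair.

    The key names are hypothetical; map them onto whatever your
    training script actually expects.
    """
    return [
        {"learning_rate": lr, "num_train_epochs": epochs}
        for lr, epochs in product(LEARNING_RATES, NUM_EPOCHS)
    ]


if __name__ == "__main__":
    for cfg in sweep_configs():
        # Assumption: one fine-tuning run is launched per config, and the
        # best configuration is picked by validation score.
        print(cfg)
```

Running every combination per task, rather than reusing a single setting tuned on the small datasets, is the comparison the quoted sweep describes.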