Hi
I am trying adapters on Bert-base. I am evaluating on GLUE. On smaller datasets like MRPC, RTE, COLA, I see good results, but on large datasets of GLUE like MNLI, QNLI, SST2 I am really struggling and this is getting very below BERT-base.
I have a deadline soon and need to compare fairly with your method, and very much appreciate your feedback on this. Any suggestions which can help the results on large-scale datasets?
Hi I am trying adapters on Bert-base. I am evaluating on GLUE. On smaller datasets like MRPC, RTE, COLA, I see good results, but on large datasets of GLUE like MNLI, QNLI, SST2 I am really struggling and this is getting very below BERT-base.
I have a deadline soon and need to compare fairly with your method, and very much appreciate your feedback on this. Any suggestions which can help the results on large-scale datasets?
thanks