coastalcph / lex-glue

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English

Hyper-parameters of DeBERTa for EUR-LEX #20

Closed: cooelf closed this issue 2 years ago

cooelf commented 2 years ago

Hi, my reproduced results for EUR-LEX are quite far from the reported ones. Could you provide the hyper-parameters of DeBERTa for EUR-LEX? And which version of DeBERTa is used, V2/V3, Base/Large?

Looking forward to your reply. Thanks!

iliaschalkidis commented 2 years ago

Hi @cooelf, we used microsoft/deberta-base from HuggingFace, which I guess is the base configuration of V1. I see that both V2 and V3 use the deberta-v2 model type in HuggingFace.

We used a learning rate of 3e-5 across all base-sized models, with no warm-up or anything else special. We also used early stopping: up to 20 epochs in total, with a patience of 3 epochs.
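
Roughly, those settings map onto a HuggingFace `Trainer` setup like the sketch below. This is only an illustration, not the exact experiment script from this repo: the `lex_glue`/`eurlex` dataset loading, the 100-label multi-hot conversion, the 512-token truncation, the batch size, and the output path are simplifications/placeholders.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

MODEL = "microsoft/deberta-base"
NUM_LABELS = 100  # EUR-LEX in LexGLUE is multi-label over 100 EUROVOC concepts

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL,
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # BCE loss over multi-hot targets
)

raw = load_dataset("lex_glue", "eurlex")

def preprocess(batch):
    enc = tokenizer(batch["text"], truncation=True, max_length=512)
    # turn each example's list of label ids into a multi-hot float vector
    enc["labels"] = [
        [1.0 if i in labels else 0.0 for i in range(NUM_LABELS)]
        for labels in batch["labels"]
    ]
    return enc

encoded = raw.map(preprocess, batched=True, remove_columns=raw["train"].column_names)

training_args = TrainingArguments(
    output_dir="deberta-base-eurlex",
    learning_rate=3e-5,             # same LR for all base-sized models
    warmup_ratio=0.0,               # no warm-up
    num_train_epochs=20,            # upper bound; early stopping usually ends sooner
    per_device_train_batch_size=8,  # placeholder batch size
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,    # required for EarlyStoppingCallback
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,            # enables dynamic padding via the default collator
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```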

We were only able to benchmark the large version of RoBERTa; you can find the results in the Appendix of our paper. In this case, we used a learning rate of 1e-5, a warm-up ratio of 0.06, and a weight decay of 0.1, since we found that larger models are very unstable and "degenerate" with larger learning rates and no warm-up.
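
For the large models, only the optimization-related arguments change relative to the sketch above, e.g. something like this (again illustrative, with a placeholder output path):

```python
from transformers import TrainingArguments

# Large-model variant of the TrainingArguments above (e.g. for roberta-large)
training_args_large = TrainingArguments(
    output_dir="roberta-large-eurlex",
    learning_rate=1e-5,     # lower LR for large models
    warmup_ratio=0.06,      # warm-up helps avoid "degenerate" runs
    weight_decay=0.1,
    num_train_epochs=20,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```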

cooelf commented 2 years ago

Hi @iliaschalkidis, thanks a lot for the quick reply. Yeah, I also found that large models are unstable on this dataset. Maybe it is because I was using microsoft/deberta-v3-large. I will check the appendix and try the recommended settings :)

Thanks!