TsinghuaC3I / SoRA

The source code of the EMNLP 2023 main conference paper: Sparse Low-rank Adaptation of Pre-trained Language Models.

LoRA baseline parameters #4

Open jypppppp opened 8 months ago

jypppppp commented 8 months ago

Hi,

Thanks for your good work!

Could you clarify the learning rate, batch size, and number of epochs used for the baseline LoRA experiments on the different datasets?

Kind regards,

Jason

telxt commented 8 months ago

Thank you for your interest in our work! The hyper-parameters of LoRA are listed below:

| Dataset | lr | epoch |
| -- | -- | -- |
| CoLA | 8e-5 | 20 |
| SST-2 | 1e-4 | 10 |
| MRPC | 1e-4 | 20 |
| QQP | 3e-4 | 10 |
| STS-B | 1e-4 | 20 |
| MNLI | 3e-4 | 10 |
| QNLI | 3e-4 | 10 |
| RTE | 1.2e-3 | 50 |

The seed list is {0, 21, 42, 81, 100}, and the batch_size is 8.
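As a minimal sketch of how these values could be wired into a run (this is not the authors' exact script; the backbone model, LoRA rank/alpha, and target modules below are assumptions, while the learning rate, epochs, batch size, and seed come from the table, using RTE as the example dataset):

```python
from transformers import (AutoModelForSequenceClassification,
                          TrainingArguments, set_seed)
from peft import LoraConfig, get_peft_model

set_seed(42)  # one of the reported seeds {0, 21, 42, 81, 100}

# Assumed backbone for illustration; swap in the checkpoint used in the paper.
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2)

lora_config = LoraConfig(
    r=8,                                # rank: an assumption, not from the table
    lora_alpha=16,                      # assumed scaling factor
    target_modules=["query", "value"],  # assumed attention projections
    task_type="SEQ_CLS",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="lora_rte",
    learning_rate=1.2e-3,               # RTE value from the table
    num_train_epochs=50,                # RTE value from the table
    per_device_train_batch_size=8,      # batch size reported above
    seed=42,
)
```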

I hope my response helps you.

ouxinwei111 commented 7 months ago

Hi, I was wondering if you use the same learning rate for all the rank settings. Looking forward to your help :)