Open wei-seven opened 3 weeks ago
Thank you for your interest in this work. I'm sorry to say that, given how much time has passed, the checkpoints are no longer easy to find on our server. But I can provide most of the hyperparameter settings from the main experiment for BERT. For BERT on all datasets:
`lora_r=8`, `lora_alpha=16`
For BERT, the learning rates are shown below.

| | COLA | MNLI | MRPC | QNLI | QQP | RTE | SST2 | STSB |
|---|---|---|---|---|---|---|---|---|
| stage 1 lr | 3e-5 | 2e-5 | 3e-5 | 2e-5 | 3e-5 | 5e-5 | 2e-5 | 3e-5 |
| stage 2 lr | 9e-3 | 8e-3 | 5e-3 | 8e-3 | 5e-3 | 3e-3 | 8e-3 | 5e-3 |
| stage 2 save_steps | 10 | 10 | 10 | 10 | 20 | 10 | 20 | 10 |
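For convenience, the settings above can be collected into a small lookup table. This is only an illustrative sketch; the dict layout and the helper name `hparams_for` are my own and do not come from the repository.

```python
# Reported LoRA settings, shared across all GLUE tasks.
LORA_R = 8
LORA_ALPHA = 16

# Per-task settings from the table above: (stage 1 lr, stage 2 lr, stage 2 save_steps)
BERT_HPARAMS = {
    "cola": (3e-5, 9e-3, 10),
    "mnli": (2e-5, 8e-3, 10),
    "mrpc": (3e-5, 5e-3, 10),
    "qnli": (2e-5, 8e-3, 10),
    "qqp":  (3e-5, 5e-3, 20),
    "rte":  (5e-5, 3e-3, 10),
    "sst2": (2e-5, 8e-3, 20),
    "stsb": (3e-5, 5e-3, 10),
}

def hparams_for(task: str) -> dict:
    """Return the reported BERT hyperparameters for a GLUE task (hypothetical helper)."""
    lr1, lr2, save_steps = BERT_HPARAMS[task.lower()]
    return {
        "lora_r": LORA_R,
        "lora_alpha": LORA_ALPHA,
        "stage1_lr": lr1,
        "stage2_lr": lr2,
        "stage2_save_steps": save_steps,
    }
```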
Note that the hyperparameter "limit" in the second training stage may need further tuning per dataset to obtain the best performance. "Epoch" should be large enough that performance can hardly improve with additional training; it also varies across datasets.
Thanks for the awesome work!
Could you please provide the checkpoints? Or, failing that, could you share the hyperparameters used for the best checkpoints?