QingruZhang / AdaLoRA

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning (ICLR 2023).
MIT License

AssertionError #20

Open luoxindi opened 6 months ago

luoxindi commented 6 months ago

Thanks for your contribution! I reproduced your results on MNLI with the hyperparameters provided in the README, but encountered the error "assert self.total_step > self.initial_warmup + self.final_warmup". How can I fix it?

QingruZhang commented 3 months ago

Hello, you should tune the scheduler hyperparameters so that initial_warmup + final_warmup is less than the total number of training steps.
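For reference, a minimal sketch of the check that fires (the names follow the assertion in the error message; the actual rank-allocator code in this repo may differ in detail):

```python
def check_schedule(total_step: int, initial_warmup: int, final_warmup: int) -> None:
    """Mirrors the assertion from the error message: the budget schedule
    needs some training steps between the initial and final warmup phases."""
    assert total_step > initial_warmup + final_warmup, (
        f"initial_warmup + final_warmup = {initial_warmup + final_warmup} "
        f"must be smaller than total_step = {total_step}"
    )

# total_step is the number of optimizer updates, roughly
# (num_train_examples / effective_batch_size) * num_train_epochs, so either
# shrink the warmups or train long enough to clear them.
check_schedule(total_step=30000, initial_warmup=5000, final_warmup=20000)  # passes
```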

Car-pe commented 2 months ago
[screenshot: the same AssertionError traceback]

When I try to reproduce the results on SQuAD v1.1, I set the hyperparameters the same as those reported in the paper, but I still hit this error. The `total_step` is 27600 while `initial_warmup` is 5000 and `final_warmup` is 25000. Is there anything wrong? The following is my script:

```sh
--model_name_or_path microsoft/deberta-v3-base \
--dataset_name squad \
--apply_lora --apply_adalora \
--lora_type svd --target_rank 8 --lora_r 12 \
--reg_orth_coef 0.1 \
--init_warmup 5000 --final_warmup 25000 --mask_interval 100 \
--beta1 0.85 --beta2 0.85 \
--lora_module query,key,value,intermediate,layer.output,attention.output \
--lora_alpha 16 \
--do_train --do_eval --version_2_with_negative \
--max_seq_length 384 --doc_stride 128 \
--per_device_train_batch_size 16 \
--learning_rate 1e-3 \
--num_train_epochs 10 \
--warmup_steps 1000 --per_device_eval_batch_size 128 \
--evaluation_strategy steps --eval_steps 3000 \
--save_strategy steps --save_steps 30000 \
--logging_steps 300 \
--tb_writter_loginterval 300 \
--report_to tensorboard \
--seed $seed \
--root_output_dir ./output/debertav3-base/squadv1.1_${seed}_ \
--overwrite_output_dir \
```
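Note that the numbers in this report already violate the check on their own: 5000 + 25000 = 30000, which exceeds `total_step` = 27600, so the assertion fires regardless of the other flags. A quick sanity check (a sketch using only the values reported above; the step count printed by the trainer is what matters, and it depends on dataset size, effective batch size, GPU count, and epochs):

```python
# Values from the report above (SQuAD v1.1 run).
total_step = 27600
initial_warmup = 5000
final_warmup = 25000

# The scheduler requires total_step > initial_warmup + final_warmup,
# but 5000 + 25000 = 30000 > 27600, so the assertion fires.
print(initial_warmup + final_warmup)               # 30000
print(total_step > initial_warmup + final_warmup)  # False -> AssertionError

# Either raise total_step above 30000 (e.g. more epochs, or a smaller
# effective batch size) or cut final_warmup below 27600 - 5000 = 22600.
```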