QingruZhang / AdaLoRA

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning (ICLR 2023).
MIT License

The question about convergence speed #19

Open Lanbai-eleven opened 7 months ago

Lanbai-eleven commented 7 months ago

During my practical usage, I have observed that the loss of AdaLoRA decreases more slowly than that of the original LoRA. I would like to know whether this is normal or whether I simply haven't configured AdaLoRA optimally.

QingruZhang commented 3 months ago

Hello, thanks for your question. Yes, the convergence of AdaLoRA should be slightly slower than LoRA because AdaLoRA goes through the three stages of the budget scheduler: an initial warmup, a budget-decreasing phase, and final fine-tuning. If the budget-decreasing phase has too few steps, the ranks change too quickly for the adaptation to remain smooth. In our observation, as long as a proper budget scheduler is chosen, the convergence of AdaLoRA should be similar to that of LoRA. For example, set the length of the budget-decreasing phase to half of the total training steps.
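
For reference, here is a minimal sketch of such a scheduler configuration, assuming the HuggingFace `peft` port of AdaLoRA (the original `loralib` rank allocator exposes analogous arguments); the step counts and target modules below are hypothetical and only illustrate making the budget-decreasing phase span roughly half of training:

```python
# Minimal sketch, assuming the HuggingFace `peft` implementation of AdaLoRA.
from peft import AdaLoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

total_steps = 10_000  # hypothetical total number of training steps

config = AdaLoraConfig(
    init_r=12,                       # initial rank of each incremental matrix
    target_r=4,                      # average target rank after budget allocation
    tinit=int(0.1 * total_steps),    # initial warmup: no rank pruning yet
    tfinal=int(0.4 * total_steps),   # final fine-tuning with the rank pattern fixed
    deltaT=10,                       # re-allocate the budget every 10 steps
    total_step=total_steps,
    target_modules=["query", "value"],  # hypothetical module names (e.g. BERT attention)
)
# Budget-decreasing phase = total_step - tinit - tfinal
# = 0.5 * total_steps here, matching the suggestion above.

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model = get_peft_model(model, config)
model.print_trainable_parameters()
```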