Lanbai-eleven opened this issue 7 months ago
During my practical usage, I have observed that the loss of AdaLoRA decreases more slowly than with the original LoRA. I would like to know whether this is normal or whether I simply haven't configured AdaLoRA optimally.

Hello, thanks for your question. Yes, AdaLoRA should converge slightly more slowly than LoRA, because AdaLoRA goes through the three stages of the budget scheduler: initial warmup, the budget-decreasing phase, and final fine-tuning. If the budget-decreasing phase has too few steps, the rank is changed too quickly for the model to adapt smoothly. In our observation, as long as a proper budget schedule is chosen, the convergence of AdaLoRA should be similar to LoRA. For example, set the length of the budget-decreasing phase to about half of the total training steps.
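For illustration, here is a minimal sketch of a schedule following that suggestion, assuming the HuggingFace PEFT `AdaLoraConfig`; all step counts, ranks, and target module names below are hypothetical and should be adjusted to your own run:

```python
from peft import AdaLoraConfig

# Hypothetical run of 1,000 optimizer steps; the split below is illustrative.
total_steps = 1000
warmup_steps = 100                                       # initial warmup (tinit)
decreasing_steps = total_steps // 2                      # budget-decreasing phase ~ half of training
final_steps = total_steps - warmup_steps - decreasing_steps  # final fine-tuning (tfinal)

config = AdaLoraConfig(
    init_r=12,               # starting rank before any pruning
    target_r=4,              # average rank after the budget has decayed
    tinit=warmup_steps,
    tfinal=final_steps,
    total_step=total_steps,
    deltaT=10,               # reallocate the budget every 10 steps
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adjust to your model's module names
)
```

Here the budget-decreasing phase spans `total_step - tinit - tfinal` steps, which with these numbers is 500, i.e. half of training. Note also that, if I recall the PEFT API correctly, the rank allocator has to be advanced manually during training (e.g. via `model.base_model.update_and_allocate(global_step)` after each optimizer step); otherwise the schedule above has no effect.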