QingruZhang / AdaLoRA

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning (ICLR 2023).

Questions about ranknum #12

Closed. luchaoqi closed this issue 5 months ago.

luchaoqi commented 9 months ago

Hi, thanks for this awesome work!

I wanted to ask about the purpose of self.ranknum in adalora.py. It seems that in transformer.py you implement ranknum via self.adapt_scaling here. However, in adalora.py, ranknum appears to be just a constant with requires_grad=False, used only for scaling:

self.scaling / (self.ranknum+1e-5)

This differs from the implementation in transformer.py. Also, why not use self.r directly instead of self.ranknum + 1e-5 in this case:

self.scaling / self.r
QingruZhang commented 9 months ago

Hello, thanks for your interest in our paper. We were experimenting with whether AdaLoRA needs to explicitly adjust the rank scaling after some singular values are masked out, since masking could reduce the magnitude of the incremental matrix. In that case, adjusting the rank scaling (ranknum) after rank allocation might be necessary. However, it turned out that explicitly adjusting ranknum hurts performance. The likely reason is that the discarded singular values have small magnitudes, so removing them has minimal influence on the matrix magnitude. Hence there is no need to adjust the rank scaling explicitly, but we keep the function here for future development :) Hope this answers your questions. Thanks for your comments.
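For readers following along, here is a minimal sketch of the pattern being discussed. It is not the repo's actual adalora.py; the class SVDAdapterSketch, the method refresh_ranknum, and the parameter names/shapes are illustrative assumptions that follow the general SVD-style adapter described in the paper. It shows ranknum kept as a non-trainable buffer used only in the scaling factor, plus what "explicitly adjusting ranknum" after masking would look like (the variant the authors found to hurt performance).

```python
import torch
import torch.nn as nn

class SVDAdapterSketch(nn.Module):
    """Illustrative sketch of an SVD-parameterized low-rank adapter (assumed names)."""
    def __init__(self, in_features, out_features, r=12, lora_alpha=32):
        super().__init__()
        # P * Lambda * Q parameterization: lora_B ~ P, lora_E ~ diag(Lambda), lora_A ~ Q
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.02)
        self.lora_E = nn.Parameter(torch.zeros(r, 1))            # singular values
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = lora_alpha
        # Non-trainable buffer; by default it stays equal to the initial rank r,
        # so scaling / (ranknum + 1e-5) is effectively a constant.
        self.register_buffer("ranknum", torch.tensor(float(r)))

    def forward(self, x):
        # Low-rank update scaled by scaling / (ranknum + 1e-5)
        delta = (x @ (self.lora_A * self.lora_E).T) @ self.lora_B.T
        return delta * self.scaling / (self.ranknum + 1e-5)

    @torch.no_grad()
    def refresh_ranknum(self):
        # The "explicit adjustment" discussed above: recount the active
        # (non-zero) singular values after masking. Per the answer, this
        # was found to hurt performance, so it is left unused by default.
        self.ranknum.fill_(float((self.lora_E != 0).sum().item()))
```

The + 1e-5 simply guards against division by zero in case ranknum were ever driven to zero after pruning; when ranknum is left at its initial value r, dividing by ranknum + 1e-5 is numerically almost identical to dividing by self.r.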