Closed: luchaoqi closed this issue 5 months ago
Hello, thanks for your interest in our paper. We were experimenting with whether AdaLoRA needs to explicitly adjust the rank scaling after some singular values are masked out, since masking could potentially decrease the magnitude of the update matrix. If so, adjusting the rank scaling (`ranknum`) would be needed after rank allocation. However, it turned out that explicitly adjusting `ranknum` hurts performance. The likely reason is that the discarded singular values have small magnitudes and therefore only a minimal influence on the matrix magnitude. So there is no need to adjust the rank scaling explicitly, but we still leave the function here for future development :) Hope this answers your questions. Thanks for your comments.
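A minimal numerical sketch of the argument above (not the authors' code; shapes and the decay of the singular values are illustrative assumptions): zeroing the smallest entries of the diagonal in an SVD-style update `B @ diag(E) @ A` barely changes the update's Frobenius norm, which is why rescaling by the effective rank turns out to be unnecessary.

```python
import numpy as np

# Sketch: AdaLoRA parameterizes the update as B @ diag(E) @ A and prunes
# triplets whose singular values E[i] are small. Here we check how much
# that pruning changes the update's magnitude.
rng = np.random.default_rng(0)
r, d = 8, 64                      # assumed initial rank and hidden size
A = rng.normal(size=(r, d))
B = rng.normal(size=(d, r))
# Singular values with a sharp decay: the tail entries are tiny.
E = np.array([1.0, 0.8, 0.6, 0.4, 1e-3, 1e-3, 1e-4, 1e-4])

delta_full = B @ np.diag(E) @ A

E_masked = E.copy()
E_masked[4:] = 0.0                # prune the four smallest triplets
delta_masked = B @ np.diag(E_masked) @ A

rel_change = (np.linalg.norm(delta_full - delta_masked)
              / np.linalg.norm(delta_full))
print(f"relative change in ||delta||_F after masking: {rel_change:.5f}")
```

Because the pruned singular values are orders of magnitude smaller than the retained ones, the relative change in the update's norm is tiny, so no explicit `ranknum` correction is needed.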
Hi, thanks for this awesome work!

I wanted to ask about the purpose of `self.ranknum` in `adalora.py`. It seems that in `transformer.py` you implemented `ranknum` with `self.adapt_scaling` here. However, in `adalora.py`, `ranknum` seems to be just a constant with `requires_grad=False`, used for scaling purposes: this differs from `transformer.py`. Btw, why not directly use `self.r` instead of `self.ranknum + 1e-5` in this case:
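For context, a hedged sketch of the scaling in question (the function name is hypothetical, not verbatim from `adalora.py`): dividing by `ranknum + 1e-5` rather than `self.r` keeps the denominator valid even when rank allocation prunes every singular value of a layer, whereas `self.r` is the fixed initial rank and would ignore pruning.

```python
def lora_scaling(alpha: float, ranknum: int) -> float:
    """Sketch of a LoRA-style scaling factor (hypothetical helper).

    The 1e-5 term guards against division by zero when the effective
    rank of a layer has been pruned all the way down to 0.
    """
    return alpha / (ranknum + 1e-5)

print(lora_scaling(16, 8))   # effective rank 8: roughly alpha / 8
print(lora_scaling(16, 0))   # fully pruned layer: still finite
```

Under this reading, `self.r` would only be equivalent if the effective rank never changed, which is exactly what AdaLoRA's rank allocation does change.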