Open · itongggg opened this issue 2 weeks ago

In your relora.py I found that for every ReLoRA layer the B matrix is initialized as a zero matrix, which is the same as the standard LoRA setting. However, I also found that when you wrap a model as a ReLoRA model, the matrix A is initialized as a zero matrix as well. Is this a typo?
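Roughly, the pattern I am talking about looks like this (my own paraphrase for clarity, not the exact code from relora.py):

```python
# Paraphrased sketch of the initialization pattern, not the actual relora.py code.
import torch
import torch.nn as nn

class ReLoRaLinearSketch(nn.Module):
    def __init__(self, in_features: int, out_features: int, r: int):
        super().__init__()
        # original dense weight, kept frozen during LoRA training
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight)
        self.lora_A = nn.Linear(in_features, r, bias=False)
        self.lora_B = nn.Linear(r, out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # standard LoRA: B starts at zero
        nn.init.zeros_(self.lora_A.weight)  # the wrapping code seems to zero A too

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.weight.T + self.lora_B(self.lora_A(x))
```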
It seems they want the wrapped model to be exactly the same as the original one when keep_original_weights is set; otherwise lora_A.weight is initialized with Kaiming init in ReLoRaLinear. But even then, B times A is still zero at initialization, so it looks to me like a redundancy rather than a typo.
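Schematically, I read the two initialization paths as something like this (my paraphrase; the exact arguments are a guess, not the repo's code):

```python
# Paraphrase of the two init paths as I understand them, not the repo's exact code.
import torch.nn as nn

def init_lora_pair(lora_A: nn.Linear, lora_B: nn.Linear, keep_original_weights: bool):
    nn.init.zeros_(lora_B.weight)  # B = 0 in both cases
    if keep_original_weights:
        # wrapped model must reproduce the original exactly, so A is zeroed too
        nn.init.zeros_(lora_A.weight)
    else:
        # ReLoRaLinear's own default: Kaiming init for A (exact kwargs are a guess)
        nn.init.kaiming_uniform_(lora_A.weight)
    # either way B @ A == 0 at init, so the wrapped forward pass matches the original
```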
But if A and B are both initialized with zero weights, doesn't the training process get stuck? Since the gradient of A equals $B^\top \frac{\partial L}{\partial W}$ and the gradient of B equals $\frac{\partial L}{\partial W} A^\top$, in this case the gradients for A and B would be zero the whole time.
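Spelling out the chain rule I have in mind, with the effective weight written as $W + BA$ (so $G$ below is the same gradient the combined weight would receive, i.e. $\frac{\partial L}{\partial W}$ above):

```math
\begin{aligned}
y &= (W + BA)\,x, \qquad G := \frac{\partial L}{\partial (W + BA)} = \frac{\partial L}{\partial y}\, x^\top,\\
\frac{\partial L}{\partial A} &= B^\top G, \qquad
\frac{\partial L}{\partial B} = G A^\top,\\
A = 0,\; B = 0 \;&\Rightarrow\; \frac{\partial L}{\partial A} = 0 \;\text{ and }\; \frac{\partial L}{\partial B} = 0 .
\end{aligned}
```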
Oh, even though both A and B are zero-initialized, as you mentioned the updates will be slow at first because the gradients are small. However, the gradients are not zero because of the presence of the original W, so A and B can still be gradually updated. I think the authors might have intended this?
As I mentioned before, the gradient of A is $B^\top G$ and the gradient of B is $G A^\top$, where $G$ is the gradient with respect to $W$. So if you initialize both A and B to zero, the parameters of A and B will never be updated.
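As a quick sanity check, here is a minimal standalone PyTorch sketch (my own toy example, not code from the repo): with both A and B zero-initialized, their gradients after backward() come out exactly zero.

```python
# Toy check: frozen base weight W plus a zero-initialized LoRA pair (A, B).
# With A = 0 and B = 0, both gradients are exactly zero, so they can never move.
import torch

torch.manual_seed(0)
d_in, d_out, r = 8, 8, 2

W = torch.randn(d_out, d_in)                    # frozen base weight (no grad)
A = torch.zeros(r, d_in, requires_grad=True)    # lora_A, zero-initialized
B = torch.zeros(d_out, r, requires_grad=True)   # lora_B, zero-initialized

x = torch.randn(4, d_in)
y = x @ (W + B @ A).T                           # forward pass through W + BA
loss = y.pow(2).mean()
loss.backward()

print(A.grad.abs().max().item())  # 0.0: grad of A is B^T G, and B is zero
print(B.grad.abs().max().item())  # 0.0: grad of B is G A^T, and A is zero
```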