Open Swiminn opened 1 month ago
Hello,
Thank you for sharing your great research results. I enjoyed reading your paper and have been trying to run your code.
However, while reviewing the code, I encountered an issue regarding the dimension of the magnitude vector.
In the paper, it is mentioned that the size of the magnitude vector is $\mathbb{R}^{1 \times k}$. However, in the actual code, the magnitude is calculated as follows:

```python
magnitude = (torch.linalg.norm(new_module.weight.detach(), dim=1)).unsqueeze(1).detach()
```
This results in `magnitude.shape` being $(d \times 1)$ if `new_module.weight.shape` is $(d \times k)$. Therefore, it seems that there is a discrepancy between the description of the magnitude shape in the paper and the actual shape in the code. Could you please explain this?
Thank you.
I've noticed a similar issue regarding the calculation of the norm for a matrix of size $d \times k$ (where d is the output dimension and k is the input dimension, if I'm not mistaken). In the code, the norm is calculated with `dim=1`, which actually computes the norm of each row vector. However, the paper states that the calculation should be the "vector-wise norm of a matrix across each column". So according to the paper, we should get a tensor of length k (the number of columns), not d as currently implemented.
Additionally, I observed that the Hugging Face `peft` library also specifies `dim=1` in its implementation:

```python
weight_norm = torch.linalg.norm(weight, dim=1)
```
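To make the observed shapes concrete, here is a minimal sketch (the dimensions and variable names are illustrative, not taken from the repository):

```python
import torch

d, k = 5, 3  # out_features (d), in_features (k)
weight = torch.randn(d, k)  # same layout as nn.Linear.weight: (out_features, in_features)

# Reducing over dim=1 collapses the in_features axis,
# leaving one norm per output row
magnitude = torch.linalg.norm(weight, dim=1).unsqueeze(1)
print(magnitude.shape)  # torch.Size([5, 1]), i.e. (d, 1)
```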
Hi, the normalization is always applied along the out_features dimension. In the first figure of the paper, W is not formulated the same way as `nn.Linear.weight`: the W in the figure has shape (in_features, out_features), while `nn.Linear.weight` has shape (out_features, in_features), which might be causing your confusion.
Thank you for your response, but I am still confused. In equation (1), it is formulated as
$W' = W_0 + \Delta W = W_0 + BA$, where $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$.
Then isn't the shape of $W$ (out_features, in_features)?
In our paper, W is (in_features, out_features), but in our implementation W is (out_features, in_features), following the convention of `nn.Linear`.
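If it helps, the two conventions can be reconciled numerically: the row-wise norm under the code's layout is exactly the column-wise norm under the paper's layout. A small sketch, assuming a plain `nn.Linear` rather than the repository's actual module:

```python
import torch

torch.manual_seed(0)
d, k = 4, 3  # out_features (d), in_features (k)

linear = torch.nn.Linear(k, d, bias=False)
W_code = linear.weight.detach()   # code convention:  (out_features, in_features) = (d, k)
W_paper = W_code.T                # paper convention: (in_features, out_features) = (k, d)

# Row-wise norm under the code's layout ...
norm_code = torch.linalg.norm(W_code, dim=1)    # shape (d,)
# ... equals the column-wise norm under the paper's layout
norm_paper = torch.linalg.norm(W_paper, dim=0)  # shape (d,)

print(torch.allclose(norm_code, norm_paper))  # True
```

Either way, the reduction runs over the in_features axis, so one norm is produced per output feature.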