dmikushin opened this issue 1 month ago
Unfortunately, there is no built-in support.
If I understood you correctly, you want to convert `LoTRLinear` layers back to `Linear` in order to merge the adapter weights into the original weight matrix $W$. So you are probably looking for a `LoTRLinear.to_linear` method, but it is not implemented at the moment. However, it is quite easy to do manually. For each `LoTRLinear` layer, one should contract the factors for the $s$-th slice as follows.
```python
def to_linear(self: LoTRLinear) -> Linear:
    # Merge the low-rank correction rhs @ mid @ lhs into the frozen weight in place.
    with torch.no_grad():
        self.linear.weight += torch.einsum('ij,jk,kl->il', self.lotr.rhs, self.lotr.mid, self.lotr.lhs)
    return self.linear
```
In this way, you can restore the original model architecture and save a checkpoint that can easily be loaded later.
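For example, a rough sketch of merging a whole model and saving it (here `model` is a placeholder for your fine-tuned model and `to_linear` is the helper above):

```python
import torch

# Swap every LoTRLinear child for its merged Linear counterpart.
for parent in list(model.modules()):
    for name, child in list(parent.named_children()):
        if isinstance(child, LoTRLinear):
            setattr(parent, name, to_linear(child))

# The state dict now matches the original (adapter-free) architecture.
torch.save(model.state_dict(), 'merged.pt')
```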
I'm not sure. To simplify this discussion, let's speak in terms of linear algebra. Suppose I have a linear system with a dense matrix, $Ax = b$. Assume it can only be solved with an approximate method, such as a Krylov method, e.g. BiCGStab. BiCGStab requires multiplying the problem matrix by a vector many times. But the method is generic: it does not care how and where the matrix is multiplied by a vector; it only asks me to give it the result of the multiplication. The multiply operation is a black box from the solver's point of view. So I would compress the matrix in some way so that it is cheap to store and easy to multiply, and I would multiply with it directly whenever the solver requests it. This is what I practically expect from LoTR.
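To make the analogy concrete, here is a minimal sketch of what I mean by a black-box multiply (my own illustration with a plain rank-$r$ factorization, not LoTR code): the solver only ever sees the matvec callback, and the dense matrix is never formed.

```python
import torch

# A is kept only as a rank-r factorization A ~= U @ V; the dense A never exists.
U = torch.randn(1000, 16)   # tall factor
V = torch.randn(16, 1000)   # wide factor

def matvec(x: torch.Tensor) -> torch.Tensor:
    # O(n*r) instead of O(n^2): contract with the factors one at a time.
    return U @ (V @ x)

# A Krylov solver such as BiCGStab would only ever call matvec(x).
```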
In my understanding, the whole purpose of LoTR is to compress the weights and never go back to the original weights again. So we should never do `to_linear`, because that would mean re-creating the whole original dense matrix from the compressed one just in order to multiply. Instead, we need to teach torch to multiply by a LoTR representation of the tensor on the fly. I'm eager to add the missing LoTR program mechanics to allow that :pray:
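Something along these lines is what I have in mind for the forward pass (just a sketch reusing the factor names from the snippet above; I have not checked how `LoTRLinear.forward` is actually implemented):

```python
import torch

def lotr_forward(layer: LoTRLinear, x: torch.Tensor) -> torch.Tensor:
    # Frozen base layer plus the low-rank correction, applied factor by
    # factor so the dense delta-W is never materialized.
    delta = ((x @ layer.lotr.lhs.T) @ layer.lotr.mid.T) @ layer.lotr.rhs.T
    return layer.linear(x) + delta
```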
If I understand you correctly, you think that LoTR is used to compress the whole weight matrix. However, I think there is some confusion. LoTR is used to represent the correction $\delta W$ in low-rank form directly. It is not used to represent the whole weight matrix $W$, only the correction $\delta W$ to the original weight matrix $W$. The whole weight matrix can be of high rank while the corrections are low-rank. For your use case, you might consider using the TT decomposition or other tensor decompositions of the whole weight matrix, e.g. (Novikov, 2015) or (Chekalina, 2023).
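Schematically, using the factor names from the `to_linear` snippet above and writing $W \in \mathbb{R}^{m \times n}$ with adapter rank $r$:

$$
W_{\text{merged}} = W + \delta W,
\qquad
\delta W = \mathrm{rhs} \cdot \mathrm{mid} \cdot \mathrm{lhs},
\qquad
\operatorname{rank}(\delta W) \le r \ll \min(m, n),
$$

while $\operatorname{rank}(W)$ may well be full.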
Dear all,
It would be great to see an end-to-end practical example of LoTR. By "practical" I mean that one takes, for example, some existing LLM weights file, compresses it into a smaller weights file with LoTR, and then uses the new weights file for inference. For the first part, I imagine something like this:
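(A rough sketch only; `LoTRLinear.from_linear`, the `rank` argument, and the model name are placeholders I made up, not the actual LoTR API.)

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained('some-llm')  # placeholder model name

# Replace every dense Linear layer with a (hypothetical) LoTR representation.
for parent in list(model.modules()):
    for name, child in list(parent.named_children()):
        if isinstance(child, torch.nn.Linear):
            setattr(parent, name, LoTRLinear.from_linear(child, rank=16))  # hypothetical API

# Hopefully a smaller checkpoint than the original weights file.
torch.save(model.state_dict(), 'model-lotr.pt')
```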
Does this make sense?