Sorry, we have not implemented the function that merges LoRA into the original weights. However, it should be easy to implement. The following is the pseudo-code:
```python
for param_name, param in ckpts.items():
    if param_name.endswith(".weight") and param_name[:-7] + ".lora_a.weight" in ckpts:
        lora_a = ckpts[param_name[:-7] + ".lora_a.weight"]
        lora_b = ckpts[param_name[:-7] + ".lora_b.weight"]
        # fold the low-rank update into the frozen weight: W <- W + B @ A
        param = param + lora_b @ lora_a
        ckpts[param_name] = param
```
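For reference, a minimal end-to-end sketch of this merge applied to a saved checkpoint could look as follows. It assumes the checkpoint is a flat state-dict-style dict saved with `torch.save`; the file names are placeholders, and the merged `lora_a`/`lora_b` entries are deleted afterwards so that a fresh adapter can be attached for the next task:

```python
import torch

# hypothetical checkpoint paths
ckpts = torch.load("consolidated.00.pth", map_location="cpu")

lora_keys = []
for param_name in list(ckpts.keys()):
    if param_name.endswith(".weight"):
        a_key = param_name[:-7] + ".lora_a.weight"
        b_key = param_name[:-7] + ".lora_b.weight"
        if a_key in ckpts and b_key in ckpts:
            # fold the low-rank update into the frozen weight: W <- W + B @ A
            ckpts[param_name] = ckpts[param_name] + ckpts[b_key] @ ckpts[a_key]
            lora_keys += [a_key, b_key]

# drop the merged LoRA parameters so a fresh lora_a/lora_b can be loaded later
for k in lora_keys:
    del ckpts[k]

torch.save(ckpts, "consolidated_merged.00.pth")
```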
Hi, thanks for your response.
Why do we use `[:-7]`? I suppose that in the attention blocks and FF layers, each layer component (`wo`, `wq`, `wk`, `wv` in attention, and the linear layers in the FF block) has its own `lora_a`/`lora_b` pair.
So all I have to do is apply `param = param + lora_b @ lora_a` for each of them?
I created this function, which loads the saved LoRA weights and merges them before reinitializing the LoRA weights for the next fine-tuning task. Would this suffice?
```python
# A, B: the saved lora_a / lora_b weight tensors loaded from the checkpoint
AB = torch.matmul(B, A)  # make sure the dimensions are compatible
self.weight.data.add_(AB)  # fold the adapter update into the frozen weight

# reset lora_a and lora_b for the next task
self.lora_a = RowParallelLinear(self.in_features, self.lora_rank, bias=False, input_is_parallel=True)
# workaround because trunc_normal_ does not currently support bfloat16
_ = init.trunc_normal_(self.lora_a.weight.data.to(torch.float32), std=.02)
self.lora_a.weight.data.copy_(_)
self.lora_b = nn.Linear(self.lora_rank, self.out_features, bias=False)
nn.init.zeros_(self.lora_b.weight)
```
Just to add another question: in https://github.com/Alpha-VLLM/LLaMA2-Accessory/blob/1265a34d2e98ac58c7b4fbd65fec730c81f6cfcd/accessory/tools/generate_packed_data.py#L26C13-L26C13, why do we take away the BOS token during packing? Would we not want the model to know that the next context is separated?
The length of `".weight"` is 7, so `"llma.layers.0.attn.wq.weight"[:-7] + ".lora_a.weight" == "llma.layers.0.attn.wq.lora_a.weight"`.
Yes
In the script, we separate contexts with [EOS] tokens, and the [BOS] tokens are discarded. You may also preserve the [BOS] tokens if you like, as it should not make a large difference.
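In other words, the packing scheme works roughly like the sketch below. This is only an illustration of what is described above, not the actual code in `generate_packed_data.py`; the tokenizer interface (`encode(text, bos=..., eos=...)`) and the `max_len` parameter are assumptions:

```python
def pack_contexts(texts, tokenizer, max_len=2048):
    """Concatenate tokenized contexts into fixed-length chunks.

    Contexts are separated only by [EOS]; no [BOS] tokens are inserted.
    """
    packed, buffer = [], []
    for text in texts:
        # encode without BOS and append EOS so that EOS alone marks context boundaries
        tokens = tokenizer.encode(text, bos=False, eos=True)
        buffer.extend(tokens)
        while len(buffer) >= max_len:
            packed.append(buffer[:max_len])
            buffer = buffer[max_len:]
    # the trailing partial chunk in `buffer` is dropped here for simplicity
    return packed
```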
From #41, I can see that `util.tensor_parallel.load_tensor_parallel_model_list` is used to load the latest weights for the same module name; I assume that is for multiple copies of adapter weights or so. What if I want to merge `lora_a` and `lora_b` into the pretrained model, how could I do so? In `LoraColumnParallelLinear`, Y = WX + BAX. Is there a function to save the model after training, where the frozen W becomes W + AB, with AB being the trained adapter weights? Or a function which loads the saved model (frozen pretrained weights plus AB) and merges them together while deleting `lora_a` and `lora_b`, so that I can load a fresh set of `lora_a`/`lora_b` for another task and freeze the merged model?