Open authurlord opened 9 months ago
@authurlord I'm not the author of this repo and just happened to come across it. For LoRA weight merging, lorahub is a candidate method (https://github.com/sail-sg/lorahub).
+1 for this feature request. It would open up some interesting merge possibilities if we could leverage mergekit's advanced methods for merging LoRA adapters.
Hello all, I have some questions regarding LoRA and task vectors. Typically, the task vector is defined as:
$$ \theta_{\text{sft}} - \theta_{\text{pt}} $$
where $\theta_{\text{sft}}$ is the fine-tuned model and $\theta_{\text{pt}}$ is the pre-trained model.
For LoRA, the formulation is:
$$ H_{\text{hidden}} = X_{\text{input}} \cdot W_{\text{frozen}} + X_{\text{input}} \cdot W_a \cdot W_b $$
where $W_a \cdot W_b$ represents the low-rank adaptation weights.
My question is: isn't the LoRA update, $W_a \cdot W_b$, naturally equivalent to a task vector? If so, the methods typically used for task-vector merging (such as TIES-Merging) should be directly applicable to LoRA weights as well.
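To make the idea concrete, here is a minimal numpy sketch (not mergekit's or TIES-Merging's actual implementation) that treats each adapter's dense delta $W_a \cdot W_b$ as a task vector and applies the TIES-style steps: trim small-magnitude entries, elect a majority sign per parameter, then average the agreeing values. The `density` parameter and shapes are illustrative assumptions.

```python
import numpy as np

def lora_delta(W_a, W_b):
    """Task-vector view of a LoRA adapter: delta = W_a @ W_b."""
    return W_a @ W_b

def ties_merge(deltas, density=0.5):
    """Simplified TIES-style merge of same-shaped task vectors."""
    trimmed = []
    for d in deltas:
        # trim: keep only the top-k entries by magnitude
        k = int(np.ceil(density * d.size))
        thresh = np.sort(np.abs(d).ravel())[-k]
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))
    stacked = np.stack(trimmed)
    # elect: dominant sign per parameter across adapters
    sign = np.sign(stacked.sum(axis=0))
    # merge: average only the surviving entries that agree with the sign
    agree = (np.sign(stacked) == sign) & (stacked != 0)
    count = np.maximum(agree.sum(axis=0), 1)
    return (stacked * agree).sum(axis=0) / count

rng = np.random.default_rng(0)
# three hypothetical rank-2 adapters on an 8x8 weight matrix
deltas = [lora_delta(rng.normal(size=(8, 2)), rng.normal(size=(2, 8)))
          for _ in range(3)]
merged = ties_merge(deltas, density=0.3)
print(merged.shape)  # (8, 8)
```

One caveat: the merged delta is generally no longer rank-$r$, so writing it back out as a single LoRA adapter would need an extra low-rank factorization (e.g. a truncated SVD).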
Please correct me if my understanding is wrong, thank you guys!
Many thanks for the great work. Can mergekit merge multiple LoRA checkpoints trained on the same base model, with user-provided weights, and output a single merged LoRA adapter? PEFT has an add_weighted_adapter method (https://huggingface.co/docs/peft/v0.7.1/en/package_reference/lora#peft.LoraModel.add_weighted_adapter) for merging multiple LoRAs, but it only supports the 'linear/cat/svd' combination types.
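For reference, the 'linear' combination type amounts to a weighted sum of the adapter deltas. A rough numpy sketch of that idea (not PEFT's actual code, which keeps the factored low-rank form rather than materializing the dense delta):

```python
import numpy as np

def add_weighted_lora(adapters, weights):
    """Weighted-sum merge of LoRA deltas: sum_i w_i * (W_a_i @ W_b_i).

    adapters: list of (W_a, W_b) pairs trained on the same base model.
    Returns the dense merged delta for illustration.
    """
    return sum(w * (W_a @ W_b) for w, (W_a, W_b) in zip(weights, adapters))

rng = np.random.default_rng(1)
# two hypothetical rank-4 adapters on a 16x16 weight matrix
ads = [(rng.normal(size=(16, 4)), rng.normal(size=(4, 16))) for _ in range(2)]
delta = add_weighted_lora(ads, [0.7, 0.3])
print(delta.shape)  # (16, 16)
```

The appeal of routing this through mergekit instead would be access to sign-aware methods like TIES or DARE rather than only a plain weighted sum.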
Besides, for the mixtral_merge method, is the gating network constructed by checking the positive and negative prompts provided for each expert?