generalsvr opened this issue 1 year ago
It looks like they provide a patch for the LLaMA attention in their repo.
Parts I've noticed:
During merge, the separately saved trainable params need to be loaded as well:

```python
import os
import torch

# overlay the embed_tokens / norm weights saved alongside the LoRA adapter
trainable_params = os.path.join(args.peft_model, "trainable_params.bin")
if os.path.isfile(trainable_params):
    model.load_state_dict(torch.load(trainable_params, map_location=model.device), strict=False)
```

This applies to the `embed` and `norm` layers:
https://github.com/dvlab-research/LongLoRA/blob/f34c8971b6c5a9cbd1e3a98d6483b750aef14cda/merge_lora_weights_and_save_hf_model.py#L98-L100
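For context, here is a minimal sketch of where that load fits in the full merge flow (the model paths and the LoRA loading/merging code are my assumptions, mirroring the linked script rather than quoting it):

```python
import os
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model_path = "meta-llama/Llama-2-7b-hf"   # placeholder
peft_model_path = "/path/to/longlora-adapter"  # placeholder

base = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, peft_model_path)
model = model.merge_and_unload()  # fold the LoRA weights into the base model

# overlay the embed_tokens / norm weights that were trained outside the adapters
trainable_params = os.path.join(peft_model_path, "trainable_params.bin")
if os.path.isfile(trainable_params):
    model.load_state_dict(torch.load(trainable_params, map_location=model.device), strict=False)
```

Here `strict=False` lets the partial state dict fill in only the matching embed/norm keys.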
The attention patch has both a flash-attention (FA) and a non-FA version: https://github.com/dvlab-research/LongLoRA/blob/f34c8971b6c5a9cbd1e3a98d6483b750aef14cda/llama_attn_replace.py#L336
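For reference, a rough sketch of the shift/unshift idea behind that patch (this is not the repo's code; the function names and shapes are mine): half of the heads are rolled by half a group along the sequence axis, the sequence is folded into local groups for attention, and the output is folded back and unrolled.

```python
import torch

def shift_half_heads(x, group_size):
    """x: (batch, num_heads, seq_len, head_dim). Roll the second half of the
    heads by half a group along the sequence axis, then fold the sequence
    into local groups so attention can be computed per group."""
    bsz, num_heads, seq_len, head_dim = x.shape
    x = x.clone()
    x[:, num_heads // 2:] = x[:, num_heads // 2:].roll(-group_size // 2, dims=2)
    # each group becomes its own "batch" entry -> (bsz * num_groups, heads, group, dim)
    x = (x.transpose(1, 2)
          .reshape(bsz * (seq_len // group_size), group_size, num_heads, head_dim)
          .transpose(1, 2))
    return x

def unshift_half_heads(x, bsz, seq_len, group_size):
    """Inverse of shift_half_heads, applied to the attention output."""
    num_heads, head_dim = x.shape[1], x.shape[3]
    x = (x.transpose(1, 2)
          .reshape(bsz, seq_len, num_heads, head_dim)
          .transpose(1, 2)
          .clone())
    x[:, num_heads // 2:] = x[:, num_heads // 2:].roll(group_size // 2, dims=2)
    return x
```

In the patched forward pass, q/k/v would go through something like `shift_half_heads`, attention would run over the folded groups, and the output would go through `unshift_half_heads` before the output projection.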
It also depends on setting `rope_scaling` on the base model: https://github.com/dvlab-research/LongLoRA/blob/f34c8971b6c5a9cbd1e3a98d6483b750aef14cda/fine-tune.py#L110-L113
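Those lines amount to configuring linear RoPE scaling on the base model config before loading it; roughly (the model name and target length below are placeholders):

```python
import math
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder
model_max_length = 32768                 # target context length, placeholder

config = AutoConfig.from_pretrained(model_name)
orig_ctx_len = getattr(config, "max_position_embeddings", None)
if orig_ctx_len and model_max_length > orig_ctx_len:
    # linear position-interpolation factor, rounded up
    config.rope_scaling = {"type": "linear", "factor": float(math.ceil(model_max_length / orig_ctx_len))}

model = AutoModelForCausalLM.from_pretrained(model_name, config=config)
```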
May want to keep track of https://github.com/huggingface/peft/issues/958 in case it is supported there.
Looking at the shift/unshift code, it doesn't seem to be aware of packed-sequence boundaries, so that would need some modification (or we simply don't allow packed sequences with this feature); see the toy example below.
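A toy illustration of that concern, using token ids only (the packing layout is hypothetical):

```python
import torch

# two packed documents of 4 tokens each in one 8-token sequence
packed = torch.tensor([[10, 11, 12, 13, 20, 21, 22, 23]])
group_size = 4

# the shifted half of the heads effectively sees the token axis rolled by -group_size // 2
rolled = packed.roll(-group_size // 2, dims=1)
print(rolled)  # tensor([[12, 13, 20, 21, 22, 23, 10, 11]])
# the first group [12, 13, 20, 21] now mixes tokens from both documents, and the
# wrap-around places tokens 10 and 11 after document 2, so group boundaries no
# longer respect document boundaries without extra masking
```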
Is this something that is on the roadmap?
🔖 Feature description
Could you implement this new LoRA method? It would be great to have 32k+ context LoRA models. It looks promising.
✔️ Solution
https://github.com/dvlab-research/LongLoRA
http://arxiv.org/abs/2309.12307
❓ Alternatives
No response
📝 Additional Context
No response