generalsvr opened this issue 1 year ago
It looks like they provide a patch for the LLaMA attention in their repo.
Parts I've noticed:
During merge, the separately saved trainable params need to be loaded as well:

```python
import os
import torch

# overlay the embed_tokens / norm weights saved alongside the LoRA adapter
trainable_params = os.path.join(args.peft_model, "trainable_params.bin")
if os.path.isfile(trainable_params):
    model.load_state_dict(torch.load(trainable_params, map_location=model.device), strict=False)
```

This applies to the `embed` and `norm` layers:
https://github.com/dvlab-research/LongLoRA/blob/f34c8971b6c5a9cbd1e3a98d6483b750aef14cda/merge_lora_weights_and_save_hf_model.py#L98-L100
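For context, here is a minimal sketch of where that load fits in the full merge flow (the model paths and the LoRA loading/merging code are my assumptions, mirroring the linked script rather than quoting it):

```python
import os
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model_path = "meta-llama/Llama-2-7b-hf"   # placeholder
peft_model_path = "/path/to/longlora-adapter"  # placeholder

base = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, peft_model_path)
model = model.merge_and_unload()  # fold the LoRA weights into the base model

# overlay the embed_tokens / norm weights that were trained outside the adapters
trainable_params = os.path.join(peft_model_path, "trainable_params.bin")
if os.path.isfile(trainable_params):
    model.load_state_dict(torch.load(trainable_params, map_location=model.device), strict=False)
```

Here `strict=False` lets the partial state dict fill in only the matching embed/norm keys.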
The attention patch has both a flash-attention (FA) and a non-FA version: https://github.com/dvlab-research/LongLoRA/blob/f34c8971b6c5a9cbd1e3a98d6483b750aef14cda/llama_attn_replace.py#L336
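For reference, a rough sketch of the shift/unshift idea behind that patch (this is not the repo's code; the function names and shapes are mine): half of the heads are rolled by half a group along the sequence axis, the sequence is folded into local groups for attention, and the output is folded back and unrolled.

```python
import torch

def shift_half_heads(x, group_size):
    """x: (batch, num_heads, seq_len, head_dim). Roll the second half of the
    heads by half a group along the sequence axis, then fold the sequence
    into local groups so attention can be computed per group."""
    bsz, num_heads, seq_len, head_dim = x.shape
    x = x.clone()
    x[:, num_heads // 2:] = x[:, num_heads // 2:].roll(-group_size // 2, dims=2)
    # each group becomes its own "batch" entry -> (bsz * num_groups, heads, group, dim)
    x = (x.transpose(1, 2)
          .reshape(bsz * (seq_len // group_size), group_size, num_heads, head_dim)
          .transpose(1, 2))
    return x

def unshift_half_heads(x, bsz, seq_len, group_size):
    """Inverse of shift_half_heads, applied to the attention output."""
    num_heads, head_dim = x.shape[1], x.shape[3]
    x = (x.transpose(1, 2)
          .reshape(bsz, seq_len, num_heads, head_dim)
          .transpose(1, 2)
          .clone())
    x[:, num_heads // 2:] = x[:, num_heads // 2:].roll(group_size // 2, dims=2)
    return x
```

In the patched forward pass, q/k/v would go through something like `shift_half_heads`, attention would run over the folded groups, and the output would go through `unshift_half_heads` before the output projection.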
It also depends on setting `rope_scaling` on the base model: https://github.com/dvlab-research/LongLoRA/blob/f34c8971b6c5a9cbd1e3a98d6483b750aef14cda/fine-tune.py#L110-L113
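Those lines amount to configuring linear RoPE scaling on the base model config before loading it; roughly (the model name and target length below are placeholders):

```python
import math
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder
model_max_length = 32768                 # target context length, placeholder

config = AutoConfig.from_pretrained(model_name)
orig_ctx_len = getattr(config, "max_position_embeddings", None)
if orig_ctx_len and model_max_length > orig_ctx_len:
    # linear position-interpolation factor, rounded up
    config.rope_scaling = {"type": "linear", "factor": float(math.ceil(model_max_length / orig_ctx_len))}

model = AutoModelForCausalLM.from_pretrained(model_name, config=config)
```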
May want to keep track of https://github.com/huggingface/peft/issues/958 in case it is supported there.
Looking at the shift/unshift code, it doesn't seem to be aware of packed-sequence boundaries, so that would need some modification (or we simply don't allow packed sequences with this feature); see the toy example below.
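A toy illustration of that concern, using token ids only (the packing layout is hypothetical):

```python
import torch

# two packed documents of 4 tokens each in one 8-token sequence
packed = torch.tensor([[10, 11, 12, 13, 20, 21, 22, 23]])
group_size = 4

# the shifted half of the heads effectively sees the token axis rolled by -group_size // 2
rolled = packed.roll(-group_size // 2, dims=1)
print(rolled)  # tensor([[12, 13, 20, 21, 22, 23, 10, 11]])
# the first group [12, 13, 20, 21] now mixes tokens from both documents, and the
# wrap-around places tokens 10 and 11 after document 2, so group boundaries no
# longer respect document boundaries without extra masking
```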
Is this something that is on the roadmap?
🔖 Feature description
Could you implement this new LoRA method? It would be great to have 32k+ context LoRA models. It looks promising.
✔️ Solution
https://github.com/dvlab-research/LongLoRA
http://arxiv.org/abs/2309.12307
❓ Alternatives
No response
📝 Additional Context
No response