Lightning-AI / lit-llama

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

Convert lit-llama weights to huggingface #150

Open DuarteMRAlves opened 1 year ago

DuarteMRAlves commented 1 year ago

Hello, I was wondering whether you are planning on releasing a script to convert weights trained with this repository to the huggingface format?

Currently, huggingface is the best way to share models with the community, and I think being able to convert models trained with this code to the huggingface format would be very beneficial for the adoption of this framework.

lantiga commented 1 year ago

Hey @DuarteMRAlves I don't disagree. It should be fairly doable to take the current conversion script and rearrange the state dict. Help welcome :-)
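
To make that concrete, here is a minimal, untested sketch of what such an inverse mapping could look like. The lit-llama key names (rms_1.scale, attn.c_attn.weight, ...) are assumptions based on its nanoGPT-style module layout; verify them against the mapping in scripts/convert_hf_checkpoint.py and the keys of an actual checkpoint:

```python
# Untested sketch of the inverse of scripts/convert_hf_checkpoint.py.
# The lit-llama key names below are assumptions; check them against
# list(your_state_dict.keys()) before relying on this.
import torch


def convert_lit_to_hf(lit_sd: dict, n_layer: int, n_embd: int) -> dict:
    hf_sd = {
        "model.embed_tokens.weight": lit_sd["transformer.wte.weight"],
        "model.norm.weight": lit_sd["transformer.ln_f.scale"],  # or ".weight", depending on the version
        "lm_head.weight": lit_sd["lm_head.weight"],
    }
    for i in range(n_layer):
        lit, hf = f"transformer.h.{i}", f"model.layers.{i}"
        hf_sd[f"{hf}.input_layernorm.weight"] = lit_sd[f"{lit}.rms_1.scale"]
        hf_sd[f"{hf}.post_attention_layernorm.weight"] = lit_sd[f"{lit}.rms_2.scale"]
        # lit-llama fuses q/k/v into a single c_attn matrix; HF keeps them separate.
        # Depending on the RoPE convention, q and k may also need the head-wise
        # permutation that convert_llama_weights_to_hf.py applies.
        q, k, v = lit_sd[f"{lit}.attn.c_attn.weight"].split(n_embd, dim=0)
        hf_sd[f"{hf}.self_attn.q_proj.weight"] = q
        hf_sd[f"{hf}.self_attn.k_proj.weight"] = k
        hf_sd[f"{hf}.self_attn.v_proj.weight"] = v
        hf_sd[f"{hf}.self_attn.o_proj.weight"] = lit_sd[f"{lit}.attn.c_proj.weight"]
        hf_sd[f"{hf}.mlp.gate_proj.weight"] = lit_sd[f"{lit}.mlp.c_fc1.weight"]
        hf_sd[f"{hf}.mlp.up_proj.weight"] = lit_sd[f"{lit}.mlp.c_fc2.weight"]
        hf_sd[f"{hf}.mlp.down_proj.weight"] = lit_sd[f"{lit}.mlp.c_proj.weight"]
    return hf_sd


# Illustrative usage (path and sizes are for the 7B model):
lit_sd = torch.load("checkpoints/lit-llama/7B/lit-llama.pth", map_location="cpu")
torch.save(convert_lit_to_hf(lit_sd, n_layer=32, n_embd=4096), "pytorch_model.bin")
```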

timothylimyl commented 1 year ago

Piling on: I think this would be useful because hf weights can be split across multiple GPUs during inference, which helps with bigger models.

carmocca commented 1 year ago

@timothylimyl Lit-Parrot supports this via FSDP, added in https://github.com/Lightning-AI/lit-parrot/commit/248d691f06d68c7e92d3230260eda0055f7dc163. Support for this could easily be ported to Lit-LLaMA.

timothylimyl commented 1 year ago

That's awesome! Any plans to support FSDP inference for lit-llama too?

I will take a look to see whether I can replicate what you did in lit-parrot. However, my initial intuition is that it is not that straightforward? My guess is that you would need some sort of heuristic to know at which layers to split the model given the number of GPUs provided.

Edit: I really think this is a very important feature; it gives a lot of flexibility in working around personal hardware constraints during inference.

carmocca commented 1 year ago

Yes, but it would be better if you or somebody else from the community works on the port.

The sharding is configured via the auto_wrap_policy function used in the commit I linked (PyTorch docs)
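
For anyone attempting the port, a minimal sketch of that setup with Lightning Fabric might look like the following. Wrapping around lit-llama's Block class mirrors what the linked commit does for lit-parrot, but treat this as an illustration under those assumptions rather than the actual ported code:

```python
# Sketch only: shard lit-llama across GPUs with FSDP by wrapping each transformer Block.
from functools import partial

import lightning as L
from lightning.fabric.strategies import FSDPStrategy
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

from lit_llama.model import LLaMA, Block

# Shard at the granularity of lit-llama's transformer Block
auto_wrap_policy = partial(transformer_auto_wrap_policy, transformer_layer_cls={Block})
strategy = FSDPStrategy(auto_wrap_policy=auto_wrap_policy)

fabric = L.Fabric(devices=2, precision="bf16-true", strategy=strategy)
fabric.launch()

with fabric.init_module():
    model = LLaMA.from_name("7B")
model = fabric.setup_module(model)
# load the checkpoint into `model` and run generation as usual from here
```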

timothylimyl commented 1 year ago

> Yes, but it would be better if you or somebody else from the community works on the port.
>
> The sharding is configured via the auto_wrap_policy function used in the commit I linked (PyTorch docs)

Any particular reason for this?

I will give it a shot when I can. I am now using another repo just because I can load with hugging face's auto device mapping (but I reckon lit-llama is still the best, since the other repo's multi-gpu training is pretty broken).

RDouglasSharp commented 1 year ago

Piling on here: the comments in scripts/convert_hf_checkpoint.py say it does the inverse of https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py, so it would be reasonable to assume that immediately after creating a .pth model with convert_hf_checkpoint.py, you could convert it back with convert_llama_weights_to_hf.py and get the original model back. But in fact, once you create the missing params.json file, convert_llama_weights_to_hf.py fails for the 7B model with:

KeyError: 'layers.0.attention.wq.weight'
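
For what it's worth, that error is consistent with the .pth holding lit-llama's own nanoGPT-style key names rather than Meta's original layers.{i}.attention.wq.* naming, which is what convert_llama_weights_to_hf.py expects. A quick way to confirm (path is illustrative):

```python
# Inspect the converted checkpoint's key names.
import torch

sd = torch.load("checkpoints/lit-llama/7B/lit-llama.pth", map_location="cpu")
print([k for k in sd if ".0." in k][:6])
# Keys like 'transformer.h.0.attn.c_attn.weight' indicate lit-llama naming,
# not the 'layers.0.attention.wq.weight' naming the HF script looks for.
```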

devrituraj commented 1 year ago

Hi!

I was wondering whether there is any update regarding the conversion of lit-llama fine-tuned merged weights (LoRA) to the hugging face format?

timothylimyl commented 1 year ago

@devrituraj if there were auto device mapping (multi-gpu) in lit-llama/lit-gpt, would you consider it unnecessary to convert to the hugging face format?

wjurayj commented 1 year ago

Hello @carmocca, I believe I have a solution to port a Lit-LLaMA checkpoint over to the huggingface format. Could I be assigned this issue?