DuarteMRAlves opened this issue 1 year ago
Hey @DuarteMRAlves I don't disagree. It should be fairly doable to take the current conversion script and rearrange the state dict. Help welcome :-)
Piling on: I think this would be useful because HF weights can be split across multiple GPUs during inference, which helps with bigger models.
@timothylimyl Lit-Parrot supports this via FSDP, added in https://github.com/Lightning-AI/lit-parrot/commit/248d691f06d68c7e92d3230260eda0055f7dc163. Support for this could be easily ported to Lit-LlaMA
That's awesome. Any plans to support FSDP inference in lit-llama too?
I will take a look as well to see whether I can replicate what you did in lit-parrot. However, my initial intuition is that it is not that straightforward: at the very least, my guess is that you need some heuristic to decide at which layers to split the model, given the number of GPUs provided.
Edit: I really think this is a very important feature; it gives a lot of flexibility with respect to personal hardware constraints during inference.
Yes, but it would be better if you or somebody else from the community works on the port.
The sharding is configured via the `auto_wrap_policy` function used in the commit I linked (see the PyTorch docs).
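For anyone attempting the port, here is a minimal sketch of what such a policy looks like. The signature follows PyTorch FSDP's callable-policy convention, and `Block` is a stand-in name for the repo's transformer block class, so treat the details as assumptions rather than the actual Lightning code:

```python
# Sketch of an FSDP auto_wrap_policy (assumption: the callable-policy
# signature (module, recurse, nonwrapped_numel) used by PyTorch FSDP;
# "Block" is a placeholder for lit-llama's transformer block class).
def block_wrap_policy(module, recurse: bool, nonwrapped_numel: int) -> bool:
    # FSDP calls this for every submodule. With recurse=True, returning True
    # means "keep traversing into children". With recurse=False, returning
    # True means "wrap this module as its own shard unit". Wrapping each
    # transformer block lets FSDP shard the model block-by-block across GPUs,
    # with no manual heuristic for where to cut the model.
    if recurse:
        return True
    return type(module).__name__ == "Block"
```

Such a policy would then be handed to the FSDP wrapper (or Lightning's FSDP strategy) as its `auto_wrap_policy` argument; the linked commit does the equivalent through Lightning rather than raw FSDP.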
Any particular reason for this?
I will give it a shot when it is available. For now I am using another repo just because I can load models with Hugging Face's automatic device mapping (but I reckon lit-llama is still the best, since that other repo's multi-GPU training is pretty broken).
Piling on here: the comments in scripts/convert_hf_checkpoint.py say it does the inverse of https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py, so it would be reasonable to assume that, immediately after creating a .pth model with convert_hf_checkpoint.py, you could convert it back with convert_llama_weights_to_hf.py and get the original model back. But in fact, once you create the missing params.json file, convert_llama_weights_to_hf.py fails for the 7B model with:
KeyError: 'layers.0.attention.wq.weight'
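That KeyError suggests the .pth uses lit-llama's own parameter names, while convert_llama_weights_to_hf.py expects Meta's original naming scheme (e.g. layers.0.attention.wq.weight). A round trip would therefore need a key-renaming pass on the state dict. Here is a hedged sketch of that idea; the concrete key patterns below are illustrative assumptions, not copied from either conversion script:

```python
import re

# Illustrative regex mapping from one naming scheme to the other.
# The exact names are assumptions for this sketch; the real scripts
# define the authoritative mapping.
RENAMES = {
    r"transformer\.h\.(\d+)\.attn\.wq\.weight": r"layers.\1.attention.wq.weight",
    r"transformer\.h\.(\d+)\.mlp\.w1\.weight": r"layers.\1.feed_forward.w1.weight",
}

def rename_state_dict(state_dict: dict) -> dict:
    """Return a copy of state_dict with keys rewritten to the target scheme."""
    out = {}
    for key, tensor in state_dict.items():
        for pattern, template in RENAMES.items():
            new_key, n = re.subn(pattern, template, key)
            if n:  # pattern matched: adopt the rewritten key
                key = new_key
                break
        out[key] = tensor  # unmatched keys pass through unchanged
    return out
```

With a complete mapping of this shape, the renamed dict could be saved and fed to convert_llama_weights_to_hf.py; tensor transpositions or weight permutations the scripts perform would still need to be inverted separately.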
Hi!
I was wondering whether there is any update on converting the lit-llama fine-tuned merged weights (LoRA) to Hugging Face format?
@devrituraj if there were auto device mapping (multi-GPU) in lit-llama/lit-gpt, would you consider the conversion to Hugging Face format unnecessary?
Hello @carmocca, I believe I have a solution to port a Lit-LLaMA checkpoint over to Hugging Face format. Could I be assigned this issue?
Hello, I was wondering whether you are planning to release a script to convert weights trained with this repository to the Hugging Face format?
Currently, Hugging Face is the best way to share models across the community, and I think being able to convert models trained with this code to Hugging Face format would greatly benefit the adoption of this framework.