Closed: KOVVURISATYANARAYANAREDDY closed this issue 1 year ago.
The following call in litgpt/lora.py, inside the `conv1d` method of the `LoRAQKVLinear` class, caused this issue:

```python
F.conv1d(input, weight, groups=sum(self.enable_lora))
```
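For context, here is a minimal standalone sketch of the pattern that call uses: the LoRA B matrices for the enabled q/k/v projections are applied as a grouped 1x1 convolution, one group per enabled projection. The shapes and names below are illustrative, not the actual lit-gpt values:

```python
import torch
import torch.nn.functional as F

# Illustrative configuration only: LoRA on Q and V, but not K.
enable_lora = (True, False, True)
r = 8                              # LoRA rank (assumed value)
out_per_group = 64                 # output features per enabled projection (assumed)

groups = sum(enable_lora)          # -> 2, as in the failing call
# input: (batch, groups * r, seq_len); weight: (groups * out_per_group, r, 1)
x = torch.randn(4, groups * r, 16)
w = torch.randn(groups * out_per_group, r, 1)

# Each group convolves its own r input channels with its own weight slice.
y = F.conv1d(x, w, groups=groups)
print(y.shape)                     # torch.Size([4, 128, 16])
```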
Something is wrong with the environment, I guess. Creating a separate NVIDIA PyTorch Docker container with the nightly version resolved the issue, but now the FLOPS are lower.
This must be an issue with PyTorch; there's nothing we can do about it here. If you save the inputs passed to the function and can reproduce the failure with them afterwards, the PyTorch team might be able to help once you open an issue in their repo.
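For example, here is a minimal sketch of how the failing inputs could be captured for a standalone repro; the wrapper below is hypothetical, not part of lit-gpt:

```python
import torch
import torch.nn.functional as F

def conv1d_with_dump(input, weight, groups):
    """Run F.conv1d, but dump the exact failing tensors for a bug report."""
    try:
        return F.conv1d(input, weight, groups=groups)
    except Exception:
        torch.save(
            {"input": input.cpu(), "weight": weight.cpu(), "groups": groups},
            "conv1d_repro.pt",
        )
        raise

# Later, in a standalone script attached to the PyTorch issue:
# blob = torch.load("conv1d_repro.pt")
# F.conv1d(blob["input"].cuda(), blob["weight"].cuda(), groups=blob["groups"])
```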
Thank you @carmocca for the suggestion.
I met the same problem when loading codegen25-7b-instruct. Did you solve it? @KOVVURISATYANARAYANAREDDY
Try a different PyTorch version, or use the NVIDIA PyTorch Docker image; one of those worked, I believe. I can't recall exactly which.
Thanks a lot. It does work.
Please post the procedure you followed so that others can benefit. I forgot to post when I did, and I don't remember it exactly. Thank you.
Hello,
Firstly, thank you for such great code; I really appreciate the work.
I am trying to run the Llama-2-13b-chat-hf/llama-2-13b-hf models. I followed the same procedure as mentioned here to download the model from Hugging Face and convert it to the lit-gpt format.
Then I prepared the Alpaca data using /scripts/prepare_alpaca.py.
I made a small change in finetune/lora.py inside the get_batch function.
Then I ran finetune/lora.py with the prepared Alpaca dataset and the checkpoints/meta-llama/Llama-2-13b-hf/ checkpoint.
I have 8 A100 40GB GPUs and am trying to run on multiple GPUs, with devices=4 (see the sketch below).
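For reference, a minimal sketch of the multi-GPU setup being described, assuming the finetune script wires `devices` into lightning.Fabric roughly like this (names illustrative, not the actual lit-gpt code):

```python
import lightning as L

devices = 4  # use four of the eight available A100s
fabric = L.Fabric(accelerator="cuda", devices=devices, precision="bf16-mixed")
fabric.launch()  # spawns one process per device; code below runs on every rank
print(f"rank {fabric.global_rank} of {fabric.world_size}")
```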
I am facing the following error, which I do not face with the llama-7b model.
Also, I have a question: how can we apply this lit-gpt method to other models like StarCoder? Please suggest the steps for doing so.
Please suggest changes. Thank you in advance.