How to run the finetuned model with LoRA adapters.

thisurawz1 commented 4 months ago

I have successfully fine-tuned the model using QLoRA for a custom use case. Now that I have the LoRA adapters, can you tell me how to use them for inference? Should I merge the LoRA weights with the original model and then run inference?

Yogesh914 commented 4 months ago

Hi @thisurawz1, I was wondering if you were available for a call or text. We are currently running into some issues when fine-tuning with the finetune_lora.sh file and would appreciate your guidance.

I have Discord as well if you prefer; let me know what works best for you.

thisurawz1 commented 4 months ago

You can contact me on Discord - "wick6309". However, I'm not very active on Discord and mainly use WeChat. Anyway, I've posted below all the issues I encountered and their solutions for everyone's reference.

I mainly used the QLoRA script and ran a trial fine-tuning. My dataset was quite small, around 229 samples (image and text). I ran into the issues below along the way. I used one A100 40GB GPU, but the VRAM was not enough to run the QLoRA script with a batch size of 4, so I had to reduce it to 2.

1 Adjust the number of GPUs to match what is available on your machine

Error: RuntimeError: CUDA error: invalid device ordinal
Solution: Change the number of devices in the script to the available devices on your machine.
Code:
    # Python: see the number of GPUs available on your machine
    import torch
    print(torch.cuda.device_count())

    # LoRA or QLoRA script: adjust based on available GPUs (in my case, it's just 1 GPU)
    ARG_WORLD_SIZE=${1:-1}
    ARG_NPROC_PER_NODE=${2:-1}
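A minimal sketch of the same check in one place, in case it helps: it reads the GPU count and clamps a requested process count to it, since launching more processes than visible GPUs is a typical way to hit "invalid device ordinal". The helper names `pick_nproc_per_node` and `visible_gpu_count` are made up for illustration and are not part of the repo.

```python
def pick_nproc_per_node(requested: int, available: int) -> int:
    """Clamp the requested process count to the number of visible GPUs."""
    if available < 1:
        raise RuntimeError("No CUDA devices are visible to this process.")
    return min(max(requested, 1), available)


def visible_gpu_count() -> int:
    """Return torch.cuda.device_count(), or 0 if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


if __name__ == "__main__":
    available = visible_gpu_count()
    print(f"Visible GPUs: {available}")
    if available:
        # 8 is just an example request; pass whatever the script asks for.
        print(f"Use ARG_NPROC_PER_NODE={pick_nproc_per_node(8, available)}")
```

The printed value is what you would pass as the second argument to the script (ARG_NPROC_PER_NODE).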

2 Hugging Face offline mode error

Code:
    export TRANSFORMERS_OFFLINE=0  # Temporarily disable offline mode in the script

3 Cannot access "mistralai/Mistral-7B-Instruct-v0.2" because it is a gated repo

Error: Cannot access gated repo for url https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/resolve/main/config.json. Access to model mistralai/Mistral-7B-Instruct-v0.2 is restricted. You must be authenticated to access it.
Solution: First, go to the model's page on Hugging Face and request access, then copy your Hugging Face read token.
Code:
    from huggingface_hub import login
    login(token="your_huggingface_token")  # Add this to the script file

4 mm_projector.bin couldn't be found

Error: FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/huggingface/hub/models--DAMO-NLP-SG--VideoLLaMA2-7B-Base/snapshots/main/mm_projector.bin'
Solution: Make sure you download mm_projector.bin to the correct path.
Code:
    python -c "
    from huggingface_hub import hf_hub_download
    hf_hub_download(repo_id='DAMO-NLP-SG/VideoLLaMA2-7B-Base', filename='mm_projector.bin')
    "

5 Change the dataset path and folder

Solution: Follow the repo's guide to set up the dataset structure, then point --data_path at the JSON file and --data_folder at the dataset folder.
Code:
    --data_path datasets/custom_sft/custom.json \
    --data_folder datasets/custom_sft/
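Before launching a run, it can save time to confirm that every media file referenced in the JSON actually exists under the data folder. Below is a rough sketch; it assumes the JSON is a list of records whose media paths sit under an "image" or "video" key, relative to the data folder. Those key names are an assumption on my part, so adjust them to whatever the repo's dataset guide specifies. `missing_media` is a made-up helper name.

```python
import json
from pathlib import Path

# Assumed field names for media paths; change to match your dataset format.
MEDIA_KEYS = ("image", "video")


def missing_media(data_path, data_folder):
    """Return the relative media paths in the JSON that are not files under data_folder."""
    records = json.loads(Path(data_path).read_text(encoding="utf-8"))
    folder = Path(data_folder)
    missing = []
    for record in records:
        for key in MEDIA_KEYS:
            rel = record.get(key)
            if isinstance(rel, str) and not (folder / rel).is_file():
                missing.append(rel)
    return missing
```

For example, `missing_media("datasets/custom_sft/custom.json", "datasets/custom_sft/")` should return an empty list when everything is in place.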

6 NCCL error/ CUDA error/ Not enough VRAM

Error: torch.distributed.DistBackendError: NCCL error in: /opt/conda/conda-bld/pytorch_1704987394225/work/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1691, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.19.3 ncclUnhandledCudaError: Call to CUDA function failed. Last error: Failed to CUDA calloc async 24 bytes
Solution: This is mainly because you don't have enough VRAM. You can reduce the batch size. For QLoRA fine-tuning, at least 40GB of VRAM is recommended. If you have more VRAM, you can increase the batch size.
Code:
    # Training Arguments
    GLOBAL_BATCH_SIZE=128  # Keep 128, or reduce it to 64
    LOCAL_BATCH_SIZE=2     # Reduce from 4 to 2
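To see why the global batch size can stay at 128 when the local one drops, here is a small sketch of the arithmetic. It assumes the script derives gradient-accumulation steps as global batch / (nodes x GPUs per node x local batch). That is a common convention, but it is my assumption and not something confirmed in this thread, so check it against your copy of the script.

```python
def gradient_accumulation_steps(global_batch, local_batch, world_size=1, nproc_per_node=1):
    """Steps to accumulate so the effective batch equals global_batch (assumed formula)."""
    per_step = local_batch * world_size * nproc_per_node
    if global_batch % per_step != 0:
        raise ValueError("global_batch must be divisible by local_batch * world_size * nproc_per_node")
    return global_batch // per_step


# One node, one GPU, as in the setup described above.
print(gradient_accumulation_steps(128, 4))  # 32
print(gradient_accumulation_steps(128, 2))  # 64
```

Under that assumption, halving LOCAL_BATCH_SIZE lowers peak VRAM per step while the effective batch stays the same; the run just takes twice as many accumulation steps per update.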

Yogesh914 commented 4 months ago

Hey @thisurawz1, thanks a lot for the reply; it made things clear. I am working with @lucasxu777 on this, so if you could add him that would be great, since he has WeChat! I have also added you on Discord; ".yogiii" is my username.

Yogesh914 commented 4 months ago

It was solved here: #32

thisurawz1 commented 4 months ago

Hey @thisurawz1, thanks for sharing the information here!!! I wonder if I can add you on WeChat so that we can make the conversations easier maybe for future work :)). My WeChat account is: kjw4LV

Noted. I'll add you.

thisurawz1 commented 4 months ago

Thanks. I'll add your friend. Is there a proper guide on how to run inference with the LoRA fine-tuned model?
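For anyone wondering what "merging" the LoRA weights means in practice, here is a toy sketch of the standard LoRA update, W_merged = W + (alpha / r) * B A, written with plain Python lists so it runs anywhere. It is not the repo's inference code (see #32 for that); `merge_lora` and `matmul` are illustrative names. Libraries such as PEFT apply the same update per adapted layer when you call `merge_and_unload()`.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Fold a LoRA adapter into a base weight: W + (alpha / rank) * (B @ A).

    weight: out x in, lora_a: rank x in, lora_b: out x rank.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


base = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 2.0]]    # rank 1, input dim 2
b = [[1.0], [3.0]]  # output dim 2, rank 1
print(merge_lora(base, a, b, alpha=2, rank=1))
```

After merging, the layer behaves like an ordinary dense layer, so inference no longer needs the adapter files.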

LiangMeng89 commented 4 weeks ago

kjw4LV does not work. Please add my WeChat: LiangMeng19357260600, so we can talk about how to use VideoLLaMA2 in our domain research work.

LiangMeng89 commented 4 weeks ago

Hello, we can also connect on WeChat. kjw4LV does not work; please add my WeChat: LiangMeng19357260600, so we can talk about how to use VideoLLaMA2 in our domain research work. Thanks.

DAMO-NLP-SG / VideoLLaMA2