bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index
MIT License

CUDA Setup failed despite GPU being available #1289

Open Keertiraj opened 4 months ago

Keertiraj commented 4 months ago

System Info

I am fine-tuning the Llama3-8b-Instruct model. Here is the Jupyter notebook with the steps I followed to perform the fine-tuning:

https://gitlab.com/keerti4p/llama3-8b-instruct-finetune/-/blob/main/llama3_finetune_8b_instruct.ipynb

I am using an 'ml.g5.24xlarge' AWS EC2 instance to fine-tune the Llama3 model. However, when I execute the code snippet below:

    # Set supervised fine-tuning parameters
    trainer = SFTTrainer(
        model=model,
        train_dataset=dataset,
        peft_config=peft_config,        # use our LoRA PEFT config
        dataset_text_field="text",
        max_seq_length=None,            # no max sequence length
        tokenizer=tokenizer,            # use the Llama tokenizer
        args=training_arguments,        # use the training arguments
        packing=False,                  # don't need packing
    )

I am receiving the below error:

RuntimeError: CUDA Setup failed despite GPU being available. Please run the following command to get more information:

    python -m bitsandbytes

    Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
    to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
    and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

Please suggest how to resolve this error.

Reproduction

Instantiate an 'ml.g5.24xlarge' AWS EC2 instance through an AWS SageMaker JupyterLab space and run the code in the Jupyter notebook below:

https://gitlab.com/keerti4p/llama3-8b-instruct-finetune/-/blob/main/llama3_finetune_8b_instruct.ipynb

Expected behavior

The 'Llama3-8b-Instruct' model is fine-tuned and pushed to Hugging Face.

Titus-von-Koeller commented 4 months ago

Please update bitsandbytes to the newest version, then run the command you already mentioned above and share the debug output. Otherwise it's really hard to help you.

Seems like CUDA isn't properly installed or not detected correctly.

!nvidia-smi              # copy and post the output
!pip install --upgrade "bitsandbytes>=0.43.2"
!python -m bitsandbytes  # copy and post the output
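
If the `python -m bitsandbytes` output shows the CUDA libraries missing, extending LD_LIBRARY_PATH as the error message suggests often resolves it. A hedged shell sketch (the /usr/local/cuda/lib64 path is an assumption; check where CUDA actually lives on your instance, e.g. with `ldconfig -p | grep libcudart`):

```shell
# Point the dynamic loader at the CUDA toolkit's libraries before retrying.
# NOTE: /usr/local/cuda/lib64 is an assumed location; substitute the
# directory that actually contains libcudart on your instance.
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:${LD_LIBRARY_PATH}"
# Then re-run the self-check to confirm the libraries are found:
# python -m bitsandbytes
```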