Open Keertiraj opened 4 months ago
Please update bitsandbytes to the newest version and then run the command that you already mentioned above to give your debug output. Otherwise it's really hard to help you.
It seems CUDA isn't properly installed or isn't being detected correctly.
```shell
!nvidia-smi                                    # copy and post the output
!pip install --upgrade "bitsandbytes>=0.43.2"
!python -m bitsandbytes                        # copy and post the output
```
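Independent of bitsandbytes, it can also help to confirm that PyTorch itself sees the GPU before digging further. A minimal diagnostic sketch (my own, not from the thread; it degrades gracefully if torch is missing):

```python
# Hedged diagnostic sketch: report whether torch is installed and, if so,
# whether it can see a CUDA device. Helper name `cuda_status` is my own.
import importlib.util


def cuda_status() -> str:
    """Return a one-line summary of torch/CUDA visibility."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # imported lazily so the check works without torch
    if not torch.cuda.is_available():
        return "torch installed, but no CUDA device visible"
    return f"CUDA OK: {torch.cuda.get_device_name(0)} (torch {torch.__version__})"


print(cuda_status())
```

If this already reports "no CUDA device visible", the problem is upstream of bitsandbytes (driver, container runtime, or the wrong torch build), not bitsandbytes itself.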
System Info
I am fine-tuning the Llama3-8b-Instruct model. Here is the Jupyter notebook with the steps I followed to perform the fine-tuning:
https://gitlab.com/keerti4p/llama3-8b-instruct-finetune/-/blob/main/llama3_finetune_8b_instruct.ipynb
I am using an 'ml.g5.24xlarge' AWS SageMaker instance to fine-tune the Llama3 model. However, when I execute the code snippet below:
```python
# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,      # use our LoRA PEFT config
    dataset_text_field="text",
    max_seq_length=None,          # no max sequence length
    tokenizer=tokenizer,          # use the Llama tokenizer
    args=training_arguments,      # use the training arguments
    packing=False,                # don't need packing
)
```
I receive the following error:
RuntimeError: CUDA Setup failed despite GPU being available. Please run the following command to get more information:
Please suggest how to resolve this error.
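Since the maintainer's suggested fix is upgrading bitsandbytes, one quick check is whether the installed version actually meets the >=0.43.2 floor. A hedged sketch (the helper name `version_tuple` is my own, not from the thread):

```python
# Hedged sketch: compare the installed bitsandbytes version against the
# >=0.43.2 floor suggested above, without crashing if it isn't installed.
from importlib import metadata


def version_tuple(pkg: str):
    """Return pkg's installed version as a tuple of ints, or None if absent."""
    try:
        raw = metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None
    # Keep only purely numeric components, e.g. "0.43.2" -> (0, 43, 2)
    return tuple(int(p) for p in raw.split(".")[:3] if p.isdigit())


v = version_tuple("bitsandbytes")
if v is None:
    print("bitsandbytes is not installed")
elif v < (0, 43, 2):
    print(f"bitsandbytes {v} is older than 0.43.2 -- upgrade it")
else:
    print(f"bitsandbytes {v} meets the version floor")
```

Older bitsandbytes releases used a CUDA setup path that raised exactly this "CUDA Setup failed despite GPU being available" error, which is why the version check is worth running before anything else.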
Reproduction
Instantiate an 'ml.g5.24xlarge' instance through an AWS SageMaker JupyterLab space and run the code in the Jupyter notebook below:
https://gitlab.com/keerti4p/llama3-8b-instruct-finetune/-/blob/main/llama3_finetune_8b_instruct.ipynb
Expected behavior
The 'Llama3-8b-Instruct' model is fine-tuned and pushed to Hugging Face.