Exception: Installed CUDA version 11.0 does not match the version torch was compiled with 11.1 [SOLUTION]

Xirider / finetune-gpt2xl

Guide: Finetune GPT2-XL (1.5 Billion Parameters) and finetune GPT-NEO (2.7 B) on a single GPU with Huggingface Transformers using DeepSpeed

MIT License


Exception: Installed CUDA version 11.0 does not match the version torch was compiled with 11.1 [SOLUTION] #1

CupOfGeo closed this issue 3 years ago

CupOfGeo commented 3 years ago

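Hey, first off, awesome project! I'm getting this error when I try to run the deepspeed command. I found my solution, in case anyone else has this problem: install the CUDA 11.1.1 toolkit so the installed version matches the one torch was compiled with.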

wget https://developer.download.nvidia.com/compute/cuda/11.1.1/local_installers/cuda_11.1.1_455.32.00_linux.run
sudo sh cuda_11.1.1_455.32.00_linux.run

Xirider commented 3 years ago

Hi, thanks for trying it out! Did you use the most recent version of the guide with the --image-family pytorch-1-7-cu110 flag instead of --image-family pytorch-latest-gpu? @CupOfGeo I thought this had solved the issue.

Google upgraded pytorch-latest-gpu today to a container with PyTorch 1.8, and its CUDA version was not compatible with the current release of DeepSpeed.

CupOfGeo commented 3 years ago

No, sorry, I used --image-family pytorch-latest-gpu. I had the page open for a day or so and finally had time to do it today. Guess I didn't refresh. Thanks!
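For anyone who hits the same exception, it may help to confirm the mismatch before reinstalling anything. The sketch below compares the CUDA toolkit reported by nvcc with the version torch was built against. The helper names (parse_nvcc_release, parse_torch_cuda, installed_cuda_version, describe_mismatch) are made up for illustration, and it assumes the nvcc found on PATH is the toolkit DeepSpeed will pick up, which may not hold if CUDA_HOME points elsewhere. DeepSpeed's own check may differ in detail.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

Version = Tuple[int, int]


def parse_nvcc_release(nvcc_output: str) -> Optional[Version]:
    """Extract (major, minor) from `nvcc --version` text, or None if not found."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_torch_cuda(version: Optional[str]) -> Optional[Version]:
    """Convert torch.version.cuda (e.g. "11.1", or None on CPU-only builds)."""
    if not version:
        return None
    parts = version.split(".")
    minor = int(parts[1]) if len(parts) > 1 else 0
    return int(parts[0]), minor


def installed_cuda_version() -> Optional[Version]:
    """Ask the nvcc on PATH for its version (assumption: this is the toolkit in use)."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


def describe_mismatch(installed: Optional[Version], compiled: Optional[Version]) -> str:
    """Summarize whether the installed toolkit matches torch's CUDA build."""
    if installed is None or compiled is None:
        return "Could not determine one of the CUDA versions."
    if installed == compiled:
        return "Installed CUDA %d.%d matches the torch build." % installed
    return "Installed CUDA %d.%d does not match torch's CUDA %d.%d." % (installed + compiled)
```

To run the check on a real machine, import torch and call describe_mismatch(installed_cuda_version(), parse_torch_cuda(torch.version.cuda)). On the setup from this issue it would be expected to report a mismatch between 11.0 and 11.1.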