Open yyq opened 1 year ago
To ensure that the CUDA version used to compile your Torch C++ plugin matches the runtime version of your current CUDA Toolkit, you can use the following Python command:
import torch
print(torch.version.cuda)
This command will print the CUDA version that was used to compile PyTorch. Please ensure that this version matches the version of your installed CUDA Toolkit.
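This comparison can be scripted. Below is a minimal sketch (the helper names `parse_nvcc_release` and `check_cuda_match` are mine, not part of torch) that extracts the toolkit release from `nvcc --version` and compares it with torch's compile-time CUDA version:

```python
import re
import subprocess

def parse_nvcc_release(nvcc_output: str) -> str:
    """Extract the toolkit release (e.g. '11.7') from `nvcc --version` output."""
    # nvcc prints a line like: "Cuda compilation tools, release 11.7, V11.7.64"
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    if match is None:
        raise ValueError("no CUDA release found in nvcc output")
    return match.group(1)

def check_cuda_match() -> bool:
    """Compare torch's compile-time CUDA version with the local CUDA Toolkit."""
    import torch  # imported here so the parser above is usable without torch
    nvcc_out = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=True
    ).stdout
    return parse_nvcc_release(nvcc_out) == torch.version.cuda
```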
In addition, please note that PyTorch versions 2.0.0 and above are not yet supported; make sure your installed PyTorch version is below 2.0.0. You can check the PyTorch version with the following Python command:
import torch
print(torch.__version__)
If your PyTorch version is not compatible, please downgrade PyTorch to a compatible version using pip or conda, depending on how you initially installed PyTorch.
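The version gate can also be checked programmatically. A sketch, assuming the "below 2.0.0" constraint stated above (the helper names are illustrative):

```python
def torch_major_version(version_string: str) -> int:
    """Parse the major version from a torch version string like '1.13.1+cu117'."""
    # strip any local build suffix such as '+cu117' before splitting
    return int(version_string.split("+")[0].split(".")[0])

def torch_is_supported() -> bool:
    """True if the installed torch is below the 2.0.0 cutoff."""
    import torch
    return torch_major_version(torch.__version__) < 2
```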
I tried downgrading to torch.version.cuda = 11.7 and torch version 1.13.1+cu117, but I still get the same error.
torch.version.cuda = 11.7 and torch version 1.13.1+cu117 only mean that the CUDA version used to compile torch is 11.7. You also need to make sure that your installed CUDA Toolkit matches the version used to compile torch.
You can use nvcc --version to check the Toolkit version (note that nvidia-smi reports the highest CUDA version the driver supports, not the Toolkit you compile with).
CUDA version: 11.3
torch version: 1.12.1
print(torch.version.cuda): 11.3
print(torch.cuda.is_available()): True
python -c "import torch;print(torch.cuda.nccl.version())" returns (2, 10, 3)
Still the same error.
Please ensure that you have tried pip install bmtrain --no-cache-dir.
@MayDomine Hi, my server environment also hits the same error.
torch == 1.13.1+cu117
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
I have tested this environment and it does not produce the error. Please check the CUDA runtime path, whether the pip install used a cached build, whether your local NCCL version conflicts, and so on.
python -c "import torch;print(torch.cuda.nccl.version())"
returns: (2, 14, 3)
locate nccl | grep "libnccl.so" | tail -n1 | sed -r 's/^.*\.so\.//'
returns: 2
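The sed pipeline above only recovers the major version of the system libnccl. Comparing the full soname against torch.cuda.nccl.version() is more informative; a sketch (the helper name `parse_nccl_soname` is mine):

```python
def parse_nccl_soname(soname: str) -> tuple:
    """Turn a shared-library name like 'libnccl.so.2.14.3' into (2, 14, 3).

    The result can be compared directly with torch.cuda.nccl.version(),
    which also returns a tuple of ints.
    """
    suffix = soname.split(".so.", 1)[1]
    return tuple(int(part) for part in suffix.split("."))
```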
When I train with transformers, I see:
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.7/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/.conda/envs/3.9/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
@Fword4u Hi, this is what my environment check shows; I really can't see where the configuration conflict is.
I ran pip install bmtrain --no-cache-dir and now the error is gone. I'd like to know why.
I'm trying the demo code with CUDA 12.1; here is the information:
The command python -c "import torch;print(torch.cuda.nccl.version())" returns (2, 14, 3).
below is the original import error stack: