Open meltzerpete opened 4 years ago
Please make sure your nvcc version is also CUDA 10.1. You can check with nvcc --version.
Thanks for the reply. I have
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
I guess this is the problem?
Yeah, things should work if you install a 10.0 version of pytorch or can get the 10.1 compilation tools.
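The version check suggested above can be scripted. The following is a minimal sketch, not part of pointnet2_ops: the helper names `parse_nvcc_release` and `same_cuda_release` are hypothetical, and it assumes that comparing only the major.minor release is what matters for building the extension.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return the 'major.minor' release from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def same_cuda_release(nvcc_release, torch_cuda):
    """Compare only major.minor (hypothetical helper, illustration only)."""
    if nvcc_release is None or torch_cuda is None:
        return False
    return nvcc_release.split(".")[:2] == torch_cuda.split(".")[:2]


if __name__ == "__main__":
    if shutil.which("nvcc") is None:
        print("nvcc not found on PATH")
    else:
        out = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True
        ).stdout
        nvcc_release = parse_nvcc_release(out)
        try:
            import torch

            # torch.version.cuda is None for CPU-only builds
            torch_cuda = torch.version.cuda
        except ImportError:
            torch_cuda = None
        print("nvcc release:", nvcc_release)
        print("torch built with CUDA:", torch_cuda)
        print("match:", same_cuda_release(nvcc_release, torch_cuda))
```

If the two releases differ, either reinstalling PyTorch for the nvcc release or installing matching compilation tools should bring them in line, as suggested above.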
I installed the 10.1 compilation tools with $ conda install cudatoolkit-dev -c conda-forge, so I now have
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
I also reinstalled the pointnet2-ops package with
$ pip install --user --force-reinstall --ignore-installed --no-binary :all: pointnet2_ops_lib
but I am still getting the same error.
I have also tried installing the pytorch version with cu100 using $ conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
but am getting the same error.
I ran into the same problem mentioned here. My solution was to reinstall PyTorch, downgrading to version 1.4 built against CUDA 10.0 (the same version reported by nvcc -V), and then reinstall the pointnet2-ops package. After that, the error went away.
Python version: 3.6.12 Pytorch version: 1.4.0 Cuda version: 10.0
conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.0 -c pytorch
Hope my solution can help someone encountering this same issue.
I tried this method and it works correctly, thanks! I think this bug may result from the cudatoolkit version. It seems that cudatoolkit=10.0 works but cudatoolkit=10.1 doesn't.
I have the same problem. Which version of cuda, nvcc, or torch should I use?
nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:01:00.0  On |                  N/A |
| 32%   29C    P8              27W / 350W |   1199MiB / 24576MiB |      5%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
* please refer to https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html for Minimum Required Driver Version for CUDA Minor Version Compatibility
driver version is 535.161.07
sys.version : 3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0]
torch version : 2.3.0.dev20231227
installed cuda version : 12.1
CUDA Compute Capability: 8.6
Microarchitecture Name: Ampere (3090, cuda >= 11.1, driver >=455.32)
pytorch compiled for : ['sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'sm_90']
torch.cuda.is_available : True
torch.backends.cudnn.enabled : True
torch.cuda.get_device_properties(device) : _CudaDeviceProperties(name='NVIDIA GeForce RTX 3090', major=8, minor=6, total_memory=24237MB, multi_processor_count=82)
SYSTEM CUDA_PATH: None
LD_LIBRARY_PATH: /root/Workspace/hdl_loc/devel/lib:/root/Workspace/ws_livox/devel/lib:/opt/ros/noetic/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
torch.tensor([1.0, 2.0]).cuda() : tensor([1., 2.], device='cuda:0')
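The report above lists a compute capability of 8.6 and the architectures PyTorch was compiled for. A rough way to check that the two agree is sketched below. The function name `capability_covered` is hypothetical, and the check only considers `sm_XY` binary entries (a GPU can run binaries built for the same major version with an equal or lower minor version); it ignores PTX `compute_XY` entries and says nothing about whether a separately compiled extension such as pointnet2_ops targets the same architectures.

```python
def capability_covered(capability, arch_list):
    """Return True if some sm_XY entry in arch_list can run on the device.

    capability: (major, minor) tuple, e.g. (8, 6) for an RTX 3090.
    arch_list:  strings such as 'sm_80', as returned by torch.cuda.get_arch_list().
    """
    major, minor = capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue  # skip PTX entries such as 'compute_90'
        digits = arch[3:]
        if not digits.isdigit():
            continue  # skip suffixed entries such as 'sm_90a'
        arch_major, arch_minor = int(digits[:-1]), int(digits[-1])
        # Binaries run on the same major version with an equal or higher minor.
        if arch_major == major and arch_minor <= minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        print("device capability:", cap)
        print("covered:", capability_covered(cap, torch.cuda.get_arch_list()))
    else:
        print("torch with a CUDA device is not available")
```

With the values from the report, `(8, 6)` is covered by both the `sm_80` and `sm_86` entries.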
Here is the best solution: https://github.com/mkt1412/GraspGPT_public/issues/8
train/test produce the following error:
I have run the training and test before successfully on this machine and cannot work out why now it fails - I do not think I have changed anything about the environment.
output of nvidia-smi:
versions ($ conda list):
(I have also tried changing to other versions of pytorch-lightning - 0.7.1/0.84).
Any help greatly appreciated.