Open hadaev8 opened 4 years ago
Did you by any chance use e.g. NVIDIA APEX for float16 training, and compile it in the past with an older CUDA lib installed? I had the same import error with PyTorch 1.5 and CUDA 10.2 installed. Recompiling APEX with CUDA 10.2 resolved the issue.
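A rough way to check for a stale compiled extension like this is to scan a directory for `.so` files and look for the `libcudart` name they mention. The sketch below is only a heuristic (it searches raw bytes instead of parsing the ELF dynamic section), and `find_cudart_references` is a hypothetical helper name, not part of PyTorch, APEX, or SRU.

```python
import os
import re

# Matches sonames such as "libcudart.so.10.0" inside a binary file.
CUDART_PATTERN = re.compile(rb"libcudart\.so\.(\d+\.\d+)")


def find_cudart_references(root):
    """Map each .so file under `root` to the libcudart versions named in its bytes."""
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".so"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                data = fh.read()
            versions = {v.decode() for v in CUDART_PATTERN.findall(data)}
            if versions:
                hits[path] = versions
    return hits
```

Pointing it at the site-packages directory, or at the directory where JIT-compiled extensions are cached, should list any library that still refers to an older CUDA runtime. Those are the candidates for deleting and recompiling.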
Now I am having the same problem on a new machine. I have CUDA 10.2 and PyTorch 1.6, which also uses CUDA 10.2. I also wonder why it wants libcudart.so.10.0 when it should be 10.2, and why it works on Colab.
Whole traceback
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/sru/sru_functional.py", line 613, in forward
h, c = rnn(prevx, c0[i], mask_pad=mask_pad)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/sru/sru_functional.py", line 409, in forward
SRU_Compute = _lazy_load_cuda_kernel() if input.is_cuda else SRU_Compute_CPU
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/sru/sru_functional.py", line 35, in _lazy_load_cuda_kernel
from .cuda_functional import SRU_Compute_GPU
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/sru/cuda_functional.py", line 14, in <module>
verbose=False
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 974, in load
keep_intermediates=keep_intermediates)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1190, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1538, in _import_module_from_library
return imp.load_module(module_name, file, path, description)
File "/home/ubuntu/anaconda3/lib/python3.7/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/home/ubuntu/anaconda3/lib/python3.7/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: libcudart.so.10.0: cannot open shared object file: No such file or directory
I ran into the same problem. My steps for solving it are written up here: https://blog.csdn.net/qq_32239767/article/details/109626626
Google says this error is caused by a mismatch between the CUDA version PyTorch was compiled with and the system CUDA version. But they should be the same: https://i.imgur.com/u3GTFgv.png
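To make the comparison concrete, here is a minimal sketch that pulls the requested runtime version out of the ImportError text and compares it with the CUDA version PyTorch reports. The function names are hypothetical; the only real API assumed is `torch.version.cuda`, which returns a string such as `"10.2"`.

```python
import re


def required_cudart_version(error_message):
    """Return the libcudart version named in an ImportError message, or None."""
    match = re.search(r"libcudart\.so\.(\d+(?:\.\d+)*)", error_message)
    return match.group(1) if match else None


def describe_mismatch(error_message, torch_cuda_version):
    """Compare the requested runtime with the version PyTorch was built for."""
    wanted = required_cudart_version(error_message)
    if wanted is None:
        return "no libcudart version found in the message"
    if wanted == torch_cuda_version:
        return "versions match: " + wanted
    return "extension wants CUDA {} but PyTorch reports {}".format(
        wanted, torch_cuda_version
    )
```

In a real session the second argument would be `torch.version.cuda`. If the two versions differ while PyTorch and the system toolkit agree, the library asking for the old runtime was probably built elsewhere or earlier, which is consistent with the stale-build explanation above.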