asappresearch / sru

Training RNNs as Fast as CNNs (https://arxiv.org/abs/1709.02755)
MIT License
2.11k stars, 306 forks

Can't use on PyTorch 1.5 / CUDA 10.2: ImportError: libcudart.so.10.0: cannot open shared object file: No such file or directory #119

Open hadaev8 opened 4 years ago

hadaev8 commented 4 years ago

Google says this error is caused by a mismatch between the CUDA version PyTorch was compiled with and the system CUDA version. But they should be the same: https://i.imgur.com/u3GTFgv.png

visionscaper commented 4 years ago

Did you by any chance use, e.g., NVIDIA APEX for float16 training, and compile it in the past with an older CUDA library installed? I had the same import error with PyTorch 1.5 and CUDA 10.2 installed. Recompiling APEX with CUDA 10.2 resolved the issue.
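For anyone who wants to confirm this kind of mismatch, here is a minimal sketch. It only parses the error text and compares the runtime version the compiled extension asks for with the version you believe is installed. The helper names `required_cudart_version` and `cuda_versions_match` are hypothetical (they are not part of sru, APEX, or PyTorch), and the installed version is passed in by hand.

```python
import re
from typing import Optional


def required_cudart_version(error_message: str) -> Optional[str]:
    """Return the CUDA runtime version named in a libcudart error, or None.

    Hypothetical helper: it only inspects the message text.
    """
    match = re.search(r"libcudart\.so\.(\d+(?:\.\d+)*)", error_message)
    return match.group(1) if match else None


def cuda_versions_match(error_message: str, installed_cuda: str) -> bool:
    """True only if the library the extension wants matches the installed CUDA."""
    required = required_cudart_version(error_message)
    return required is not None and required == installed_cuda


# Error text copied from this issue's title.
message = (
    "ImportError: libcudart.so.10.0: cannot open shared object file: "
    "No such file or directory"
)

print(required_cudart_version(message))      # 10.0
print(cuda_versions_match(message, "10.2"))  # False
```

In a real environment the second argument could come from `torch.version.cuda`. A `False` result suggests that some compiled extension was built against a different CUDA release than the one currently installed, so it probably needs to be rebuilt.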

hadaev8 commented 4 years ago

Now I'm having the same problem on a new machine. I have CUDA 10.2 and PyTorch 1.6, which also uses CUDA 10.2. I also wonder why it wants libcudart.so.10.0 when it should be 10.2, and why it works on Colab.

hadaev8 commented 4 years ago

Whole traceback

  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/sru/sru_functional.py", line 613, in forward
    h, c = rnn(prevx, c0[i], mask_pad=mask_pad)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/sru/sru_functional.py", line 409, in forward
    SRU_Compute = _lazy_load_cuda_kernel() if input.is_cuda else SRU_Compute_CPU
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/sru/sru_functional.py", line 35, in _lazy_load_cuda_kernel
    from .cuda_functional import SRU_Compute_GPU
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/sru/cuda_functional.py", line 14, in <module>
    verbose=False
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 974, in load
    keep_intermediates=keep_intermediates)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1190, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1538, in _import_module_from_library
    return imp.load_module(module_name, file, path, description)
  File "/home/ubuntu/anaconda3/lib/python3.7/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/home/ubuntu/anaconda3/lib/python3.7/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcudart.so.10.0: cannot open shared object file: No such file or directory

zb-tjw commented 4 years ago

I ran into the same problem. I describe how I solved it here: https://blog.csdn.net/qq_32239767/article/details/109626626