SeanNaren / warp-ctc

Pytorch Bindings for warp-ctc
Apache License 2.0

Import error: libcudart.so.9.2 #120

Closed: hukkai closed this issue 5 years ago

hukkai commented 5 years ago

Traceback (most recent call last):
  File "", line 1, in <module>
  File "/home/fyw/anaconda3/lib/python3.7/site-packages/warpctc_pytorch-0.1-py3.7-linux-x86_64.egg/warpctc_pytorch/__init__.py", line 6, in <module>
    from ._warp_ctc import *
ImportError: libcudart.so.9.2: cannot open shared object file: No such file or directory

Hello, I am using PyTorch 1.0.1.post2, CMake 3.10.2, and Ubuntu 18.04.1 LTS.

Do you have any ideas about how to solve this error?
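A quick way to reproduce the failing lookup outside of warp-ctc is a minimal Python sketch using the standard-library ctypes module. It assumes Linux, where ctypes.CDLL raises OSError when the dynamic loader cannot find a library; `can_load` is a hypothetical helper name, not part of warp-ctc or PyTorch.

```python
import ctypes
import ctypes.util


def can_load(libname: str) -> bool:
    """Return True if the dynamic loader can open `libname` (e.g. "libcudart.so.9.2")."""
    try:
        ctypes.CDLL(libname)
    except OSError:
        # Same failure mode as the ImportError above: the loader could not find the file
        return False
    return True


if __name__ == "__main__":
    lib = "libcudart.so.9.2"
    print(lib, "loadable:", can_load(lib))
    # find_library looks the name up via ldconfig and other fallbacks; None means no libcudart was found
    print("any libcudart:", ctypes.util.find_library("cudart"))
```

If `can_load` returns False but `find_library` reports some other libcudart version, a different CUDA release is installed than the one the extension was built against.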

hukkai commented 5 years ago

Solved; see https://github.com/KlausT/ccminer/issues/149

dzubke commented 3 years ago

The ImportError means the loader cannot find libcudart.so.9.2, the CUDA 9.2 runtime library. Given that updating the LD_LIBRARY_PATH environment variable as described in the link you shared fixed your issue, I'm assuming you already had CUDA 9.2 installed but hadn't updated the environment variable with the command below:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
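To confirm that the export above actually exposes the library, one can scan each LD_LIBRARY_PATH directory for the exact file name from the error. This is a minimal sketch; `find_cudart` is a hypothetical helper, and it only checks for the file's presence rather than reproducing every rule of the Linux dynamic loader.

```python
from __future__ import annotations

import os
from pathlib import Path


def find_cudart(search_path: str, soname: str = "libcudart.so.9.2") -> list[str]:
    """Return every directory in a colon-separated search path that contains `soname`."""
    hits = []
    for entry in search_path.split(":"):
        # Skip empty items, e.g. the leading ':' produced when LD_LIBRARY_PATH was unset before the export
        if entry and (Path(entry) / soname).exists():
            hits.append(entry)
    return hits


if __name__ == "__main__":
    dirs = os.environ.get("LD_LIBRARY_PATH", "")
    print("LD_LIBRARY_PATH:", dirs or "(unset)")
    print("directories containing libcudart.so.9.2:", find_cudart(dirs) or "none")
```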

For anyone else who encounters this issue and finds that updating the environment variable doesn't help: remove all of your existing CUDA installs and do a fresh install of CUDA 9.2. There are probably other ways to run this warp-ctc implementation with a newer version of CUDA (such as 10 or 11), but I haven't needed a newer version yet, so I have just been using CUDA 9.2.

You can check your CUDA version with the command: /usr/local/cuda/bin/nvcc --version

Or by looking at the version numbers at the end of the directory names returned by: ls /usr/local/ | grep cuda

Or, in PyTorch, by checking torch.version.cuda (this reports the CUDA version PyTorch was built with, which can differ from the toolkit installed on the system).
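The three checks above can be collected into one Python sketch. It assumes nvcc prints a line of the form "Cuda compilation tools, release 9.2, V9.2.148" and that toolkits live under /usr/local; `nvcc_release` and `installed_cuda_dirs` are hypothetical helper names.

```python
from __future__ import annotations

import re
import subprocess
from pathlib import Path


def nvcc_release(nvcc_output: str) -> str | None:
    """Extract "X.Y" from nvcc --version text such as 'release 9.2, V9.2.148'."""
    m = re.search(r"release (\d+\.\d+)", nvcc_output)
    return m.group(1) if m else None


def installed_cuda_dirs(root: str = "/usr/local") -> list[str]:
    """Python equivalent of `ls /usr/local/ | grep cuda`: sorted names of entries containing 'cuda'."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(p.name for p in base.iterdir() if "cuda" in p.name)


if __name__ == "__main__":
    try:
        out = subprocess.run(["/usr/local/cuda/bin/nvcc", "--version"],
                             capture_output=True, text=True, check=True).stdout
        print("nvcc release:", nvcc_release(out))
    except (OSError, subprocess.CalledProcessError):
        print("nvcc not found at /usr/local/cuda/bin/nvcc")
    print("cuda entries in /usr/local:", installed_cuda_dirs())
    try:
        import torch
        # CUDA version PyTorch was built against, not necessarily the system toolkit
        print("torch.version.cuda:", torch.version.cuda)
    except ImportError:
        print("PyTorch is not installed")
```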

To uninstall all of your previous CUDA installations and prevent conflicts, see the different uninstallation methods in NVIDIA's guide: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#handle-uninstallation

In that link, you'll see some combination of these commands for removing old installs:
sudo apt-get --purge remove cuda
sudo /usr/local/cuda-X.Y/bin/uninstall_cuda_X.Y.pl
sudo /usr/bin/nvidia-uninstall

If none of those options work, you can try removing the CUDA directories from /usr/local/ directly with: sudo rm -rf /usr/local/cuda-X.Y
and then removing the symbolic link: sudo rm /usr/local/cuda
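Before deleting anything by hand, it can help to check which toolkit the /usr/local/cuda symlink currently selects. This read-only sketch uses only os.path.islink and os.readlink; `symlink_target` is a hypothetical helper name.

```python
from __future__ import annotations

import os


def symlink_target(link: str = "/usr/local/cuda") -> str | None:
    """Return where `link` points (e.g. /usr/local/cuda-9.2), or None if it is not a symlink."""
    return os.readlink(link) if os.path.islink(link) else None


if __name__ == "__main__":
    # Read-only: prints the current target so you know which cuda-X.Y directory is in use
    print("/usr/local/cuda ->", symlink_target())
```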

Then you can install CUDA 9.2. This is a good explanation of that installation: here.

For me, the commands below work:
wget https://developer.nvidia.com/compute/cuda/9.2/Prod2/local_installers/cuda_9.2.148_396.37_linux
sudo sh cuda_9.2.148_396.37_linux

And these commands install a patch for CUDA 9.2:
wget https://developer.nvidia.com/compute/cuda/9.2/Prod2/patches/1/cuda_9.2.148.1_linux
sudo sh cuda_9.2.148.1_linux

You should also update environment variables such as PATH and LD_LIBRARY_PATH, as described in the installation explanation linked above.
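As a sketch of that environment update, the function below computes new PATH and LD_LIBRARY_PATH values for a toolkit at an assumed prefix of /usr/local/cuda-9.2; `prepend` and `cuda_env` are hypothetical helpers. Unlike the append-style export earlier in this thread, it prepends, so the chosen toolkit's directories are searched before any other CUDA install. The dynamic loader reads LD_LIBRARY_PATH when a process starts, so the new value must be set before launching Python, not from inside it.

```python
from __future__ import annotations


def prepend(entry: str, value: str | None) -> str:
    """Prepend `entry` to a colon-separated path list, dropping empty items and duplicates of `entry`."""
    parts = [p for p in (value or "").split(":") if p and p != entry]
    return ":".join([entry] + parts)


def cuda_env(prefix: str, path: str | None, ld_path: str | None) -> dict[str, str]:
    """New PATH / LD_LIBRARY_PATH values for a CUDA install at `prefix`."""
    return {
        "PATH": prepend(f"{prefix}/bin", path),
        "LD_LIBRARY_PATH": prepend(f"{prefix}/lib64", ld_path),
    }


if __name__ == "__main__":
    import os
    env = cuda_env("/usr/local/cuda-9.2", os.environ.get("PATH"), os.environ.get("LD_LIBRARY_PATH"))
    for name, value in env.items():
        print(f"{name} would become: {value}")
```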

Occasionally, existing NVIDIA drivers got in the way and the CUDA 9.2 installation failed. I got the following error during installation:

"It appears that an X server is running. Please exit X before installation. If you're sure that X is not running, but are getting this error, please delete any X lock files in /tmp."

To correct this, I ran the command below to stop the X server:
sudo service lightdm stop
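The installer message points at X lock files in /tmp. This sketch lists them (X uses names like /tmp/.X0-lock); after stopping lightdm, any lock file that remains may be stale and, per the installer's own advice, can be deleted. `x_lock_files` is a hypothetical helper name.

```python
from __future__ import annotations

from pathlib import Path


def x_lock_files(tmp_dir: str = "/tmp") -> list[str]:
    """List X server lock files such as /tmp/.X0-lock; one usually means a display is or was running."""
    return sorted(str(p) for p in Path(tmp_dir).glob(".X*-lock"))


if __name__ == "__main__":
    locks = x_lock_files()
    print("X lock files:", locks or "none")
```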

Then I had to remove the existing drivers by running the installer again and consulting the installer log:
less /var/log/nvidia-installer.log
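Rather than paging through the whole log, one can pull out just the error lines. This is a minimal sketch that assumes failures in /var/log/nvidia-installer.log contain the marker "ERROR"; `installer_errors` is a hypothetical helper, and reading the log may require sudo.

```python
from __future__ import annotations

from pathlib import Path


def installer_errors(log_text: str) -> list[str]:
    """Return the stripped lines of an installer log that contain 'ERROR' (assumed marker)."""
    return [line.strip() for line in log_text.splitlines() if "ERROR" in line]


if __name__ == "__main__":
    log = Path("/var/log/nvidia-installer.log")
    try:
        text = log.read_text(errors="replace")
    except FileNotFoundError:
        print(f"{log} not found")
    except PermissionError:
        print(f"cannot read {log}; try running with sudo")
    else:
        for line in installer_errors(text):
            print(line)
```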

After doing that a few times and running the commands below, I was finally able to install CUDA 9.2, and the library worked:
sudo apt-get remove --purge nvidia-440 nvidia-modprobe nvidia-settings
sudo apt-get remove --purge nvidia-455 nvidia-modprobe nvidia-settings

This is a bit verbose, but just wanted to document my experience for others.