jimlloyd opened 2 years ago
Within a few minutes of writing this, I did more searching and found this issue:
https://github.com/tensorflow/tensorflow/issues/40202
There is a comment: "A more reliable workaround is to install the cuda toolkit using Nvidia's .run file installer."
I'm going to try that.
Yes, that may be the way, or you could take a look at the source of the official NVIDIA CUDA Docker image, which is what we run the CI on: https://gitlab.com/nvidia/container-images/cuda/blob/master/dist/11.7.0/ubuntu2204/runtime/Dockerfile
I suspect this problem is the fault of the TensorFlow configure scripts rather than anything specific to tensorflow_cc, but I am hoping someone might know how to work around it.
The problem is that after doing
cd tensorflow_cc && mkdir build && cd build && cmake .. && make
the make step fails with an "Inconsistent CUDA toolkit path" error.
I have been trying to install onto a freshly created Ubuntu 20.04 or 22.04 system. I have tried various methods of installing CUDA and cuDNN, and all of them have resulted in this error.
By the way, the first method I tried was Lambda Stack on 22.04. It would be awesome if tensorflow_cc were compatible with Lambda Stack. But when I discovered this "Inconsistent CUDA toolkit path" problem, I concluded that Lambda Stack had probably altered the paths at which CUDA and cuDNN were installed, so I switched to more standard installation methods. I have since learned that I run into the same problem without Lambda Stack, so I am hopeful that once I figure out how to solve it, I will be able to use Lambda Stack.
My most recent attempt was with 20.04. I installed:
1. the CUDA toolkit, via sudo apt install nvidia-cuda-toolkit
2. cudnn-10.1-linux-x64-v8.0.5.39 from NVIDIA's website, following the instructions to untar it and then copy the components into /usr/local/cuda/...
FYI, I have of course spent time searching for information about this exact "Inconsistent CUDA toolkit path:" error. I know it is an exception raised from tensorflow/third_party/gpus/find_cuda_config.py, where the problematic code is commented.
I have tried various hacks with the code, including simply commenting out the code that raises the exception. That lets the build proceed, but it eventually results in a similar exception being raised, presumably when building XLA.
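The kind of check that produces this error can be sketched as follows. This is a simplified illustration, not the actual code in find_cuda_config.py: it assumes the script derives the toolkit root twice, once from the directory holding nvcc (one level up) and once from the CUDA runtime library file (two levels up), and rejects the configuration when the two roots disagree. The names toolkit_roots, check_toolkit_path and ConfigError here are hypothetical, and the apt paths (/usr/bin/nvcc, /usr/lib/x86_64-linux-gnu/libcudart.so) are an assumed Ubuntu layout.

```python
import os


class ConfigError(Exception):
    """Hypothetical error type; the message mirrors the one in this issue."""


def toolkit_roots(nvcc_path, cudart_path):
    # Root implied by the compiler: parent of the directory holding nvcc,
    # e.g. /usr/local/cuda/bin/nvcc -> /usr/local/cuda.
    from_binary = os.path.normpath(os.path.join(os.path.dirname(nvcc_path), ".."))
    # Root implied by the runtime library: two levels above the library file,
    # e.g. /usr/local/cuda/lib64/libcudart.so -> /usr/local/cuda.
    from_library = os.path.normpath(os.path.join(cudart_path, "..", ".."))
    return from_binary, from_library


def check_toolkit_path(nvcc_path, cudart_path):
    # Raise if the two derived roots differ; otherwise return the shared root.
    from_binary, from_library = toolkit_roots(nvcc_path, cudart_path)
    if from_binary != from_library:
        raise ConfigError(
            "Inconsistent CUDA toolkit path: %s vs %s" % (from_binary, from_library))
    return from_binary


if __name__ == "__main__":
    # Everything under one prefix (.run-installer style): the roots agree.
    print(check_toolkit_path("/usr/local/cuda/bin/nvcc",
                             "/usr/local/cuda/lib64/libcudart.so"))
    # Assumed apt layout: nvcc in /usr/bin, libraries in a multiarch directory.
    try:
        check_toolkit_path("/usr/bin/nvcc",
                           "/usr/lib/x86_64-linux-gnu/libcudart.so")
    except ConfigError as err:
        print(err)  # Inconsistent CUDA toolkit path: /usr vs /usr/lib
```

Under these assumptions, a layout with nvcc and the libraries under a single prefix passes, while a package layout that splits them across /usr/bin and a multiarch library directory implies two different roots. That would be consistent with the .run installer workaround from issue 40202 mentioned above, and with why simply removing the check only moves the failure to a later step that also needs a single toolkit root.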
Does anyone know how to work around this problem?