Closed brian-dellabetta closed 2 years ago
Hi @brian-dellabetta. Thanks for your interest in cuQuantum!
The image has cupy-cuda115, the conda install of cuquantum-python installs another version of cupy as a dependency so I uninstall the old one (it will complain during import if both are available). make all builds successfully (though the lib64->lib symlink is needed for it to work), but I am unable to run the python samples without hitting import errors.
All samples require an Nvidia GPU to run. Specifically, a GPU with compute capability 7.0+. Here's a useful table.
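Once a GPU is attached, the 7.0+ requirement could be checked from Python along these lines. This is only a sketch: it assumes CuPy is importable and that cupy.cuda.Device(...).compute_capability returns a string such as "70" (major digits followed by a single minor digit). The helper names are hypothetical and not part of cuQuantum.

```python
def parse_compute_capability(cc: str) -> tuple:
    """Split a capability string such as "70" or "86" into (major, minor)."""
    cc = cc.strip()
    if len(cc) < 2 or not cc.isdigit():
        raise ValueError(f"unexpected compute capability string: {cc!r}")
    # The last digit is the minor version; everything before it is the major.
    return int(cc[:-1]), int(cc[-1])


def is_supported(cc: str, minimum: tuple = (7, 0)) -> bool:
    """True if the capability meets the 7.0+ requirement quoted in this thread."""
    return parse_compute_capability(cc) >= minimum


def current_device_supported(device_id: int = 0) -> bool:
    """Query the attached GPU; only works where the CUDA driver is present."""
    import cupy  # imported lazily so the helpers above work without a GPU

    return is_supported(cupy.cuda.Device(device_id).compute_capability)
```

The first two helpers are pure string handling, so they can be exercised on a machine without a GPU; only current_device_supported needs the driver.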
I am running on an intel-chip mac, just trying to clear up the import errors before we run this on a cloud instance with an nvidia GPU mounted in.
I'm guessing this is the issue. The import statements will fail without a valid driver installation. Without seeing the full error output, I cannot confirm.
Before posting any stacktraces, am I on the right track here? Maybe I should use a different base image that has an equivalent version of cupy. I'm also not sure if the cuda version is incompatible.
For cuQuantum, as long as your CUDA toolkit version is 11.2+, and CuPy's version is 9.5+, you should be fine. If you have a more specific concern, please include it in your response.
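As a rough illustration, the two minimums above (CUDA toolkit 11.2+, CuPy 9.5+) can be compared against version strings with a small helper. The function names are made up for this sketch, and the version strings are passed in by hand rather than queried from the system.

```python
import re

# Minimums quoted in this thread.
MIN_CUDA = (11, 2)
MIN_CUPY = (9, 5)


def version_tuple(version: str) -> tuple:
    """Turn the leading numeric part of a version into a tuple: '10.1.0' -> (10, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: str, minimum: tuple) -> bool:
    """Compare a version string against a minimum, padding short versions with zeros."""
    parts = version_tuple(version)
    parts = parts + (0,) * (len(minimum) - len(parts))
    return parts >= minimum
```

For example, meets_minimum("10.1.0", MIN_CUPY) holds for the cupy-10.1.0 build shown in the conda output further down, and meets_minimum("11.5", MIN_CUDA) holds if cupy-cuda115 is taken to imply CUDA 11.5.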
I am happy to submit a PR with the working Dockerfile once we figure this all out :)
Unfortunately, we aren't accepting code contributions at this time.
I'm wondering why you're using wget to acquire the binaries when they are automatically installed by conda in this line:
conda install -c conda-forge cuquantum-python
(e.g.)
conda install -c conda-forge cuquantum-python
...
The following NEW packages will be INSTALLED:
...
cupy conda-forge/linux-64::cupy-10.1.0-py310h64c8dd9_1
cuquantum conda-forge/linux-64::cuquantum-0.1.0.30-h5c60f85_2
cuquantum-python conda-forge/linux-64::cuquantum-python-0.1.0.0-py310h013f86e_3
cutensor conda-forge/linux-64::cutensor-1.4.0.6-h7537e88_2
...
It is also true that all of the samples are hosted in this repository.
Let us know if you're still having trouble or if you have other questions!
One more thing:
though the lib64->lib symlink is needed for it to work
Yes, we have become aware of this issue for building cuQuantum Python from source. We'll push a fix shortly. Thanks for bringing it up, Brian.
@mtjrider I'm just trying to make sure the image is valid and has all dependencies before attempting to run on an nvidia GPU. This requires an nvidia V100 or higher for compute capability 7.0+, corresponding to a p3.2xlarge or higher on AWS, and these get pricey, so I'm trying to tackle as much beforehand as possible.
Here's the error I'm seeing:
>>> import cuquantum
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/cupy/__init__.py", line 18, in <module>
from cupy import _core # NOQA
File "/opt/conda/lib/python3.8/site-packages/cupy/_core/__init__.py", line 1, in <module>
from cupy._core import core # NOQA
File "cupy/_core/core.pyx", line 1, in init cupy._core.core
File "/opt/conda/lib/python3.8/site-packages/cupy/cuda/__init__.py", line 8, in <module>
from cupy.cuda import compiler # NOQA
File "/opt/conda/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 14, in <module>
from cupy.cuda import function
File "cupy/cuda/function.pyx", line 1, in init cupy.cuda.function
File "cupy/_core/_carray.pyx", line 1, in init cupy._core._carray
File "cupy/_core/internal.pyx", line 1, in init cupy._core.internal
File "cupy/cuda/memory.pyx", line 1, in init cupy.cuda.memory
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
This seems to me more related to the versions of cupy and libcuda than an actual runtime error from the lack of a GPU. I might be mistaken, though: perhaps the driver won't live in the docker image and will need to be installed on the host and mounted into the container? I hope to try on a VM with a GPU later this week and will post updates here.
If not a Dockerfile, will an image be made available at some point on the NGC catalog or elsewhere? I'm sure it would be useful to others.
Also @mtjrider the wget on the repo is just to pull in the code samples. I didn't see them in the installed directories:
/opt/conda/lib/python3.8/site-packages/cuquantum_python-0.1.0.0.dist-info
/opt/conda/lib/python3.8/site-packages/cuquantum
Also, thanks for all the help!
@mtjrider I'm just trying to make sure the image is valid and has all dependencies before attempting to run on an nvidia GPU. This requires an nvidia V100 or higher for compute capability 7.0+, corresponding to a p3.2xlarge or higher on AWS, and these get pricey, so I'm trying to tackle as much beforehand as possible.
Makes perfect sense. Thanks for this clarification. To be clear, I've tested your Dockerfile on a system with GPUs to compile and run the tests, and it works without issue. When you deploy, please take care to confirm that the driver and compilation toolchain are compatible. The CUDA driver and kernel mode driver compatibility is documented here.
The following error indicates that the CUDA driver is missing. This is not installed in the container. Here is an architecture overview.
>>> import cuquantum
...
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
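A quick way to tell this situation apart from a package version mismatch is to ask the dynamic loader for libcuda.so.1 directly, before importing cupy or cuquantum. The sketch below uses only the standard library; the function name is hypothetical. A successful load only shows that the loader can find the library, not that a working GPU is attached.

```python
import ctypes


def driver_library_available(name: str = "libcuda.so.1") -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Same condition that surfaces as the ImportError above.
        return False
    return True


def describe_driver_state() -> str:
    """Summarize whether importing cupy/cuquantum is likely to get past libcuda."""
    if driver_library_available():
        return "libcuda.so.1 found: the import should get past the driver check"
    return "libcuda.so.1 not found: expect the ImportError shown above"
```

On a machine with no CUDA driver, such as a container run without a GPU mounted in, describe_driver_state() reports the missing library instead of raising.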
Also @mtjrider the wget on the repo is just to pull in the code samples. I didn't see them in the installed directories: /opt/conda/lib/python3.8/site-packages/cuquantum_python-0.1.0.0.dist-info /opt/conda/lib/python3.8/site-packages/cuquantum
I meant that you may also clone the samples because they are hosted in this repository:
git clone https://github.com/NVIDIA/cuQuantum.git cuquantum && \
ls -la cuquantum/samples
## custatevec
## cutensornet
Note: per this comment, I had to modify the Makefile to rename lib64 to lib (this line). Separately, I had to set LD_LIBRARY_PATH=/opt/conda/lib:$LD_LIBRARY_PATH. The command I used to compile the custatevec samples is:
CUSTATEVEC_ROOT=/opt/conda make
Here, I should note that I removed any wget commands because they are redundant with the conda install command.
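For anyone who would rather not edit the Makefile, the lib64->lib symlink mentioned in the original post and the LD_LIBRARY_PATH setting can be scripted. The following is a sketch, not the fix the maintainers plan to ship. It assumes the conda prefix is /opt/conda as in this thread, and the helper names are hypothetical.

```python
import os
from pathlib import Path


def ensure_lib64_symlink(root) -> Path:
    """Create <root>/lib64 as a relative symlink to <root>/lib if it is missing."""
    root = Path(root)
    lib, lib64 = root / "lib", root / "lib64"
    if not lib.is_dir():
        raise FileNotFoundError(f"{lib} does not exist")
    # Leave an existing directory or symlink untouched.
    if not lib64.exists() and not lib64.is_symlink():
        lib64.symlink_to("lib", target_is_directory=True)
    return lib64


def build_env(root, base_env=None) -> dict:
    """Copy the environment, prepending <root>/lib to LD_LIBRARY_PATH and setting CUSTATEVEC_ROOT."""
    env = dict(os.environ if base_env is None else base_env)
    lib = str(Path(root) / "lib")
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = lib + (":" + existing if existing else "")
    env["CUSTATEVEC_ROOT"] = str(root)
    return env
```

The resulting environment could then be passed to the build, for example subprocess.run(["make"], env=build_env("/opt/conda"), check=True) from the custatevec samples directory.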
@mtjrider thank you! The architecture diagram is what I was missing, this is super helpful. I appreciate your help in sanity checking the image in a working environment, we'll try to reproduce on our end.
I will close and re-open the issue if we have further questions. Thanks again for the help
Hi,
I am trying to build an image with cuquantum and the code samples installed. Here is what I have so far, compiled from the README here and in the documentation:
The image has cupy-cuda115, the conda install of cuquantum-python installs another version of cupy as a dependency so I uninstall the old one (it will complain during import if both are available). make all builds successfully (though the lib64->lib symlink is needed for it to work), but I am unable to run the python samples without hitting import errors.
I am running on an intel-chip mac, just trying to clear up the import errors before we run this on a cloud instance with an nvidia GPU mounted in.
Before posting any stacktraces, am I on the right track here? Maybe I should use a different base image that has an equivalent version of cupy. I'm also not sure if the cuda version is incompatible.
I am happy to submit a PR with the working Dockerfile once we figure this all out :)