Open ashleytsmith opened 7 months ago
Can you clarify your target usage of these tools together in the container?
Sure. I was using cuquantum-benchmarks to compare different simulator backends. Ideally, I want to be able to run commands like the ones below with these backends, which are listed as supported in https://github.com/NVIDIA/cuQuantum/blob/main/benchmarks/cuquantum_benchmarks/run.py:
cuquantum-benchmarks circuit --frontend pennylane --backend pennylane --benchmark qaoa --nqubits 16
cuquantum-benchmarks circuit --frontend pennylane --backend pennylane-lightning-qubit --benchmark qaoa --nqubits 16
cuquantum-benchmarks circuit --frontend pennylane --backend pennylane-lightning-gpu --benchmark qaoa --nqubits 16
cuquantum-benchmarks circuit --frontend pennylane --backend pennylane-lightning-kokkos --benchmark qaoa --nqubits 16
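Since the four commands differ only in the backend name, the comparison could be scripted. The following is a minimal sketch that only builds the command lines; the helper name `build_command` is hypothetical, and the flags and backend names are copied from the commands above.

```python
import shlex

# Backend names copied from the commands above.
BACKENDS = [
    "pennylane",
    "pennylane-lightning-qubit",
    "pennylane-lightning-gpu",
    "pennylane-lightning-kokkos",
]


def build_command(backend, benchmark="qaoa", nqubits=16):
    """Return the argv list for one cuquantum-benchmarks run (hypothetical helper)."""
    return [
        "cuquantum-benchmarks", "circuit",
        "--frontend", "pennylane",
        "--backend", backend,
        "--benchmark", benchmark,
        "--nqubits", str(nqubits),
    ]


if __name__ == "__main__":
    # Print the commands; nothing is executed here.
    for backend in BACKENDS:
        print(shlex.join(build_command(backend)))
```

Each list can be passed to `subprocess.run` to actually launch the benchmark, provided the corresponding backend is installed.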
I see. Thanks. The root cause seems to be mismatched CUDA versions.
The container is built with CUDA 11 on x86-64 architectures.
The aarch64 (arm64) container is built with CUDA 12. Is it possible for you to confirm this issue doesn't occur on platforms with that architecture?
Adding @tlubowe for his awareness.
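To check for a CUDA version mismatch from inside the container, one option is to ask CuPy which CUDA runtime and driver versions it sees. This is a minimal sketch, assuming CuPy imports successfully; the helper names are hypothetical. CUDA reports versions as a single integer (1000 × major + 10 × minor), which the first function decodes.

```python
def decode_cuda_version(version):
    """Split CUDA's integer version (1000*major + 10*minor) into (major, minor)."""
    return version // 1000, (version % 1000) // 10


def report_cupy_cuda_versions():
    """Print the CUDA runtime and driver versions that CuPy sees.

    Needs a working CuPy install and an NVIDIA driver, so call it manually.
    """
    import cupy  # imported lazily so decode_cuda_version works without CuPy

    runtime = decode_cuda_version(cupy.cuda.runtime.runtimeGetVersion())
    driver = decode_cuda_version(cupy.cuda.runtime.driverGetVersion())
    print(f"CUDA runtime seen by CuPy: {runtime[0]}.{runtime[1]}")
    print(f"Highest CUDA version supported by the driver: {driver[0]}.{driver[1]}")
```

If the runtime's major version differs from the one another package in the environment was built for (for example CUDA 11 versus CUDA 12), that would be consistent with the mismatch suspected above.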
Unfortunately not. If I try to run any cuquantum-benchmarks commands on a Mac or a Linux laptop, I get this error:
Traceback (most recent call last):
File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cupy/__init__.py", line 17, in <module>
from cupy import _core # NOQA
File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cupy/_core/__init__.py", line 3, in <module>
from cupy._core import core # NOQA
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/cuquantum/conda/envs/cuquantum-23.10/bin/cuquantum-benchmarks", line 5, in <module>
from cuquantum_benchmarks.run import run
File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/run.py", line 10, in <module>
from .backends import backends
File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/backends/__init__.py", line 6, in <module>
from .backend_cutn import cuTensorNet
File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/backends/backend_cutn.py", line 11, in <module>
import cupy as cp
File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cupy/__init__.py", line 19, in <module>
raise ImportError(f'''
ImportError:
================================================================
Failed to import CuPy.
If you installed CuPy via wheels (cupy-cudaXXX or cupy-rocm-X-X), make sure that the package matches with the version of CUDA or ROCm installed.
On Linux, you may need to set LD_LIBRARY_PATH environment variable depending on how you installed CUDA/ROCm.
On Windows, try setting CUDA_PATH environment variable.
Check the Installation Guide for details:
https://docs.cupy.dev/en/latest/install.html
Original error:
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
It's quite strange because when I run:
pip install cupy-cuda11x
I get a message that the requirements are already satisfied.
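A likely explanation is that `libcuda.so.1` is not part of the CuPy wheel at all. It ships with the NVIDIA driver, so pip can report the requirement as satisfied while the import still fails on a machine with no NVIDIA driver, or in a container started without GPU access. The sketch below checks whether the dynamic loader can find the library; the helper name is hypothetical.

```python
import ctypes


def shared_library_loadable(name="libcuda.so.1"):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # The library is missing or not on the loader's search path.
        return False
    return True


if __name__ == "__main__":
    if shared_library_loadable():
        print("libcuda.so.1 found: an NVIDIA driver is visible")
    else:
        print("libcuda.so.1 not found: no NVIDIA driver is visible here")
```

If this reports that the library is missing inside the container, it may be worth checking that the container was started with GPU access (for example with Docker's `--gpus all` option) on a host that has an NVIDIA driver installed.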
I have only been able to run cuquantum-benchmarks commands on an HPC system with x86 architecture (one sentence in my original message was really unclear about this, so I have edited it).
I realised you could have meant to try and run benchmarks inside a different version of the container. I tried to do the same workflow inside this container:
nvcr.io/nvidia/cuquantum-appliance:23.10-arm64
I can’t build the benchmark suite. When I run:
pip install .[all]
I get this error:
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-slb1t_fg/pennylane-lightning-gpu_e30d32e262ec4b4694a7e3d08281e60f/setup.py", line 169, in <module>
with open(os.path.join("pennylane_lightning", "core", "_version.py"), encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'pennylane_lightning/core/_version.py'
[end of output]
Hi @ashleytsmith, why do you need to run the benchmarks for PennyLane in any of the containers? PennyLane is not built in the containers. You should be able to build your own PL environment, separate from the containers, where you can define what you need. Is this not a viable solution?
Hi @tlubowe, I wanted to run the cuquantum-benchmarks suite inside the cuQuantum Appliance container because quite a few of the benchmarks in the suite explicitly rely on things that can be tricky to install or are only available inside the container, e.g. qsim-mgpu (see https://github.com/NVIDIA/cuQuantum/blob/main/benchmarks/cuquantum_benchmarks/run.py).
When one clones the benchmark suite and runs the setup, pennylane is installed into the conda environment within the container and appears in the output of conda list. As I was building Dockerfiles for other software anyway, I also attempted to build my own container and install your benchmark suite. It did not work: I can’t run anything from the benchmark suite there. For example, I can’t run
cuquantum-benchmarks circuit --frontend qiskit --backend aer --benchmark qaoa --nqubits 16
(This is the error when I try to run it on a Linux HPC cluster.)
File "/opt/conda/envs/baseenv/lib/python3.10/site-packages/cuquantum_benchmarks/backends/backend_qiskit.py", line 47, in find_version
if hasattr(qiskit_aer, "__version__"):
NameError: name 'qiskit_aer' is not defined
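A `NameError` like this means the name `qiskit_aer` was never bound in that module, which typically happens when an optional import was skipped or failed earlier. The sketch below shows one way to look up a version without depending on such a name. It is a hypothetical helper for diagnosis, not the code in `backend_qiskit.py`.

```python
import importlib


def find_version(module_name):
    """Return the module's __version__, or None if it is missing or unversioned."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is not installed in this interpreter's environment.
        return None
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    for name in ("qiskit", "qiskit_aer"):
        print(name, find_version(name))
```

If `qiskit_aer` prints `None` here, the package is probably not importable in the environment that runs `cuquantum-benchmarks`, which would be consistent with the error above.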
Here is the Dockerfile:
FROM rockylinux:9.3
# Set environment variables
ENV MINICONDA_VERSION=py310_24.4.0-0
ENV PATH=/opt/conda/bin:$PATH
ENV PYTHON_VERSION=3.10
# Install necessary packages
RUN yum -y install wget bzip2 tar environment-modules git && \
yum clean all
# Download and install Miniconda
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh -O /tmp/miniconda.sh && \
/bin/bash /tmp/miniconda.sh -b -p /opt/conda && \
rm /tmp/miniconda.sh
# Initialize Conda in bash config
RUN echo "source /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
echo "conda activate base" >> ~/.bashrc
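# Create a new conda environment and activate it in interactive shells
# (-y avoids the confirmation prompt, which cannot be answered during a build)
RUN conda create -y --name cuquant python=${PYTHON_VERSION} && \
echo "source /opt/conda/bin/activate cuquant" >> ~/.bashrc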
# Install packages
RUN git clone https://github.com/NVIDIA/cuQuantum.git \
&& cd cuQuantum/benchmarks \
&& /opt/conda/envs/cuquant/bin/pip install .[all]
# Set the default command to run when starting the container
CMD [ "/bin/bash" ]
I really would have liked to be able to run the pennylane benchmarks from your suite (and the qulacs ones), but they require a lot more effort than the qiskit and cirq ones.
The issue was also present in the 23.06 container.
Example to reproduce error:
Source of error: line 46 of https://github.com/NVIDIA/cuQuantum/blob/main/benchmarks/cuquantum_benchmarks/backends/backend_pny.py
Full trace:
There is a similar error when trying to run the kokkos backend.
The libraries show up in conda list but cannot be imported in a script (or in interactive Python).
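One common cause of a package being listed but not importable is that the script runs under a different interpreter than the environment that was listed. This is a minimal sketch for checking that; the helper name is hypothetical and the module names in the loop are assumptions that may need adjusting to the installed packages.

```python
import importlib.util
import sys


def importable(module_name):
    """Return True if the running interpreter can locate the module."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or a module has no spec.
        return False


if __name__ == "__main__":
    # Shows which Python is actually running this script.
    print("interpreter:", sys.executable)
    # Module names below are assumptions; adjust them to the installed packages.
    for name in ("pennylane", "pennylane_lightning", "pennylane_lightning_gpu"):
        print(name, importable(name))
```

If the interpreter path is not inside the conda environment where the packages were installed, that would explain the difference between conda list and the import.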
Here is the Dockerfile I used for building the image:
(I also had the same error using the docker commit route.) I also tried rebuilding cuquantum-benchmarks to fix the problem, but this didn’t help.
Suspected solution:
Maybe the import statements are wrong. From a brief look at the PennyLane documentation, I found a different method for importing.
This at least gives me a CUDA error:
This is being solved here: https://discuss.pennylane.ai/t/pennylane-lightning-gpu-0-35-on-cuquantum-appliance-23-10/4393
Related bug:
I also noticed that the CPU backend does not work in the 23.10 container either (it ran for me in the 23.06 container):