Can't run PennyLane benchmarks in 23.10 cuQuantum Appliance

ashleytsmith commented 7 months ago

The issue was also present in the 23.06 container.

Example to reproduce error:

cuquantum-benchmarks circuit --frontend pennylane --backend pennylane-lightning-gpu --benchmark qaoa --nqubits 16

Source of error: line 46 of https://github.com/NVIDIA/cuQuantum/blob/main/benchmarks/cuquantum_benchmarks/backends/backend_pny.py

Full trace:

Traceback (most recent call last):
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/backends/backend_pny.py", line 46, in find_version
    import pennylane_lightning_gpu
ModuleNotFoundError: No module named 'pennylane_lightning_gpu'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/cuquantum/conda/envs/cuquantum-23.10/bin/cuquantum-benchmarks", line 8, in <module>
    sys.exit(run())
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/run.py", line 335, in run
    runner.run()
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/run_interface.py", line 90, in run
    self._run()
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/run_interface.py", line 267, in _run
    backend = createBackend(
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/backends/__init__.py", line 34, in createBackend
    return backends[backend_name](ngpus, ncpu_threads, precision, *args, **kwargs)
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/backends/backend_pny.py", line 40, in __init__
    self.version = self.find_version(identifier) 
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/backends/backend_pny.py", line 48, in find_version
    raise RuntimeError("PennyLane-Lightning-GPU plugin is not installed") from e
RuntimeError: PennyLane-Lightning-GPU plugin is not installed

A similar error occurs when trying to run the Kokkos backend.

The libraries are listed by conda list but cannot be imported in a script (or in interactive Python).

conda list

pennylane                 0.35.1                   pypi_0    pypi
pennylane-lightning       0.35.1                   pypi_0    pypi
pennylane-lightning-gpu   0.35.1                   pypi_0    pypi
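
A mismatch between conda list and import can be narrowed down by checking, from the same interpreter, whether the distribution metadata and the importable module both resolve. The sketch below uses only the standard library. The helper name probe is hypothetical, and the second module path is an assumption taken from the file path in the lightning.gpu warning quoted further down in this report.

```python
from importlib import metadata, util


def probe(dist_name, module_names):
    """Report whether a distribution is installed and which modules resolve."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None

    importable = {}
    for name in module_names:
        try:
            importable[name] = util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises ImportError if a parent package is missing and
            # ValueError if an already-imported module has no spec.
            importable[name] = False
    return {"version": version, "importable": importable}


if __name__ == "__main__":
    # Module names below are assumptions based on the traces in this issue.
    print(probe(
        "pennylane-lightning-gpu",
        ["pennylane_lightning_gpu", "pennylane_lightning.lightning_gpu"],
    ))
```

If the version is reported but the first module name does not resolve, the package is installed and only the import path used by the caller is stale.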

Here is the Dockerfile I used for building the image:

FROM  nvcr.io/nvidia/cuquantum-appliance:23.10
RUN git clone https://github.com/NVIDIA/cuQuantum.git \
&& cd cuQuantum/benchmarks \
&& pip install .[all]

(I also got the same error using the docker commit route.) I also tried rebuilding cuquantum-benchmarks to fix the problem, but this didn’t help.

Suspected solution:

Maybe the import statements are wrong. From a brief look at the PennyLane documentation, I found a different way of loading the plugin:

import pennylane as qml
dev = qml.device("lightning.gpu", wires=2)

This at least gives me a CUDA error:

/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/pennylane_lightning/lightning_gpu/lightning_gpu.py:72: UserWarning: libcudart.so.12: cannot open shared object file: No such file or directory
  warn(str(e), UserWarning)
/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/pennylane_lightning/lightning_gpu/lightning_gpu.py:1014: UserWarning: 
                "Pre-compiled binaries for lightning.gpu are not available. Falling back to "
                "using the Python-based default.qubit implementation. To manually compile from "
                "source, follow the instructions at "
                "https://pennylane-lightning.readthedocs.io/en/latest/installation.html.",

  warn(

This is being solved here: https://discuss.pennylane.ai/t/pennylane-lightning-gpu-0-35-on-cuquantum-appliance-23-10/4393
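
If the plugin's import path did change between releases, a version lookup could try several candidate module names instead of a single one. This is only a sketch: first_importable is a hypothetical helper, not code from cuquantum-benchmarks, and the candidate list is an assumption based on the two paths that appear in the traces above.

```python
import importlib


def first_importable(candidates):
    """Return (name, module) for the first candidate that imports, else None."""
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate path
    return None


# Candidate paths are assumptions drawn from the traces in this issue.
CANDIDATES = ["pennylane_lightning_gpu", "pennylane_lightning.lightning_gpu"]

if __name__ == "__main__":
    found = first_importable(CANDIDATES)
    if found is None:
        print("PennyLane-Lightning-GPU plugin is not importable")
    else:
        name, module = found
        print(name, getattr(module, "__version__", "unknown"))
```

Note that a successful import does not mean the GPU binaries loaded; as the warning above shows, the module can import and still fall back to another device.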

Related bug:

I also noticed the CPU backend does not work in the 23.10 container either (This ran for me in the 23.06 container):

cuquantum-benchmarks circuit --frontend pennylane --backend pennylane --benchmark qaoa --nqubits 16

2024-04-05 13:04:25,345 INFO     * Running qaoa with 1 CPU threads, and 16 qubits [pennylane-v0.35.1 | pennylane-v0.35.1]:
Traceback (most recent call last):
  File "/home/cuquantum/conda/envs/cuquantum-23.10/bin/cuquantum-benchmarks", line 8, in <module>
    sys.exit(run())
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/run.py", line 335, in run
    runner.run()
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/run_interface.py", line 90, in run
    self._run()
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/run_interface.py", line 301, in _run
    preprocess_data = backend.preprocess_circuit(
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/backends/backend_pny.py", line 114, in preprocess_circuit
    self.circuit = self._make_qnode(circuit, nshots, **kwargs)
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/backends/backend_pny.py", line 101, in _make_qnode
    dev = pennylane.device("default.qubit", wires=self.nqubits, shots=nshots, c_dtype=self.dtype)
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/pennylane/__init__.py", line 378, in device
    dev = plugin_device_class(*args, **options)
TypeError: DefaultQubit.__init__() got an unexpected keyword argument 'c_dtype'
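
The TypeError indicates that the default.qubit constructor in this PennyLane version does not accept c_dtype. One defensive pattern, sketched below, is to drop keyword arguments that the target callable does not declare. The helper accepted_kwargs and the two toy classes are hypothetical stand-ins; this is not how cuquantum-benchmarks or PennyLane handle the option. Silently dropping c_dtype also changes the requested precision, so a real fix would probably want to warn or fail explicitly.

```python
import inspect


def accepted_kwargs(func, **kwargs):
    """Keep only the keyword arguments that `func` declares."""
    params = inspect.signature(func).parameters
    # A **kwargs catch-all means every keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


class OldDevice:  # toy stand-in for a constructor that accepts c_dtype
    def __init__(self, wires, shots=None, c_dtype=None):
        self.options = {"wires": wires, "shots": shots, "c_dtype": c_dtype}


class NewDevice:  # toy stand-in for a constructor that does not
    def __init__(self, wires, shots=None):
        self.options = {"wires": wires, "shots": shots}


requested = {"wires": 16, "shots": None, "c_dtype": "complex128"}
old = OldDevice(**accepted_kwargs(OldDevice, **requested))
new = NewDevice(**accepted_kwargs(NewDevice, **requested))
```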

mtjrider commented 7 months ago

Can you clarify your target usage of these tools together in the container?

ashleytsmith commented 7 months ago

Sure. I was using cuquantum-benchmarks to compare different simulator backends. Ideally, I wanted to run commands with the following backends, which are listed as supported in the main description in https://github.com/NVIDIA/cuQuantum/blob/main/benchmarks/cuquantum_benchmarks/run.py:

cuquantum-benchmarks circuit --frontend pennylane --backend pennylane --benchmark qaoa --nqubits 16
cuquantum-benchmarks circuit --frontend pennylane --backend pennylane-lightning-qubit --benchmark qaoa --nqubits 16
cuquantum-benchmarks circuit --frontend pennylane --backend pennylane-lightning-gpu --benchmark qaoa --nqubits 16
cuquantum-benchmarks circuit --frontend pennylane --backend pennylane-lightning-kokkos --benchmark qaoa --nqubits 16

mtjrider commented 7 months ago

I see. Thanks. The root cause seems to be mismatched CUDA versions.

The container is built with CUDA 11 on x86-64 architectures.

The aarch64 (arm64) container is built with CUDA 12. Is it possible for you to confirm this issue doesn't occur on platforms with that architecture?
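
To see which CUDA runtime major version the dynamic loader can find inside a given container, one option is to try opening the versioned runtime library directly. The sketch below assumes the usual Linux sonames libcudart.so.11.0 and libcudart.so.12; can_load is a hypothetical helper.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the shared library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False


# Sonames are assumptions: the usual names for the CUDA 11 and 12 runtimes.
RUNTIMES = {"CUDA 11": "libcudart.so.11.0", "CUDA 12": "libcudart.so.12"}

if __name__ == "__main__":
    for label, soname in RUNTIMES.items():
        print(label, soname, "found" if can_load(soname) else "missing")
```

If only the CUDA 11 runtime loads while the installed plugin expects libcudart.so.12, that would be consistent with the warning quoted in the original report.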

mtjrider commented 7 months ago

Adding @tlubowe for his awareness.

ashleytsmith commented 7 months ago

Unfortunately not. If I try to run any cuquantum-benchmarks commands on a Mac or a Linux laptop, I get this error:

Traceback (most recent call last):
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cupy/__init__.py", line 17, in <module>
    from cupy import _core  # NOQA
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cupy/_core/__init__.py", line 3, in <module>
    from cupy._core import core  # NOQA
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/cuquantum/conda/envs/cuquantum-23.10/bin/cuquantum-benchmarks", line 5, in <module>
    from cuquantum_benchmarks.run import run
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/run.py", line 10, in <module>
    from .backends import backends
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/backends/__init__.py", line 6, in <module>
    from .backend_cutn import cuTensorNet
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cuquantum_benchmarks/backends/backend_cutn.py", line 11, in <module>
    import cupy as cp
  File "/home/cuquantum/conda/envs/cuquantum-23.10/lib/python3.10/site-packages/cupy/__init__.py", line 19, in <module>
    raise ImportError(f'''
ImportError: 
================================================================
Failed to import CuPy.

If you installed CuPy via wheels (cupy-cudaXXX or cupy-rocm-X-X), make sure that the package matches with the version of CUDA or ROCm installed.

On Linux, you may need to set LD_LIBRARY_PATH environment variable depending on how you installed CUDA/ROCm.
On Windows, try setting CUDA_PATH environment variable.

Check the Installation Guide for details:
  https://docs.cupy.dev/en/latest/install.html

Original error:
  ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

It's quite strange because when I run:

pip install cupy-cuda11x

I get a message that the requirements are already satisfied.

I have only been able to run cuquantum-benchmarks commands on an HPC system with x86 architecture (one sentence in my original message was really unclear about this, so I have edited it).

ashleytsmith commented 7 months ago

I realised you may have meant running the benchmarks inside a different version of the container. I tried the same workflow inside this container:

nvcr.io/nvidia/cuquantum-appliance:23.10-arm64

I can’t build the benchmark suite. When I run:

pip install .[all]

I get this error:

Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-slb1t_fg/pennylane-lightning-gpu_e30d32e262ec4b4694a7e3d08281e60f/setup.py", line 169, in <module>
          with open(os.path.join("pennylane_lightning", "core", "_version.py"), encoding="utf-8") as f:
      FileNotFoundError: [Errno 2] No such file or directory: 'pennylane_lightning/core/_version.py'
      [end of output]

tlubowe commented 7 months ago

Hi @ashleytsmith, why do you need to run the benchmarks for PennyLane in any of the containers? PennyLane is not built in the containers. You should be able to build your own PL environment separate from the containers where you can define what you need. Is this not a viable solution?

ashleytsmith commented 5 months ago

Hi @tlubowe I wanted to run the cuquantum benchmarks suite inside the cuQuantum appliance container because quite a few of the benchmarks in the suite explicitly rely on things that can be tricky to install or are only available inside the container e.g. qsim-mgpu https://github.com/NVIDIA/cuQuantum/blob/main/benchmarks/cuquantum_benchmarks/run.py

When one clones the benchmark suite and runs the setup, pennylane is installed inside the conda environment within the container and appears in conda list. As I was building Dockerfiles for other software anyway, I did attempt to build my own container and install your benchmark suite. It did not work: I can’t run anything from the benchmark suite there, e.g. I can’t run

cuquantum-benchmarks circuit --frontend qiskit --backend aer --benchmark qaoa --nqubits 16

(This is the error when I try to run it on a Linux HPC cluster.)

File "/opt/conda/envs/baseenv/lib/python3.10/site-packages/cuquantum_benchmarks/backends/backend_qiskit.py", line 47, in find_version
    if hasattr(qiskit_aer, "__version__"):
NameError: name 'qiskit_aer' is not defined

Here is the Dockerfile:

FROM rockylinux:9.3

# Set environment variables
ENV MINICONDA_VERSION=py310_24.4.0-0
ENV PATH=/opt/conda/bin:$PATH
ENV PYTHON_VERSION=3.10

# Install necessary packages
RUN yum -y install wget bzip2 tar environment-modules git && \
    yum clean all

# Download and install Miniconda
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh -O /tmp/miniconda.sh && \
    /bin/bash /tmp/miniconda.sh -b -p /opt/conda && \
    rm /tmp/miniconda.sh

# Initialize Conda in bash config
RUN echo "source /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
    echo "conda activate base" >> ~/.bashrc

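# Create a new conda environment and activate it in interactive shells
# (-y skips the confirmation prompt, which cannot be answered during a build)
RUN conda create -y --name cuquant python=${PYTHON_VERSION} && \
    echo "source /opt/conda/bin/activate cuquant" >> ~/.bashrc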

# Install packages
RUN git clone https://github.com/NVIDIA/cuQuantum.git \
&& cd cuQuantum/benchmarks \
&& /opt/conda/envs/cuquant/bin/pip install .[all]

# Set the default command to run when starting the container
CMD [ "/bin/bash" ]

I really would have liked to be able to run the PennyLane benchmarks from your suite (and the Qulacs ones), but they require considerably more effort than the Qiskit and Cirq ones.
