PennyLaneAI / pennylane-lightning

The PennyLane-Lightning plugin provides a fast state-vector simulator written in C++ for use with PennyLane
https://docs.pennylane.ai/projects/lightning
Apache License 2.0
83 stars 35 forks

“Error in PennyLane Lightning: custatevec dynamic library load failure” #875

Open Shikairan opened 2 weeks ago

Shikairan commented 2 weeks ago

I can't pass the mpitests with either of these commands: "mpirun -np 2 -env UCX_NET_DEVICES=eth0 python -m pytest mpitests --tb=short" and "mpirun -np 2 python -m pytest mpitests --tb=short".

pytest returns the error: "pennylane_lightning.lightning_gpu_ops.LightningException: [/home/pl/pl5/pennylane-lightning-master/pennylane_lightning/core/src/simulators/lightning_gpu/MPIWorker.hpp][Line:178][Method:make_shared_mpi_worker]: Error in PennyLane Lightning: custatevec dynamic library load failure".
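Since the message reports a failed dynamic-library load, one low-cost check is whether the relevant shared libraries can be opened at all from the same Python environment and shell that runs the tests. The following is a minimal sketch, not part of Lightning: `try_load` and `find_on_ld_library_path` are hypothetical helpers, and the file names `libcustatevec.so.1` and `libmpi.so` are assumptions about what the loader may be looking for. Adjust them to the files actually present in your installation.

```python
import ctypes
import os


def try_load(library_name):
    """Try to open a shared library by name; return (loaded, message)."""
    try:
        ctypes.CDLL(library_name, mode=ctypes.RTLD_GLOBAL)
    except OSError as exc:
        # The OS loader's own message usually names the missing file or symbol.
        return False, str(exc)
    return True, "loaded"


def find_on_ld_library_path(library_name):
    """List the LD_LIBRARY_PATH directories that contain the given file name."""
    raw = os.environ.get("LD_LIBRARY_PATH", "")
    dirs = [d for d in raw.split(os.pathsep) if d]
    return [d for d in dirs if os.path.isfile(os.path.join(d, library_name))]


if __name__ == "__main__":
    # Assumed library names; replace with the ones in your environment.
    for name in ("libcustatevec.so.1", "libmpi.so"):
        ok, message = try_load(name)
        print(f"{name}: {'OK' if ok else 'FAILED'} ({message})")
        print(f"  found on LD_LIBRARY_PATH in: {find_on_ld_library_path(name)}")
```

Running the script under `mpirun -np 2 python ...` as well as directly may show whether the environment seen by the MPI ranks differs from the interactive shell.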

I compiled MPI, UCX and lightning.gpu with MPI support in the Docker image <nvidia/cuda:12.0.0-cudnn8-devel-ubuntu22.04> (IMAGE ID: bc9059f96b2a).

1: Compiled mpich-4.2.2 from source with the command: ./configure --prefix=/my/path --with-device=ch4:ucx --with-cuda=/my/cuda/path. The examples in the MPI package pass, including the , , <cuda/cudapi test>

2: Compiled ucx-1.7.0 from source with the command: ../configure --prefix=/my/own/path. It passes the test "mpirun -np 2 -env UCX_NET_DEVICES=eth0 ./cuda/cudapi" from the MPI example tests.

So the base environment seems to work.

Then I followed the steps in the pennylane-lightning documentation to install lightning.gpu with MPI:
1: Installed requirements.txt and requirements-dev.txt with pip, each in a separate conda environment. I tried both requirements files.
2: Followed the steps in the Lightning-GPU installation guide.
After that, the MPI tests still fail under pytest. The error details are above.

If I use pip to install lightning.gpu (without MPI, the GPU-only version), the pytest suite in tests passes. So custatevec itself seems to work in the plain (non-MPI) build.

The log of installing: mpilightning.gpu.install.log

alister commented 2 weeks ago

A warning about the above reply and the link to malware on mediafire (from the author of curl): https://mastodon.social/@bagder/113038399943924413

Shikairan commented 2 weeks ago

Thank you!

maliasadi commented 2 weeks ago

Hi @Shikairan, thank you for reporting this! Lightning-GPU depends on system support for the NVIDIA cuQuantum libraries, and cuStateVec supports CUDA-capable GPUs of generation SM 7.0 (Volta) and newer. Can you try compiling Lightning-GPU + MPI on NVIDIA GPUs with compute capability 7.0+? You may want to use CMAKE_CUDA_ARCHITECTURES to specify the CUDA architecture at compile time.
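As a rough illustration of the SM 7.0+ requirement, the sketch below compares a compute-capability string against that minimum and converts it into the numeric form that CMAKE_CUDA_ARCHITECTURES accepts (for example 8.6 becomes 86). All function names here are hypothetical helpers, not part of Lightning. `query_capabilities` additionally assumes that `nvidia-smi` is on the PATH and that the installed driver supports the `compute_cap` query field, which older drivers may not.

```python
import subprocess

MIN_CAPABILITY = (7, 0)  # SM 7.0 (Volta), the minimum stated in the comment above


def parse_capability(text):
    """Turn a string such as '8.6' into the tuple (8, 6)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def meets_minimum(text, minimum=MIN_CAPABILITY):
    """Return True if the capability string is at least the minimum."""
    return parse_capability(text) >= minimum


def cmake_arch(text):
    """Format a capability such as '8.6' as a CMAKE_CUDA_ARCHITECTURES value ('86')."""
    major, minor = parse_capability(text)
    return f"{major}{minor}"


def query_capabilities():
    """Return [(gpu_name, capability)] as reported by nvidia-smi.

    Assumption: nvidia-smi is installed and supports the compute_cap field.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = [line.rsplit(",", 1) for line in result.stdout.splitlines() if line.strip()]
    return [(name.strip(), cap.strip()) for name, cap in rows]
```

For instance, `[(name, cap, meets_minimum(cap)) for name, cap in query_capabilities()]` lists each visible GPU with a pass/fail flag, and the value from `cmake_arch(cap)` could then be passed through `CMAKE_ARGS` as `-DCMAKE_CUDA_ARCHITECTURES=<value>` alongside `-DENABLE_MPI=ON`.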

Shikairan commented 2 weeks ago

Regarding CMAKE_CUDA_ARCHITECTURES: I tested this Docker image and compiled lightning.gpu on 4090/4080/3090 Ti/A800/A100 GPUs, and the mpitests fail on all of them.

maliasadi commented 2 weeks ago

It is unclear from your report whether you followed the latest installation guideline or the stable one to build lightning.gpu+MPI.

To install the master version of lightning.gpu+MPI in editable mode, you need to use the --config-settings editable_mode=compat pip option as shown below:

PL_BACKEND="lightning_gpu" python scripts/configure_pyproject_toml.py
CMAKE_ARGS="-DENABLE_MPI=ON" python -m pip install -e . --config-settings editable_mode=compat -vv

If this doesn't resolve the problem, try installing lightning.gpu regularly (non-editable) with CMAKE_ARGS="-DENABLE_MPI=ON" python -m pip install . to ensure the package can be found and loaded from site-packages across nodes.
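To see which copy of the package each process actually picks up, a small check like the one below can help. It is only a sketch: `module_location` and `is_in_site_packages` are hypothetical helpers, and the module name `pennylane_lightning` is taken from the traceback in the original report.

```python
import importlib.util
import os


def module_location(module_name):
    """Return the file or directory the import system would load, or None."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    if spec.origin and spec.origin not in ("built-in", "frozen"):
        return spec.origin
    # Namespace packages have no single origin file; report their first directory.
    locations = list(spec.submodule_search_locations or [])
    return locations[0] if locations else None


def is_in_site_packages(path):
    """Return True if the path lies under a site-packages or dist-packages directory."""
    parts = os.path.normpath(path).split(os.sep)
    return "site-packages" in parts or "dist-packages" in parts


if __name__ == "__main__":
    location = module_location("pennylane_lightning")
    if location is None:
        print("pennylane_lightning: not importable in this environment")
    else:
        print(f"pennylane_lightning: {location}")
        print(f"  installed in site-packages: {is_in_site_packages(location)}")
```

Launching the script through `mpirun` prints one result per rank, which makes it easier to spot a rank that resolves the package from a source checkout, or not at all.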

Please let us know if none of the above resolves your issue and don't hesitate to send us the complete build steps and logs in case of failure.

Shikairan commented 2 weeks ago

I have been trying to compile this project since last month. I tried both compile commands, and both fail with the same error I mentioned above. I will compile again in the Docker container to collect all the logs and upload them next week. The build will be done on a machine with two 3090 Ti GPUs.

maliasadi commented 5 days ago

Hey @Shikairan, I'm just following up on this issue. Were you able to compile and test Lightning-GPU with MPI?