PennyLaneAI / pennylane-lightning

The PennyLane-Lightning plugin provides a fast state-vector simulator written in C++ for use with PennyLane
https://docs.pennylane.ai/projects/lightning
Apache License 2.0

“Error in PennyLane Lightning: custatevec dynamic library load failure” #875

Open Shikairan opened 3 months ago

Shikairan commented 3 months ago

I can't pass the MPI tests with either "mpirun -np 2 -env UCX_NET_DEVICES=eth0 python -m pytest mpitests --tb=short" or "mpirun -np 2 python -m pytest mpitests --tb=short".

pytest returns the error: "pennylane_lightning.lightning_gpu_ops.LightningException: [/home/pl/pl5/pennylane-lightning-master/pennylane_lightning/core/src/simulators/lightning_gpu/MPIWorker.hpp][Line:178][Method:make_shared_mpi_worker]: Error in PennyLane Lightning: custatevec dynamic library load failure".

I compiled MPI, UCX and lightning.gpu with MPI support in the Docker image nvidia/cuda:12.0.0-cudnn8-devel-ubuntu22.04 (IMAGE ID: bc9059f96b2a).

1: I compiled mpich-4.2.2 from source with: ./configure --prefix=/my/path --with-device=ch4:ucx --with-cuda=/my/cuda/path. The examples in the MPI package pass, including the , , <cuda/cudapi test>.

2: I compiled ucx-1.7.0 from source with: ../configure --prefix=/my/own/path. It passes the test "mpirun -np 2 -env UCX_NET_DEVICES=eth0 ./cuda/cudapi" from the MPI example tests.

So the base environment seems to work.

Then I followed the steps in the pennylane-lightning documentation to install lightning.gpu with MPI:

1: I installed requirements.txt and requirements-dev.txt with pip in separate conda environments; I tried both requirements files.

2: I followed the steps in the Lightning-GPU installation guide.

After that, the MPI tests still fail under pytest. The error details are above.

If I install lightning.gpu with pip (GPU-only version, without MPI), the tests in tests pass, so custatevec itself appears to work without MPI.

The log of installing: mpilightning.gpu.install.log

alister commented 3 months ago

A Warning about the above reply and the link to malware on mediafire (from the author of Curl): https://mastodon.social/@bagder/113038399943924413

Shikairan commented 3 months ago

Thank you!

maliasadi commented 3 months ago

Hi @Shikairan, thank you for reporting this! Lightning-GPU depends on system support for the NVIDIA cuQuantum libraries, and cuStateVec supports CUDA-capable GPUs of generation SM 7.0 (Volta) and newer. Can you try compiling Lightning-GPU + MPI on NVIDIA GPUs with compute capability 7.0+? You may want to use CMAKE_CUDA_ARCHITECTURES to specify the CUDA architecture at compile time.
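
A minimal sketch of how the compute-capability requirement above could be checked, and how a candidate value for CMAKE_CUDA_ARCHITECTURES might be derived. It assumes a driver whose nvidia-smi supports the compute_cap query field; all helper names are hypothetical and are not part of Lightning.

```python
import subprocess


def parse_compute_cap(text):
    """Turn a string such as '8.6' into the tuple (8, 6)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def meets_minimum(cap, minimum=(7, 0)):
    """True if the capability is at least SM 7.0 (Volta)."""
    return cap >= minimum


def cmake_arch(cap):
    """Format (8, 6) as '86', the numeric form used by CMAKE_CUDA_ARCHITECTURES."""
    return f"{cap[0]}{cap[1]}"


def query_compute_caps():
    """Ask nvidia-smi for each GPU's compute capability; [] if unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        # nvidia-smi is missing or failed, so nothing can be reported.
        return []
    caps = []
    for line in out.splitlines():
        try:
            caps.append(parse_compute_cap(line))
        except ValueError:
            # Skip lines that are not of the form 'major.minor'.
            continue
    return caps
```

For instance, [cmake_arch(c) for c in query_compute_caps() if meets_minimum(c)] would list the architectures of the local GPUs that satisfy the SM 7.0 requirement.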

Shikairan commented 3 months ago

CMAKE_CUDA_ARCHITECTURES

I tested this Docker image and compiled lightning.gpu on 4090/4080/3090 Ti/A800/A100; none of these GPUs passes the MPI tests.

maliasadi commented 3 months ago

It is unclear from your report whether you followed the latest installation guide or the stable one to build lightning.gpu+MPI.

To install the master version of lightning.gpu+MPI in editable mode, you need to use the --config-settings editable_mode=compat pip option as shown below:

PL_BACKEND="lightning_gpu" python scripts/configure_pyproject_toml.py
CMAKE_ARGS="-DENABLE_MPI=ON" python -m pip install -e . --config-settings editable_mode=compat -vv

If this doesn't resolve the problem, try installing lightning.gpu regularly with CMAKE_ARGS="-DENABLE_MPI=ON" python -m pip install . to ensure the package can be found and loaded from site-packages across nodes.
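
A minimal sketch for checking where Python would load the compiled module from, which may help confirm that every rank or node sees the same installation. The module name pennylane_lightning.lightning_gpu_ops is taken from the error message in this thread; the helper name module_origin is hypothetical.

```python
import importlib.util


def module_origin(name):
    """Return the file a module would be loaded from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the name is malformed.
        return None
    return spec.origin if spec is not None else None
```

Printing module_origin("pennylane_lightning.lightning_gpu_ops") from a script launched under mpirun would show, per rank, whether the module resolves to the source tree (editable install) or to site-packages.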

Please let us know if none of the above resolves your issue and don't hesitate to send us the complete build steps and logs in case of failure.

Shikairan commented 3 months ago

I have been trying to compile this project since last month. I tried both build commands, but both still fail with the same error I mentioned above. I will compile again in Docker to collect all the logs and upload them next week; the project will be compiled on a machine with two 3090 Ti GPUs.

maliasadi commented 2 months ago

Hey @Shikairan, I'm just following up on this issue. Were you able to compile and test Lightning-GPU with MPI?

Shikairan commented 2 months ago

Here are the latest logs: base env.txt, penny-lightning install log.txt

The steps are separated by the string "================================================================".

kevzos commented 3 weeks ago

I encountered the same issue.

mpirun -np 2 python -m pytest mpitests --tb=short -x
============================= test session starts ==============================
platform linux -- Python 3.10.13, pytest-8.3.3, pluggy-1.5.0
rootdir: /data/whc/pennylane-lightning
configfile: pyproject.toml
plugins: flaky-3.8.1, xdist-3.6.1, mock-3.14.0, cov-6.0.0
collected 3736 items

mpitests/test_adjoint_jacobian.py ============================= test session starts ==============================
platform linux -- Python 3.10.13, pytest-8.3.3, pluggy-1.5.0
rootdir: /data/whc/pennylane-lightning
configfile: pyproject.toml
plugins: flaky-3.8.1, xdist-3.6.1, mock-3.14.0, cov-6.0.0
collected 3736 items

mpitests/test_adjoint_jacobian.py EE

==================================== ERRORS ====================================
_______ ERROR at setup of TestAdjointJacobian.test_not_expval[dev0-True] _______
mpitests/test_adjoint_jacobian.py:51: in fixture_dev
    return qml.device(
/root/anaconda3/envs/mpi310/lib/python3.10/site-packages/pennylane/devices/device_constructor.py:280: in device
    dev = plugin_device_class(*args, **options)
pennylane_lightning/lightning_gpu/lightning_gpu.py:354: in __init__
    self._statevector = self.LightningStateVector(
pennylane_lightning/lightning_gpu/_state_vector.py:101: in __init__
    self._qubit_state = self._state_dtype()(
E   pennylane_lightning.lightning_gpu_ops.LightningException: [/data/whc/pennylane-lightning/pennylane_lightning/core/src/simulators/lightning_gpu/MPIWorker.hpp][Line:178][Method:make_shared_mpi_worker]: Error in PennyLane Lightning: custatevec dynamic library load failure

I followed the instructions at https://docs.pennylane.ai/projects/lightning/en/stable/lightning_gpu/installation.html#id1 to compile from source, and ran the test cases on CentOS with two 3090 Ti GPUs and CUDA 12.1.

kevzos commented 3 weeks ago

I also tested in the cuQuantum container, installing with pip, and got this error:

File "/opt/conda/envs/cuquantum-24.03/lib/python3.10/site-packages/pennylane_lightning/lightning_gpu/lightning_gpu.py", line 297, in _mpi_init_helper
    raise ImportError("MPI related APIs are not found.")
ImportError: MPI related APIs are not found.

multiphaseCFD commented 2 weeks ago

Hey @kevzos and @Shikairan,

Thanks for your interest in distributed Lightning.GPU and for reporting the issue.

Could you please check whether adding the path to libmpi.so (path/to/libmpi.so) to the LD_LIBRARY_PATH environment variable works?
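
A minimal sketch for testing whether a shared library can be opened by the dynamic loader from the current environment, for example before and after changing LD_LIBRARY_PATH. The helper name can_load is hypothetical, and the library file names in the usage note are assumptions that may differ by installation.

```python
import ctypes


def can_load(library):
    """Try to open a shared library by name or path.

    Returns (True, "") on success, or (False, loader_message) on failure.
    """
    try:
        ctypes.CDLL(library)
    except OSError as exc:
        # The loader message usually names the file that could not be found.
        return False, str(exc)
    return True, ""
```

Calling can_load("libmpi.so") and can_load("libcustatevec.so") in the same shell that runs mpirun would show which of the two, if either, the loader fails to find.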

Please feel free to reach out if there is any issue.

Thanks,