NVIDIA / cuQuantum

Home for cuQuantum Python & NVIDIA cuQuantum SDK C++ samples
https://docs.nvidia.com/cuda/cuquantum/
BSD 3-Clause "New" or "Revised" License

Calling `cutn.distributed_reset_configuration()` with MPICH might fail with `CUTENSORNET_STATUS_DISTRIBUTED_FAILURE` #31

Closed: leofang closed this issue 1 year ago

leofang commented 1 year ago

MPICH users running this sample (example22_mpi_auto.py) might see the following error:

$ mpiexec -n 2 python example22_mpi_auto.py
Traceback (most recent call last):
  File "/home/leof/dev/cuquantum/python/samples/cutensornet/coarse/example22_mpi_auto.py", line 60, in <module>
    cutn.distributed_reset_configuration(
  File "cuquantum/cutensornet/cutensornet.pyx", line 2306, in cuquantum.cutensornet.cutensornet.distributed_reset_configuration
  File "cuquantum/cutensornet/cutensornet.pyx", line 2328, in cuquantum.cutensornet.cutensornet.distributed_reset_configuration
  File "cuquantum/cutensornet/cutensornet.pyx", line 229, in cuquantum.cutensornet.cutensornet.check_status
cuquantum.cutensornet.cutensornet.cuTensorNetError: CUTENSORNET_STATUS_DISTRIBUTED_FAILURE
Traceback (most recent call last):
  File "/home/leof/dev/cuquantum/python/samples/cutensornet/coarse/example22_mpi_auto.py", line 60, in <module>
    cutn.distributed_reset_configuration(
  File "cuquantum/cutensornet/cutensornet.pyx", line 2306, in cuquantum.cutensornet.cutensornet.distributed_reset_configuration
  File "cuquantum/cutensornet/cutensornet.pyx", line 2328, in cuquantum.cutensornet.cutensornet.distributed_reset_configuration
  File "cuquantum/cutensornet/cutensornet.pyx", line 229, in cuquantum.cutensornet.cutensornet.check_status
cuquantum.cutensornet.cutensornet.cuTensorNetError: CUTENSORNET_STATUS_DISTRIBUTED_FAILURE

This is a known issue for the automatic MPI support using cuQuantum Python 22.11 / cuTensorNet 2.0.0 + mpi4py + MPICH.

The reason is that Python, by default, dynamically loads shared libraries in private mode (see, e.g., the documentation for ctypes.DEFAULT_MODE), which breaks the assumption made by libcutensornet_distributed_interface_mpi.so (whose path is set via $CUTENSORNET_COMM_LIB) that the MPI symbols have been loaded into the public (global) scope.

Open MPI users are immune to this problem because mpi4py already has to load the Open MPI library into the public scope as a workaround for a few old Open MPI issues.
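For reference, a small illustrative snippet (not from the sample; assumes Linux/CPython) that prints the dlopen flags involved. RTLD_LOCAL is 0 on Linux, so unless RTLD_GLOBAL is requested explicitly, symbols from a dynamically loaded library stay private to it:

import ctypes, os, sys

# CPython's defaults correspond to RTLD_LOCAL on Linux, so symbols of a library
# loaded at runtime (e.g. libmpi.so pulled in by mpi4py) are not globally visible.
print(ctypes.DEFAULT_MODE)            # default mode used by ctypes.CDLL
print(sys.getdlopenflags())           # flags used when importing extension modules
print(os.RTLD_LAZY | os.RTLD_GLOBAL)  # the "public" flags used by workaround 2 below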

There are multiple workarounds that users can choose:

  1. Load the MPI symbols via LD_PRELOAD, e.g., mpiexec -n 2 -env LD_PRELOAD=$MPI_HOME/lib/libmpi.so python example22_mpi_auto.py
  2. Change Python's default loading mode to public (global) before any other imports (a fuller sketch follows this list):
    import os, sys
    sys.setdlopenflags(os.RTLD_LAZY | os.RTLD_GLOBAL)
    import ...  # mpi4py, cuquantum, and the rest of your imports
  3. If compiling libcutensornet_distributed_interface_mpi.so manually, link the MPI library to it via -lmpi
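
As a concrete illustration of workaround 2, here is a minimal sketch of how the top of example22_mpi_auto.py could be adapted. The flag change must happen before mpi4py or cuquantum is imported; get_mpi_comm_pointer is cuQuantum Python's helper for extracting the MPI_Comm pointer and size (check your version's sample for the exact call):

# Workaround 2 (sketch): make dynamically loaded symbols public *before* importing
# any module that pulls in libmpi.so.
import os, sys
sys.setdlopenflags(os.RTLD_LAZY | os.RTLD_GLOBAL)

from mpi4py import MPI                     # libmpi symbols now land in the public scope
from cuquantum import cutensornet as cutn

comm = MPI.COMM_WORLD
handle = cutn.create()
# Bind cuTensorNet's distributed execution to the MPI communicator, as in the sample.
cutn.distributed_reset_configuration(
    handle, *cutn.get_mpi_comm_pointer(comm))
# ... build and contract the tensor network as usual ...
cutn.destroy(handle)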

In a future release, we will add a fix to work around this limitation. See also https://github.com/NVIDIA/cuQuantum/discussions/30 for discussion.

leofang commented 1 year ago

Internal ticket: CUQNT-1594.

leofang commented 1 year ago

This is fixed in cuQuantum 23.03. We now ask users who build libcutensornet_distributed_interface_mpi.so themselves to link it to MPI by passing -lmpi to the compiler/linker.

The cuTensorNet-MPI wrapper library (libcutensornet_distributed_interface_mpi.so) needs to be linked against the MPI library libmpi.so. If you use our conda-forge packages or the cuQuantum Appliance container, or compile your own wrapper using the provided activate_mpi.sh script, this is taken care of for you.
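
If you want to double-check the linkage from Python, a quick illustrative check (just a sketch, not an official tool): dlopen the wrapper pointed to by $CUTENSORNET_COMM_LIB; an unresolvable libmpi dependency raises OSError, matching the "not found" that ldd would report.

import ctypes, os

# Try to load the cuTensorNet-MPI wrapper; this fails if its libmpi dependency
# cannot be found on the dynamic loader's search path.
wrapper = os.environ["CUTENSORNET_COMM_LIB"]  # path to libcutensornet_distributed_interface_mpi.so
try:
    ctypes.CDLL(wrapper)
    print("OK: wrapper and its MPI dependency resolved")
except OSError as e:
    print("linkage problem:", e)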

https://docs.nvidia.com/cuda/cuquantum/cutensornet/release_notes.html#cutensornet-v2-1-0

yapolyak commented 7 months ago

Hi @leofang, sorry to re-open this, but after a while I tried automatic contraction with cuQuantum 23.06 (basically this script: https://github.com/NVIDIA/cuQuantum/blob/main/python/samples/cutensornet/coarse/example22_mpi_auto.py) on Perlmutter, using its MPICH, and I again got the CUTENSORNET_STATUS_DISTRIBUTED_FAILURE error.

I do install cuQuantum from conda-forge, so according to you the linking to MPI should be sorted... However, I fetch only an "external" placeholder mpich from conda-forge and then build mpi4py locally. Could it be that, because of this, libcutensornet_distributed_interface_mpi.so is not linked properly?

yapolyak commented 7 months ago

Ah there we go:

ldd ~/.conda/envs/py-cuquantum-23.06.0-mypich-py3.9/lib/libcutensornet_distributed_interface_mpi.so
    linux-vdso.so.1 (0x00007fffc8de5000)
    libmpi.so.12 => not found
    libc.so.6 => /lib64/libc.so.6 (0x00007ff8c1f36000)
    /lib64/ld-linux-x86-64.so.2 (0x00007ff8c2157000)

Let me try to link it manually if I can...

yapolyak commented 7 months ago

Done: I added MPICH's /lib-abi-mpich path to $LD_LIBRARY_PATH, relinked, and it works now! Sorry for the noise :)