awslabs / palace

3D finite element solver for computational electromagnetics
https://awslabs.github.io/palace/dev
Apache License 2.0

Building with Spack with CUDA and OpenMP #278

Open · LeSnow-Ye opened this issue 1 month ago

LeSnow-Ye commented 1 month ago

I'm trying to build Palace with GPU (CUDA) and OpenMP support using Spack.

The package file is the same as palace/spack/local/packages/palace/package.py on the main branch of awslabs/palace.

My installation command is spack install palace +cuda cuda_arch=86 +openmp

spack spec result: palace-spec.txt

Problem with OpenMP

After changing the command from palace -np 64 2DQv9_eb4_3d_resonator_eigen.json -launcher-args "--use-hwthread-cpus" to palace -nt 64 2DQv9_eb4_2d_resonator_eigen.json, the following error occurs:

...

Git changeset ID: d03e1d9
Running with 1 MPI process, 64 OpenMP threads
Detected 1 CUDA device
Device configuration: omp,cpu
Memory configuration: host-std
libCEED backend: /cpu/self/xsmm/blocked

...

Configuring SLEPc eigenvalue solver:
 Scaling γ = 6.087e+02, δ = 7.724e-06
 Configuring divergence-free projection
 Using random starting vector

Verification failed: (!err_flag) is false:
 --> Error during setup! Error code: 1
 ... in function: virtual void mfem::HypreSolver::Setup(const mfem::HypreParVector&, mfem::HypreParVector&) const
 ... in file: /tmp/lesnow/spack-stage/spack-stage-palace-develop-pkce5vp2bxzmswrs324vma4hf56do3ip/spack-build-pkce5vp/extern/mfem/linalg/hypre.cpp:4038

Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[WARNING] yaksa: 9 leaked handle pool objects

The same configuration file works fine under Palace@0.13.0 with the default setup (no OpenMP or CUDA).

Problem with GPU

When setting ["Solver"]["Device"] = "GPU", the following error occurs:

spack-build-pkce5vp/extern/libCEED/backends/ceed-backend-weak.c:15 in CeedInit_Weak(): Backend not currently compiled: /gpu/cuda/magma
Consult the installation instructions to compile this backend

LIBXSMM_VERSION: feature_int4_gemms_scf_zpt_MxK-1.17-3727 (25693839)
LIBXSMM_TARGET: clx [Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz]
Registry and code: 13 MB
Command: /home/lesnow/spack/opt/spack/linux-ubuntu20.04-cascadelake/gcc-9.4.0/palace-develop-pkce5vp2bxzmswrs324vma4hf56do3ip/bin/palace-x86_64.bin 2DQv9_eb4_2d_resonator_eigen_gpu.json
Uptime: 1.496896 s
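
For context, the GPU run above presumably uses a config that differs from the CPU one only in the ["Solver"]["Device"] setting. The following is purely an illustrative sketch of generating such a variant (not the reporter's actual workflow; it assumes the config is plain JSON without comments, and the file names are taken from the command in the log above):

    # Illustrative only: write a GPU variant of the Palace config by setting
    # ["Solver"]["Device"] = "GPU" (assumes plain JSON without comments).
    import json

    with open("2DQv9_eb4_2d_resonator_eigen.json") as f:
        config = json.load(f)

    config.setdefault("Solver", {})["Device"] = "GPU"

    with open("2DQv9_eb4_2d_resonator_eigen_gpu.json", "w") as f:
        json.dump(config, f, indent=2)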

Environment

Linux amax 5.15.0-91-generic #101~20.04.1-Ubuntu SMP Thu Nov 16 14:22:28 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:3B:00.0 Off |                  N/A |
| 30%   34C    P8    21W / 220W |    382MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1929      G   /usr/lib/xorg/Xorg                 53MiB |
|    0   N/A  N/A      3658      G   /usr/lib/xorg/Xorg                167MiB |
|    0   N/A  N/A      3885      G   /usr/bin/gnome-shell               62MiB |
|    0   N/A  N/A      4195      G   ...bexec/gnome-initial-setup        3MiB |
|    0   N/A  N/A      4224      G   ...2gtk-4.0/WebKitWebProcess       20MiB |
+-----------------------------------------------------------------------------+

Compiler: palace-spec.txt


Is this an issue with the Spack package file or with my local environment? Could you please suggest a solution? Thanks!

hughcars commented 1 month ago

Hi @LeSnow-Ye,

I'm sorry to hear that you're having issues with the build. It would be helpful to narrow the failure down to a case I can reproduce, so I can try to figure it out.

For the OpenMP issue: can you try running one of the cases from the examples folder, preferably the same problem type as the one in your config file? Then check (a) whether the OpenMP build runs with -nt 1, and (b) the minimum number N for which -nt N fails. The issue appears to be in hypre, but unfortunately that is a generic error code in the docs, which isn't very helpful.
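
Something along these lines could help pin down the smallest failing thread count (a rough sketch only: it assumes a palace binary on PATH and uses cavity.json as a stand-in for the eigenmode example config):

    # Sketch: scan OpenMP thread counts and report the first -nt N that fails.
    import subprocess

    for n in [1, 2, 4, 8, 16, 32, 64]:
        result = subprocess.run(["palace", "-nt", str(n), "cavity.json"])
        status = "ok" if result.returncode == 0 else f"failed (exit {result.returncode})"
        print(f"-nt {n}: {status}")
        if result.returncode != 0:
            break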

For the GPU build, I am not sure, as I have not been able to build that myself. Building with GPUs is currently fairly arcane, and there are likely issues with that Spack script (hence it not being uploaded to the main Spack repo yet), as we have not finished testing it. From that error message, it would seem that the MAGMA build is not being triggered. We currently don't have enough resources to do further GPU testing, but since you are working with them, we would very much appreciate any fixes to that Spack file you might suggest.

LeSnow-Ye commented 1 month ago

Hi @hughcars, Thanks for your reply.

The OpenMP issue seems to be narrowed down to eigenmode problems. The cavity example also fails, starting from -nt 1, while the other examples seem fine.

For the GPU build, I'd love to help, but I'm not yet very familiar with the build process. If I make any progress or find potential fixes, I will submit a pull request with the updates. Your assistance in this area would be greatly appreciated.

hughcars commented 1 month ago

Yesterday I made a manual OpenMP build to check this, and though I uncovered some issues (https://github.com/awslabs/palace/issues/279) while testing on an Apple M1 Mac, the cases ran without your error message, and all examples ran fine with -nt 1. When I have more bandwidth I will try debugging the Spack build, as my initial attempts have failed.

For your GPU question, have you tried +magma?

LeSnow-Ye commented 1 month ago

Hi, @hughcars, Thanks for your reply.

Somehow when I use +cuda cuda_arch=<xx> together with +magma, conflicts are detected.

==> Error: concretization failed for the following reasons: 

   1. magma: conflicts with 'cuda_arch=86'
   2. magma: conflicts with 'cuda_arch=86'
        required because conflict constraint
          required because palace depends on magma+cuda cuda_arch=86 when +cuda+magma cuda_arch=86
            required because palace+cuda+magma+openmp cuda_arch=86 requested explicitly
          required because palace depends on magma+shared when +magma+shared
        required because conflict is triggered when cuda_arch=86
          required because palace depends on magma+cuda cuda_arch=86 when +cuda+magma cuda_arch=86
            required because palace+cuda+magma+openmp cuda_arch=86 requested explicitly
          required because palace depends on magma+shared when +magma+shared 

I think it might be unnecessary to add +magma when we already have +cuda, according to the script palace/package.py?

...

    with when("+magma"):
        depends_on("magma")
        depends_on("magma+shared", when="+shared")
        depends_on("magma~shared", when="~shared")

...

    with when("+cuda"):
        for arch in CudaPackage.cuda_arch_values:
            cuda_variant = f"+cuda cuda_arch={arch}"
            depends_on(f"hypre{cuda_variant}", when=f"{cuda_variant}")
            depends_on(f"superlu-dist{cuda_variant}", when=f"+superlu-dist{cuda_variant}")
            depends_on(f"strumpack{cuda_variant}", when=f"+strumpack{cuda_variant}")
            depends_on(f"slepc{cuda_variant} ^petsc{cuda_variant}", when=f"+slepc{cuda_variant}")
            depends_on(f"magma{cuda_variant}", when=f"+magma{cuda_variant}")

Maybe I should try building manually later.

LeSnow-Ye commented 1 month ago

Hi, @hughcars,

The CUDA problem is that the MAGMA package in Spack currently declares conflicts for many specific cuda_arch values.

magma/package.py#L85

    # Many cuda_arch values are not yet recognized by MAGMA's CMakeLists.txt
    for target in [10, 11, 12, 13, 21, 32, 52, 53, 61, 62, 72, 86]:
        conflicts("cuda_arch={}".format(target))

In the newly updated master branch of MAGMA, more architectures are accepted; see CMakeLists.txt in the icl/magma repository on Bitbucket.

So, by manually removing the needed cuda_arch value from the list above and switching to magma@master, I was able to solve my CUDA problem.
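
For reference, the local workaround amounts to editing the conflict list in a copy of magma/package.py, roughly as sketched below (86 dropped from the list so that cuda_arch=86 concretizes, combined with magma@master):

    # Workaround sketch for a local copy of magma/package.py: drop the needed
    # architecture (here 86) from the conflict list and pin magma@master, which
    # recognizes newer cuda_arch values.
    for target in [10, 11, 12, 13, 21, 32, 52, 53, 61, 62, 72]:
        conflicts("cuda_arch={}".format(target))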

hughcars commented 1 month ago

Ooof, building with GPUs is very fiddly in our experience so far. I'm very glad you managed to get to the root cause!