Open LeSnow-Ye opened 1 month ago
Hi @LeSnow-Ye,
I'm sorry to hear that you're having issues with the build. It would help to narrow the failure down to a case I can reproduce, so I can try to figure it out.
For the OpenMP issue: can you try running one of the cases from the examples folder, preferably the same problem type as the one in your config file? Then please also check (a) whether the OpenMP build runs with `-nt 1`, and (b) the minimum thread count N for which `-nt N` fails. The issue appears to be in hypre, but unfortunately, per the docs, that is a generic error code which isn't very helpful.
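The thread-count check suggested above can be sketched as a small driver. This is a hypothetical helper, not part of Palace: `run_case` and the config name are assumptions, and in practice `run_case` would shell out to `palace -nt N case.json`; the search logic itself is separated out so it can be tested without Palace installed.

```python
import subprocess

def run_case(n, config="cavity.json"):
    """Run a Palace case with n OpenMP threads; True on success.
    (Config file name is a placeholder for one of the bundled examples.)"""
    return subprocess.run(["palace", "-nt", str(n), config]).returncode == 0

def min_failing_nt(run, max_nt=64):
    """Return the smallest n in 1..max_nt for which run(n) fails, or None."""
    for n in range(1, max_nt + 1):
        if not run(n):
            return n
    return None
```

Running `min_failing_nt(run_case)` would report the first thread count at which the case breaks, which is exactly the datapoint (b) above asks for.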
For the GPU build, I am not sure, as I have not been able to build that myself. Building with GPUs is currently fairly arcane, and there are likely issues with that Spack script (hence it not having been uploaded to the main Spack repo yet), as we have not finished testing it. From that error message, it would seem that the magma build is not being triggered. We currently don't have enough resources for further GPU testing, but since you are working on them, we would very much appreciate any fixes to that Spack file you might suggest.
Hi @hughcars, Thanks for your reply.
The OpenMP issue seems to be narrowed down to eigenmode problems. The cavity example failed as well, starting from `-nt 1`, but the other examples seem fine.
For the GPU build, I'd love to help, but I'm currently not very familiar with the build process. If I make any progress or find any potential fixes, I will try to submit a pull request with the updates. Your assistance in this area would be greatly appreciated.
Yesterday I made a manual OpenMP build to check that, and though I uncovered some issues (https://github.com/awslabs/palace/issues/279) while testing on an Apple M1 Mac, the cases ran without your error message, and all examples ran perfectly with `-nt 1`. When I get some more bandwidth I will try debugging the Spack build, as my initial attempts have failed.
For your GPU question, have you tried `+magma`?
Hi @hughcars, thanks for your reply.
Somehow when I use `+cuda cuda_arch=<xx>` together with `+magma`, conflicts are detected:
```
==> Error: concretization failed for the following reasons:
  1. magma: conflicts with 'cuda_arch=86'
  2. magma: conflicts with 'cuda_arch=86'
       required because conflict constraint
       required because palace depends on magma+cuda cuda_arch=86 when +cuda+magma cuda_arch=86
       required because palace+cuda+magma+openmp cuda_arch=86 requested explicitly
       required because palace depends on magma+shared when +magma+shared
       required because conflict is triggered when cuda_arch=86
       required because palace depends on magma+cuda cuda_arch=86 when +cuda+magma cuda_arch=86
       required because palace+cuda+magma+openmp cuda_arch=86 requested explicitly
       required because palace depends on magma+shared when +magma+shared
```
I think it might be unnecessary to add `+magma` when we already have `+cuda`, according to the script palace/package.py:
```python
...
with when("+magma"):
    depends_on("magma")
    depends_on("magma+shared", when="+shared")
    depends_on("magma~shared", when="~shared")
...
with when("+cuda"):
    for arch in CudaPackage.cuda_arch_values:
        cuda_variant = f"+cuda cuda_arch={arch}"
        depends_on(f"hypre{cuda_variant}", when=f"{cuda_variant}")
        depends_on(f"superlu-dist{cuda_variant}", when=f"+superlu-dist{cuda_variant}")
        depends_on(f"strumpack{cuda_variant}", when=f"+strumpack{cuda_variant}")
        depends_on(f"slepc{cuda_variant} ^petsc{cuda_variant}", when=f"+slepc{cuda_variant}")
        depends_on(f"magma{cuda_variant}", when=f"+magma{cuda_variant}")
```
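To see why `+magma` still acts as a separate gate, here is a minimal sketch of how the f-strings in the excerpt above expand for `cuda_arch=86` (only the two magma-related strings are shown; variable names other than `cuda_variant` are mine):

```python
arch = 86
cuda_variant = f"+cuda cuda_arch={arch}"

# The dependency spec and its trigger condition, as generated in package.py:
dep_spec = f"magma{cuda_variant}"    # spec palace would depend on
when_spec = f"+magma{cuda_variant}"  # condition: only active if +magma is set

print(dep_spec)   # magma+cuda cuda_arch=86
print(when_spec)  # +magma+cuda cuda_arch=86
```

So magma is only pulled in when `+magma` is explicitly enabled, even with `+cuda`, and the conflict in the error above comes from the generated `magma+cuda cuda_arch=86` spec.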
Maybe I should try building manually later.
Hi, @hughcars,
The CUDA problem is because MAGMA in Spack currently has poor support for many specific `cuda_arch` values:
```python
# Many cuda_arch values are not yet recognized by MAGMA's CMakeLists.txt
for target in [10, 11, 12, 13, 21, 32, 52, 53, 61, 62, 72, 86]:
    conflicts("cuda_arch={}".format(target))
```
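A minimal stand-in for that check (function and constant names are mine, the arch values are copied from the conflicts list above) shows why `cuda_arch=86` fails concretization while, say, 70 or 80 would pass:

```python
# Arch values copied from the magma package's conflicts list.
UNSUPPORTED_ARCHES = {10, 11, 12, 13, 21, 32, 52, 53, 61, 62, 72, 86}

def magma_accepts(cuda_arch):
    """True if magma+cuda with this arch would pass the Spack conflict check."""
    return cuda_arch not in UNSUPPORTED_ARCHES
```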
In the newly updated master branch of MAGMA, more valid architectures are accepted; see icl/magma/CMakeLists.txt on Bitbucket.
So, by manually removing the needed `cuda_arch` value from the list above and switching to `magma@master`, my CUDA problem was solved.
Ooof, building with GPUs is very fiddly in our experience so far. I'm very glad you managed to get to the root cause!
I'm trying to build Palace with GPU (CUDA) and OpenMP support using Spack.
The package file is the same as palace/spack/local/packages/palace/package.py at main · awslabs/palace.
My installation command is

```
spack install palace +cuda cuda_arch=86 +openmp
```
`spack spec` result: palace-spec.txt

Problem with OpenMP
After changing the command

```
palace -np 64 2DQv9_eb4_3d_resonator_eigen.json -launcher-args "--use-hwthread-cpus"
```

to

```
palace -nt 64 2DQv9_eb4_2d_resonator_eigen.json
```

the following error occurs. The same configuration file works fine under Palace@0.13.0 with the default setup (no OpenMP or CUDA).
Problem with GPU
When setting `["Solver"]["Device"] = "GPU"`, the following error occurs.
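For context on the setting mentioned above, here is a tiny sketch of where that key sits in the Palace JSON config; the helper name is mine, and only the `["Solver"]["Device"]` key comes from this thread:

```python
import json

def set_device(config, device):
    """Set ["Solver"]["Device"] in a parsed Palace config dict."""
    config.setdefault("Solver", {})["Device"] = device
    return config

cfg = set_device({}, "GPU")
print(json.dumps(cfg))  # {"Solver": {"Device": "GPU"}}
```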
Environment
Compiler: palace-spec.txt
Is this an issue with the Spack package file or with my local environment? Could you please suggest a solution? Thanks!