awslabs / palace

3D finite element solver for computational electromagnetics
https://awslabs.github.io/palace/dev
Apache License 2.0

Add CUDA/HIP support and GPU builds #184

Closed by sebastiangrimberg 4 months ago

sebastiangrimberg commented 5 months ago

Adds support for CUDA and HIP in the build system and makes corresponding changes to the code to run on GPUs.

Accompanying documentation update in #185

Resolves https://github.com/awslabs/palace/issues/3

TODO (for Spack support):

sebastiangrimberg commented 4 months ago

The actual issues seen so far:

  • Running the examples tests on a g5.48xlarge, the spheres case fails with NaNs for the indicator norm.

This is fixed in a8d0cc9.

  • Running the examples with nt > 1, SuperLU seems to cause the following failure on the cpw_lumped_uniform case, whereas using STRUMPACK is fine:

  Residual norms for GMRES solve
  ** On entry to DGEMM  parameter number  8 had an illegal value
  --------------------------------------------------------------------------
  prterun noticed that process rank 1 with PID 10743 on node c889f3baddb0 exited on
  signal 11 (Segmentation fault: 11).
  --------------------------------------------------------------------------

As far as I can tell, this only happens on M1 macOS with SuperLU_DIST. I'm strongly inclined to say it is a bug in SuperLU_DIST, but at least for now it is not related to this PR. One thing worth exploring for OpenMP builds is to build SuperLU_DIST and STRUMPACK without OpenMP and rely on a threaded BLAS/LAPACK instead.

EDIT: SuperLU_DIST issue: https://github.com/xiaoyeli/superlu_dist/issues/159
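
For readers decoding the DGEMM line in the log above: in the reference (Netlib) BLAS, the number in "parameter number N had an illegal value" is the 1-based position of the first argument that fails validation, and argument 8 of DGEMM is LDA, the leading dimension of A. The sketch below mirrors that check in C++ so the message can be mapped back to a condition. `dgemm_arg_check` and `is_op` are hypothetical helpers written for illustration, not part of any library, and this assumes the BLAS in use on macOS reports errors with the same numbering as the reference implementation. It says nothing about which call inside SuperLU_DIST passes the bad value.

```cpp
#include <algorithm>

// Case-insensitive match of a transpose flag against an upper-case letter,
// standing in for the LSAME helper used by the reference BLAS.
constexpr bool is_op(char c, char upper)
{
  return c == upper || c == upper + ('a' - 'A');
}

// Illustrative re-implementation of the argument validation performed by the
// reference DGEMM. Returns 0 if the arguments pass, otherwise the 1-based
// position of the first offending argument in the DGEMM argument list:
//   TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC
constexpr int dgemm_arg_check(char transa, char transb, int m, int n, int k,
                              int lda, int ldb, int ldc)
{
  const bool nota = is_op(transa, 'N');
  const bool notb = is_op(transb, 'N');
  // Row counts of A and B as stored, which depend on the transpose flags.
  const int nrowa = nota ? m : k;
  const int nrowb = notb ? k : n;

  if (!nota && !is_op(transa, 'C') && !is_op(transa, 'T')) return 1;
  if (!notb && !is_op(transb, 'C') && !is_op(transb, 'T')) return 2;
  if (m < 0) return 3;
  if (n < 0) return 4;
  if (k < 0) return 5;
  if (lda < std::max(1, nrowa)) return 8;   // the case reported in the log
  if (ldb < std::max(1, nrowb)) return 10;
  if (ldc < std::max(1, m)) return 13;
  return 0;
}
```

Under these assumptions, "parameter number 8" means the caller passed an LDA smaller than max(1, rows of A as stored), for example `dgemm_arg_check('N', 'N', 4, 4, 4, 2, 4, 4)` returns 8.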

sebastiangrimberg commented 4 months ago

Once this PR, https://github.com/awslabs/palace/pull/193, and https://github.com/awslabs/palace/pull/194 are approved, I will rebase on main and merge all into main together.