Open mredenti opened 1 week ago
Hi,
Thank you for this very detailed issue.
The problem is that Tandem requires PETSc to be configured with:
--with-memalign=32 --with-64-bit-indices
See e.g.:
https://tandem.readthedocs.io/en/latest/getting-started/installation.html#install-petsc
(But to be honest, I tried to install it with my usual Spack workflow and got:
Cannot use scalapack with 64-bit BLAS/LAPACK indices
)
which is strange, because I do not get this problem on our local cluster
(and I also tried starting from the latest Spack).
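
One way to see whether an existing PETSc installation was built with these two options is to inspect its generated `petscconf.h`. The sketch below assumes that header defines `PETSC_MEMALIGN` and, when 64-bit indices are enabled, `PETSC_USE_64BIT_INDICES`. The function name `check_petsc_conf` is illustrative, and the header path depends on how PETSc was installed.

```shell
#!/usr/bin/env bash
# Sketch: report whether a PETSc build matches the two options above.
# Usage: check_petsc_conf /path/to/include/petscconf.h [min_alignment]
check_petsc_conf() {
    local conf="$1" min_align="${2:-32}" align status=0

    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 2; }

    # petscconf.h is expected to contain a line such as: #define PETSC_MEMALIGN 16
    align=$(awk '$1 == "#define" && $2 == "PETSC_MEMALIGN" { print $3 }' "$conf")
    if [ -z "$align" ] || [ "$align" -lt "$min_align" ]; then
        echo "memalign: ${align:-unset} (need >= $min_align)"
        status=1
    else
        echo "memalign: $align (ok)"
    fi

    # 64-bit indices are treated as enabled when the macro is defined.
    if grep -Eq '^#define[[:space:]]+PETSC_USE_64BIT_INDICES([[:space:]]|$)' "$conf"; then
        echo "64-bit indices: yes"
    else
        echo "64-bit indices: no"
        status=1
    fi
    return "$status"
}
```

For an in-tree build the header is typically under `$PETSC_DIR/$PETSC_ARCH/include/`, and for a prefix install under `<prefix>/include/`. A non-zero return value means at least one of the two options is missing.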
Description
- Branch: `dmay/petsc_dev_hip`
- Commit: `1015c31d0f29eab4983497a3ad3f607057285388` (fixes residual convergence difference between CPU and GPU)
- Mini-app: `static`
I'm encountering errors when running the Tandem mini-app `static` on the Leonardo Booster HPC system. Specifically,
- The `yateto kernels` test fails during execution.
- A `cuSPARSE_STATUS_INSUFFICIENT_RESOURCES` error occurs when launching the mini-app on fewer than (~) 48 nodes with 4 GPUs per node (4*64*48 GB). I am not sure whether the problem size is simply too large or whether something else is wrong.

Problem setup
Get audit scenario
Create intermediate size mesh with gmsh (same setup as Eviden-WP3)
Change mesh in ridge.toml
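
The last two setup steps could be scripted roughly as follows. This is a minimal sketch, not the exact commands used here: it assumes the mesh is selected in `ridge.toml` by a key named `mesh_file`, and the helper name `set_mesh_file` and the `.geo`/`.msh` file names are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: point a Tandem TOML config at a different mesh file.
# Assumption: the mesh is selected by a key named `mesh_file`.
set_mesh_file() {
    local toml="$1" mesh="$2" tmp

    grep -Eq '^[[:space:]]*mesh_file[[:space:]]*=' "$toml" || {
        echo "no mesh_file key in $toml" >&2
        return 1
    }
    tmp=$(mktemp) || return 1

    # Rewrite only the mesh_file assignment; every other line is copied as-is.
    # Note: the mesh name must not contain '|' or '&' (sed metacharacters here).
    sed -E "s|^([[:space:]]*mesh_file[[:space:]]*=).*|\1 \"${mesh}\"|" "$toml" > "$tmp" \
        && cat "$tmp" > "$toml"
    rm -f "$tmp"
}

# Hypothetical usage (file names are placeholders):
#   gmsh -3 ridge.geo -o ridge_intermediate.msh   # generate the 3D mesh
#   set_mesh_file ridge.toml ridge_intermediate.msh
```

Writing through a temporary file rather than `sed -i` keeps the helper portable between GNU and BSD sed.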
Steps to reproduce errors
Attempt 1: Use system installation of Petsc@3.20.1
**Load Modules**

```shell
module purge
module load petsc/3.20.1--openmpi--4.1.6--gcc--12.2.0-cuda-12.1-mumps # <---petsc
module load cuda/12.1
module load eigen/3.4.0--gcc--12.2.0-5jcagas
module load spack/0.21.0-68a
module load cmake/3.27.7
```

**Spack environment for Lua and Python+Numpy dependencies**

```shell
spack env create -d ./spack-env-tandem
spack env activate ./spack-env-tandem -p
spack add py-numpy lua@5.4.4
spack concretize -f
spack install
```

**Install CSV module**

```shell
luarocks install csv
```

**Clone Tandem**

```shell
git clone -b dmay/petsc_dev_hip https://github.com/TEAR-ERC/tandem.git tandem-petsc_dev_hip
cd tandem-petsc_dev_hip && git submodule update --init
cd ..
```

**Build Tandem**

Note: Petsc on Leonardo has been installed without a specific value for `--with-memalign`. When running the CMake configuration step

```shell
cmake -B ./build -S ./tandem-petsc_dev_hip -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx -DPOLYNOMIAL_DEGREE=4 -DDOMAIN_DIMENSION=3
```

I get the following error

```shell
-- Could NOT find LibxsmmGenerator (missing: LibxsmmGeneratorExecutable)
CMake Error at app/CMakeLists.txt:72 (message):
  The memory alignment of PETSc is 16 bytes but an alignment of at least 32
  bytes is required for ARCH=hsw. Please compile PETSc with
  --with-memalign=32.
```

and so I temporarily commented out Tandem's requirement on memory alignment for Petsc in `app/CMakeLists.txt` (just to verify whether I got the same error as for the custom installation of Petsc)

```cmake
#[=[
if(PETSC_MEMALIGN LESS ALIGNMENT)
    message(SEND_ERROR "The memory alignment of PETSc is ${PETSC_MEMALIGN} bytes but an alignment of "
                       "at least ${ALIGNMENT} bytes is required for ARCH=${ARCH}. "
                       "Please compile PETSc with --with-memalign=${ALIGNMENT}.")
endif()
#]=]
```

and then I build and run the tests on a login node

```shell
cmake --build ./build --parallel 4
ctest --test-dir ./build
```

where the `yateto kernels` test failed

```shell
ctest --test-dir ./build --rerun-failed
```

```shell
Start testing: Oct 13 11:39 CEST
----------------------------------------------------------
3/21 Testing: yateto kernels
3/21 Test: yateto kernels
Command: "/leonardo_work/cin_staff/mredenti/ChEESE/TANDEM/build/app/test-elasticity-kernel" "--test-case=yateto kernels"
Directory: /leonardo_work/cin_staff/mredenti/ChEESE/TANDEM/build/app
"yateto kernels" start time: Oct 13 11:39 CEST
Output:
----------------------------------------------------------
[doctest] doctest version is "2.3.7"
[doctest] run with "--help" for options
===============================================================================
/leonardo_work/cin_staff/mredenti/ChEESE/TANDEM/build/app/kernels/elasticity/test-kernel.cpp:10:
TEST CASE:  yateto kernels
  apply_inverse_mass

/leonardo_work/cin_staff/mredenti/ChEESE/TANDEM/build/app/kernels/elasticity/test-kernel.cpp:4938: ERROR: CHECK( sqrt(error/refNorm) < 2.22e-14 ) is NOT correct!
  values: CHECK( 0.0 < 0.0 )

===============================================================================
[doctest] test cases: 1 | 0 passed | 1 failed | 0 skipped
[doctest] assertions: 65 | 64 passed | 1 failed |
[doctest] Status: FAILURE!
```

Attempt 2: Install Petsc@3.21.5 from source
Note: Installing Petsc from source gives the same errors as documented in Attempt 1, so I only report the Petsc installation steps here.
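
Since both attempts end in the same `CHECK( 0.0 < 0.0 )` line, one possible reading is worth spelling out: the two printed values need not be exactly zero. If the test framework prints floating-point values with a fixed number of decimals (an assumption, not verified against doctest 2.3.7), a relative error slightly above the `2.22e-14` tolerance would print exactly like this. The sketch below illustrates the effect with a made-up error value of `3.1e-14`; `fmt_fixed10` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: two distinct tiny numbers can print identically at fixed precision.
# The value 3.1e-14 is illustrative, not taken from the failing run.
export LC_ALL=C   # force '.' as the decimal separator

fmt_fixed10() { printf '%.10f\n' "$1"; }

fmt_fixed10 3.1e-14    # hypothetical relative error
fmt_fixed10 2.22e-14   # tolerance from the CHECK

# The comparison itself still distinguishes the two values.
if awk 'BEGIN { exit !(3.1e-14 < 2.22e-14) }'; then
    echo "check passes"
else
    echo "check fails"
fi
```

Printing the two operands in scientific notation (for example with `%e`) would show whether the failure is a narrow tolerance miss or a genuinely wrong result.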
**Set Petsc Version**

```shell
export PETSC_VERSION=3.21.5
```

**Clone Petsc**

```shell
git clone -b v$PETSC_VERSION https://gitlab.com/petsc/petsc.git petsc-$PETSC_VERSION
```

**Petsc Installation**

```shell
#!/bin/bash
#SBATCH -A