Open hpc4geo opened 1 year ago
I've asked the LLM Qwen2, and got some hints to track down the problems, which might make sense:
The differences in the convergence history between the CPU and GPU runs could be attributed to several factors, particularly those related to the linear solvers and preconditioners used in your PETSc configuration. Here are some potential causes and suggestions to troubleshoot and potentially resolve the issue:
To address these issues, consider the following steps:
By systematically addressing these potential causes, you should be able to identify and mitigate the differences in the convergence history between the CPU and GPU runs.
These steps seem reasonable to me; did you have a chance to try any of them?
Actually, some of these aspects seem to have been discussed in previous tandem meetings. I am not sure an LLM is a better place to bounce ideas off than discussing with the team.
@hpc4geo Following the discussion on Slack: CPU run cpu_log.txt GPU run gpu_log.txt
(base) ulrich@cachemiss:/export/dump/ulrich/section_7_1/2d$ cat ../../section_8_2/scenario-rc/option3.cfg
-pc_type mg
-ksp_type fgmres
-ksp_rtol 1.0e-9
-mg_levels_ksp_max_it 10
-mg_levels_ksp_chebyshev_esteig 0,0.01,0,1.1
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-mg_coarse_ksp_type cg
-mg_coarse_pc_type jacobi
-mg_coarse_ksp_rtol 1.0e-1
-mg_coarse_ksp_max_it 10
# Turn on monitors for debugging
-ksp_monitor_true_residual
-mg_levels_ksp_monitor
-mg_coarse_ksp_monitor_short
# Force early termination for initial debugging
-ksp_max_it 10
# Report setup
-ksp_view
-log_summary
-options_left
(base) ulrich@cachemiss:/export/dump/ulrich/section_7_1/2d$ cat ../../section_8_2/scenario-rc/cuda.cfg
-mg_levels_mat_type aijcusparse
-vec_type cuda
-mat_type aijcusparse
-mg_coarse_mat_type aijcusparse
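(For context, a sketch of how these two files are combined in a run; the executable name and scenario file below are placeholders, the actual commands appear further down in the thread.)
# CPU run
static scenario.toml --petsc -options_file option3.cfg
# GPU run: the same, with the cuda.cfg options appended explicitly
static scenario.toml --petsc -options_file option3.cfg -vec_type cuda -mat_type aijcusparse -mg_levels_mat_type aijcusparse -mg_coarse_mat_type aijcusparse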
Starting from the line "1 KSP", CPU and GPU differ. Main difference:
26,27c26,27
< type: seqaij
< rows=92160, cols=92160, bs=6
---
> type: seqaijcusparse
> rows=92160, cols=92160
30c30
< using I-node routines: found 30720 nodes, limit used is 5
---
> not using I-node routines
59c59
The GPU version doesn't use I-node optimizations (which, presumably, are mainly beneficial for CPU computations?).
Tried -mat_block_size 6 for the GPU code: no change on the line starting with "1 KSP unpreconditioned".
Tried -mat_no_inode for the CPU code: no change on the same line.
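In option-file form, the two attempts were (same options as above, recapped; neither changed that first differing residual):
# GPU run only: request the block size reported by the CPU Mat
-mat_block_size 6
# CPU run only: disable the I-node routines, as on the GPU
-mat_no_inode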
@Thomas-Ulrich Okay. One monitor isn't activated. Let's turn it on. Below is a modified set of options. Please use these new ones.
-pc_type mg
-ksp_type fgmres
-ksp_rtol 1.0e-9
-mg_levels_ksp_max_it 10
-mg_levels_ksp_chebyshev_esteig 0,0.01,0,1.1
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
# Set the norm type so the monitor will report residuals
-mg_levels_ksp_norm_type preconditioned
-mg_coarse_ksp_type cg
-mg_coarse_pc_type jacobi
-mg_coarse_ksp_rtol 1.0e-1
-mg_coarse_ksp_max_it 10
# Turn on monitors for debugging
-ksp_monitor_true_residual
-mg_levels_1_esteig_ksp_monitor_true_residual
-mg_levels_ksp_monitor
-mg_coarse_ksp_monitor_short
# Force early termination for initial debugging
-ksp_max_it 10
# Report setup
-ksp_view
-log_view
-options_left
Test 1: Run with the above options (CPU). Test 2: Run with the above plus the following (GPU):
-vec_type cuda
-mat_type aijcusparse
@hpc4geo gpu_log2.txt cpu_log2.txt
Okay, I'm starting to see the big picture of what is going wrong. The problem is a bad interaction between the smoother and the coarse solver. Here are two supporting observations:
Residual norms for mg_levels_1_ solve.
0 KSP Residual norm 1.710018445740e-02
1 KSP Residual norm 7.417800265937e-03
...
9 KSP Residual norm 5.105136915222e-03
10 KSP Residual norm 3.987372338677e-03 # <smoother-down last residual>
Residual norms for mg_coarse_ solve.
0 KSP Residual norm 0.00232386
1 KSP Residual norm 0.00168699
2 KSP Residual norm 0.000824114
3 KSP Residual norm 0.000371013
4 KSP Residual norm 0.000222895
Residual norms for mg_levels_1_ solve.
0 KSP Residual norm 3.987372338677e-03 # Residual here is identical to <smoother-down last residual>
1 KSP Residual norm 1.577556700184e-03
2 KSP Residual norm 2.624939273350e-03
0 KSP unpreconditioned resid norm 7.740999135093e+02 true resid norm 7.740999135093e+02 ||r(i)||/||b|| 1.000000000000e+00
...
Residual norms for mg_levels_1_ solve.
0 KSP Residual norm 1.710018445740e-02
...
10 KSP Residual norm 3.987372338677e-03
Residual norms for mg_coarse_ solve.
0 KSP Residual norm 0.00232386 # <coarse solver residuals identical at each outer iteration>
1 KSP Residual norm 0.00168699
2 KSP Residual norm 0.000824114
3 KSP Residual norm 0.000371013
4 KSP Residual norm 0.000222895
...
9 KSP unpreconditioned resid norm 3.076674304842e-01 true resid norm 3.076674304842e-01 ||r(i)||/||b|| 3.974518342076e-04
...
Residual norms for mg_levels_1_ solve.
0 KSP Residual norm 2.354684270896e-02
...
10 KSP Residual norm 1.603202092982e-02
Residual norms for mg_coarse_ solve.
0 KSP Residual norm 0.00232386 # <coarse solver residuals identical at each outer iteration>
1 KSP Residual norm 0.00168699
2 KSP Residual norm 0.000824114
3 KSP Residual norm 0.000371013
4 KSP Residual norm 0.000222895
The last test to try for the moment (though I suspect the outcome will be the same) uses the following options:
-pc_type mg
-ksp_type fgmres
-ksp_rtol 1.0e-9
-mg_levels_ksp_max_it 10
-mg_levels_ksp_richardson_scale 0.3
-mg_levels_ksp_type richardson
-mg_levels_pc_type none
# Set the norm type so monitor will report residuals
-mg_levels_ksp_norm_type preconditioned
-mg_coarse_ksp_type richardson
-mg_coarse_ksp_richardson_scale 0.3
-mg_coarse_pc_type none
-mg_coarse_ksp_rtol 0.5
-mg_coarse_ksp_max_it 10
-mg_coarse_ksp_converged_reason
# Turn on monitors for debugging
-ksp_monitor_true_residual
-mg_levels_ksp_monitor
-mg_coarse_ksp_monitor_true_residual
# Force early termination for initial debugging
-ksp_max_it 10
# Report setup
-ksp_view
-log_view
-options_left
Test 3: Run with the above options (CPU). Test 4: Run with the above plus the following (GPU):
-vec_type cuda
-mat_type aijcusparse
Here you go: cpu_log3.txt gpu_log3.txt
Thanks. Looks like I messed up. Could you please re-run with both instances of richardson_scale set to 1.0e-2? (0.3 is too high and is apparently unstable.) What's weird is that the GPU run explodes whilst the CPU run is fine.
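For clarity, the requested change amounts to replacing the two scale settings in the option set above with:
-mg_levels_ksp_richardson_scale 1.0e-2
-mg_coarse_ksp_richardson_scale 1.0e-2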
Here are the new logs: cpu_log4.txt gpu_log4.txt
Results from a HIP build on LUMI-G indicate there is no problem at all.
CPU run
srun ./app/static cosine.toml --mg_strategy twolevel --mg_coarse_level 1 --petsc -device_view -ksp_view -ksp_monitor -options_file mg.opts > cpu.out
GPU run
srun ./app/static cosine.toml --mg_strategy twolevel --mg_coarse_level 1 --petsc -device_view -ksp_view -ksp_monitor -options_file mg.opts -vec_type hip -mat_type aijhipsparse > gpu.out
Comparing the convergence behavior, they are identical.
Output files are attached (along with the options file)
One conclusion might be that the spack build @Thomas-Ulrich is running on the test machine is broken.
Hi Dave, that's a great result! Maybe it works with aijhipsparse but is buggy with seqaijcusparse? Let's see if it works for me on LUMI with Spack (actually the Spack configuration on LUMI is confusing; I'm still struggling to install several tandem versions that don't resolve to the same folder, but this should not prevent me from installing one and testing it). I will do that after the field trip. Did you have to modify tandem to have it run with HIP? (Probably the same kind of modification as with CUDA, right?)
So I changed tandem as follows for HIP: https://github.com/TEAR-ERC/tandem/tree/thomas/hip
I compiled on LUMI with Spack.
Then I used an interactive node, as you detailed on Slack:
salloc --nodes=1 --account=project_465000831 --partition=dev-g --time=00:30:00 --gpus-per-node=1
Then when I run, I get:
ulrichth@uan01:/scratch/project_465000831/2d> srun static circular_hole.toml --mesh_file circular_hole_1_005.msh --mg_strategy twolevel --mg_coarse_level 1 --petsc -device_view -ksp_view -ksp_monitor -options_file mg.opts -vec_type hip -mat_type aijhipsparse
PetscDevice Object: 1 MPI process
type: host
id: 0
PetscDevice Object: 1 MPI process
type: hip
id: 0
[0] name: AMD Instinct MI250X
Compute capability: 9.0
Multiprocessor Count: 110
Maximum Grid Dimensions: 2147483647 x 2147483647 x 2147483647
Maximum Block Dimensions: 1024 x 1024 x 1024
Maximum Threads Per Block: 1024
Warp Size: 64
Total Global Memory (bytes): 68702699520
Total Constant Memory (bytes): 2147483647
Shared Memory Per Block (bytes): 65536
Multiprocessor Clock Rate (KHz): 1700000
Memory Clock Rate (KHz): 1600000
Memory Bus Width (bits): 4096
Peak Memory Bandwidth (GB/s): 1638.400000
Can map host memory: PETSC_TRUE
Can execute multiple kernels concurrently: PETSC_TRUE
(tandem ASCII-art banner)
tandem version d959ff6
stack size limit = unlimited
Worker affinity
---------9|----------|----------|----------|----------|----------|
----------|----------|----------|----------|----------|----------|
--------
DOFs: 184320
Mesh size: 0.0348181
Multigrid P-levels: 1 2
Assembly: 0.643815 s
Residual norms for mg_levels_1_esteig_ solve.
0 KSP preconditioned resid norm 8.037704555893e+00 true resid norm 2.662831990382e+02 ||r(i)||/||b|| 1.000000000000e+00
1 KSP preconditioned resid norm 4.722340004803e+00 true resid norm 1.863568841026e+02 ||r(i)||/||b|| 6.998446945796e-01
2 KSP preconditioned resid norm 3.546908373955e+00 true resid norm 1.506940402628e+02 ||r(i)||/||b|| 5.659164408686e-01
3 KSP preconditioned resid norm 2.929713111560e+00 true resid norm 1.313232659117e+02 ||r(i)||/||b|| 4.931714294631e-01
4 KSP preconditioned resid norm 2.601033014142e+00 true resid norm 1.160343528098e+02 ||r(i)||/||b|| 4.357554409324e-01
5 KSP preconditioned resid norm 2.322237107768e+00 true resid norm 1.024850827973e+02 ||r(i)||/||b|| 3.848725085454e-01
6 KSP preconditioned resid norm 1.988103274760e+00 true resid norm 8.993133273981e+01 ||r(i)||/||b|| 3.377281520751e-01
7 KSP preconditioned resid norm 1.782459931748e+00 true resid norm 8.305911829516e+01 ||r(i)||/||b|| 3.119202360313e-01
8 KSP preconditioned resid norm 1.607976638237e+00 true resid norm 7.763890216529e+01 ||r(i)||/||b|| 2.915651548641e-01
9 KSP preconditioned resid norm 1.472235584816e+00 true resid norm 7.220406655476e+01 ||r(i)||/||b|| 2.711551716952e-01
10 KSP preconditioned resid norm 1.348929116073e+00 true resid norm 6.703277980381e+01 ||r(i)||/||b|| 2.517349199872e-01
Solver warmup: 0.34405 s
0 KSP Residual norm 7.740999135093e+02
0 KSP unpreconditioned resid norm 7.740999135093e+02 true resid norm 7.740999135093e+02 ||r(i)||/||b|| 1.000000000000e+00
Residual norms for mg_levels_1_ solve.
0 KSP Residual norm 1.710018445740e-02
1 KSP Residual norm 7.417800265937e-03
2 KSP Residual norm 1.189442691018e-02
3 KSP Residual norm 1.231458728026e-02
4 KSP Residual norm 7.676083691278e-03
5 KSP Residual norm 1.023793497515e-02
6 KSP Residual norm 6.084871435531e-03
7 KSP Residual norm 7.459714351571e-03
8 KSP Residual norm 4.584221164945e-03
9 KSP Residual norm 5.105136915222e-03
10 KSP Residual norm 3.987372338677e-03
Residual norms for mg_coarse_ solve.
0 KSP Residual norm 0.00232386
1 KSP Residual norm 0.00168699
2 KSP Residual norm 0.000824114
3 KSP Residual norm 0.000371013
4 KSP Residual norm 0.000222895
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: GPU error
[0]PETSC ERROR: hipSPARSE errorcode 3 (HIPSPARSE_STATUS_INVALID_VALUE)
[0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!
[0]PETSC ERROR: Option left: name:-options_left (no value) source: file
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.20.1, Oct 31, 2023
[0]PETSC ERROR: --petsc on a named nid007976 by ulrichth Fri Sep 20 19:24:28 2024
[0]PETSC ERROR: Configure options --prefix=/project/project_465000831/spack_tandem/23.09/0.21.0/petsc-3.20.1-wecdeik --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 --with-make-exec=make --with-cc=/opt/cray/pe/mpich/8.1.27/ofi/gnu/9.1/bin/mpicc --with-cxx=/opt/cray/pe/mpich/8.1.27/ofi/gnu/9.1/bin/mpicxx --with-fc=/opt/cray/pe/mpich/8.1.27/ofi/gnu/9.1/bin/mpif90 --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-openmp=0 --with-64-bit-indices=1 --with-blaslapack-lib=/opt/cray/pe/libsci/23.09.1.1/gnu/10.3/x86_64/lib/libsci_gnu.so --with-memalign=32 --with-x=0 --with-sycl=0 --with-clanguage=C --with-cuda=0 --with-hip=1 --with-hip-include=/appl/lumi/SW/CrayEnv/EB/rocm/5.6.1/hip/include --with-metis=1 --with-metis-include=/project/project_465000831/spack_tandem/23.09/0.21.0/metis-5.1.0-wonbaai/include --with-metis-lib=/project/project_465000831/spack_tandem/23.09/0.21.0/metis-5.1.0-wonbaai/lib/libmetis.so --with-hypre=1 --with-hypre-include=/project/project_465000831/spack_tandem/23.09/0.21.0/hypre-2.29.0-b6zxd2h/include --with-hypre-lib=/project/project_465000831/spack_tandem/23.09/0.21.0/hypre-2.29.0-b6zxd2h/lib/libHYPRE.so --with-parmetis=1 --with-parmetis-include=/project/project_465000831/spack_tandem/23.09/0.21.0/parmetis-4.0.3-jl23cqy/include --with-parmetis-lib=/project/project_465000831/spack_tandem/23.09/0.21.0/parmetis-4.0.3-jl23cqy/lib/libparmetis.so --with-kokkos=0 --with-kokkos-kernels=0 --with-superlu_dist=1 --with-superlu_dist-include=/project/project_465000831/spack_tandem/23.09/0.21.0/superlu-dist-8.1.2-ca6tkuz/include --with-superlu_dist-lib=/project/project_465000831/spack_tandem/23.09/0.21.0/superlu-dist-8.1.2-ca6tkuz/lib/libsuperlu_dist.so --with-ptscotch=0 --with-suitesparse=0 --with-hdf5=1 --with-hdf5-include=/project/project_465000831/spack_tandem/23.09/0.21.0/hdf5-1.14.3-nugrcny/include --with-hdf5-lib=/project/project_465000831/spack_tandem/23.09/0.21.0/hdf5-1.14.3-nugrcny/lib/libhdf5.so --with-zlib=0 --with-mumps=1 --with-mumps-include=/project/project_465000831/spack_tandem/23.09/0.21.0/mumps-5.5.1-5fli5tk/include --with-mumps-lib="/project/project_465000831/spack_tandem/23.09/0.21.0/mumps-5.5.1-5fli5tk/lib/libsmumps.so /project/project_465000831/spack_tandem/23.09/0.21.0/mumps-5.5.1-5fli5tk/lib/libzmumps.so /project/project_465000831/spack_tandem/23.09/0.21.0/mumps-5.5.1-5fli5tk/lib/libcmumps.so /project/project_465000831/spack_tandem/23.09/0.21.0/mumps-5.5.1-5fli5tk/lib/libdmumps.so /project/project_465000831/spack_tandem/23.09/0.21.0/mumps-5.5.1-5fli5tk/lib/libmumps_common.so /project/project_465000831/spack_tandem/23.09/0.21.0/mumps-5.5.1-5fli5tk/lib/libpord.so" --with-trilinos=0 --with-fftw=0 --with-valgrind=0 --with-gmp=0 --with-libpng=0 --with-giflib=0 --with-mpfr=0 --with-netcdf=0 --with-pnetcdf=0 --with-moab=0 --with-random123=0 --with-exodusii=0 --with-cgns=0 --with-memkind=0 --with-p4est=0 --with-saws=0 --with-yaml=0 --with-hwloc=0 --with-libjpeg=0 --with-scalapack=1 --with-scalapack-lib=/project/project_465000831/spack_tandem/23.09/0.21.0/netlib-scalapack-2.2.0-vaujoyo/lib/libscalapack.so --with-strumpack=0 --with-mmg=0 --with-parmmg=0 --with-tetgen=0 --with-hip-arch=gfx90a HIPPPFLAGS="-I/appl/lumi/SW/CrayEnv/EB/rocm/5.6.1/include -I/appl/lumi/SW/CrayEnv/EB/rocm/5.6.1/include -I/project/project_465000831/spack_tandem/23.09/0.21.0/hipsolver-5.6.1-xes4kff/include -I/appl/lumi/SW/CrayEnv/EB/rocm/5.6.1/include -I/appl/lumi/SW/CrayEnv/EB/rocm/5.6.1/include 
-I/appl/lumi/SW/CrayEnv/EB/rocm/5.6.1/include -I/appl/lumi/SW/CrayEnv/EB/rocm/5.6.1/include -I/appl/lumi/SW/CrayEnv/EB/rocm/5.6.1/include -I/appl/lumi/SW/CrayEnv/EB/rocm/5.6.1/include " --with-hip-lib="/appl/lumi/SW/CrayEnv/EB/rocm/5.6.1/lib/libhipsparse.so /appl/lumi/SW/CrayEnv/EB/rocm/5.6.1/lib/libhipblas.so /project/project_465000831/spack_tandem/23.09/0.21.0/hipsolver-5.6.1-xes4kff/lib/libhipsolver.so /appl/lumi/SW/CrayEnv/EB/rocm/5.6.1/lib/librocsparse.so /appl/lumi/SW/CrayEnv/EB/rocm/5.6.1/lib/librocsolver.so /appl/lumi/SW/CrayEnv/EB/rocm/5.6.1/lib/librocblas.so -L/appl/lumi/SW/CrayEnv/EB/rocm/5.6.1/hip/lib -lamdhip64"
[0]PETSC ERROR: #1 MatMultAddKernel_SeqAIJHIPSPARSE() at /tmp/ulrichth/spack-stage/spack-stage-petsc-3.20.1-wecdeikv5uvhsekejbvg7ojwun3x4mh3/spack-src/src/mat/impls/aij/seq/seqhipsparse/aijhipsparse.hip.cpp:3131
[0]PETSC ERROR: #2 MatMultAdd_SeqAIJHIPSPARSE() at /tmp/ulrichth/spack-stage/spack-stage-petsc-3.20.1-wecdeikv5uvhsekejbvg7ojwun3x4mh3/spack-src/src/mat/impls/aij/seq/seqhipsparse/aijhipsparse.hip.cpp:3004
[0]PETSC ERROR: #3 MatMultAdd() at /tmp/ulrichth/spack-stage/spack-stage-petsc-3.20.1-wecdeikv5uvhsekejbvg7ojwun3x4mh3/spack-src/src/mat/interface/matrix.c:2780
[0]PETSC ERROR: #4 MatInterpolateAdd() at /tmp/ulrichth/spack-stage/spack-stage-petsc-3.20.1-wecdeikv5uvhsekejbvg7ojwun3x4mh3/spack-src/src/mat/interface/matrix.c:8593
[0]PETSC ERROR: #5 PCMGMCycle_Private() at /tmp/ulrichth/spack-stage/spack-stage-petsc-3.20.1-wecdeikv5uvhsekejbvg7ojwun3x4mh3/spack-src/src/ksp/pc/impls/mg/mg.c:87
[0]PETSC ERROR: #6 PCApply_MG_Internal() at /tmp/ulrichth/spack-stage/spack-stage-petsc-3.20.1-wecdeikv5uvhsekejbvg7ojwun3x4mh3/spack-src/src/ksp/pc/impls/mg/mg.c:611
[0]PETSC ERROR: #7 PCApply_MG() at /tmp/ulrichth/spack-stage/spack-stage-petsc-3.20.1-wecdeikv5uvhsekejbvg7ojwun3x4mh3/spack-src/src/ksp/pc/impls/mg/mg.c:633
[0]PETSC ERROR: #8 PCApply() at /tmp/ulrichth/spack-stage/spack-stage-petsc-3.20.1-wecdeikv5uvhsekejbvg7ojwun3x4mh3/spack-src/src/ksp/pc/interface/precon.c:486
[0]PETSC ERROR: #9 KSP_PCApply() at /tmp/ulrichth/spack-stage/spack-stage-petsc-3.20.1-wecdeikv5uvhsekejbvg7ojwun3x4mh3/spack-src/include/petsc/private/kspimpl.h:383
[0]PETSC ERROR: #10 KSPFGMRESCycle() at /tmp/ulrichth/spack-stage/spack-stage-petsc-3.20.1-wecdeikv5uvhsekejbvg7ojwun3x4mh3/spack-src/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:123
[0]PETSC ERROR: #11 KSPSolve_FGMRES() at /tmp/ulrichth/spack-stage/spack-stage-petsc-3.20.1-wecdeikv5uvhsekejbvg7ojwun3x4mh3/spack-src/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:233
[0]PETSC ERROR: #12 KSPSolve_Private() at /tmp/ulrichth/spack-stage/spack-stage-petsc-3.20.1-wecdeikv5uvhsekejbvg7ojwun3x4mh3/spack-src/src/ksp/ksp/interface/itfunc.c:910
[0]PETSC ERROR: #13 KSPSolve() at /tmp/ulrichth/spack-stage/spack-stage-petsc-3.20.1-wecdeikv5uvhsekejbvg7ojwun3x4mh3/spack-src/src/ksp/ksp/interface/itfunc.c:1082
[0]PETSC ERROR: #14 solve() at /tmp/ulrichth/spack-stage/spack-stage-tandem-hip-mewymp7bs55q5jgdybjyomw22alvng2p/spack-src/app/common/PetscLinearSolver.h:42
terminate called after throwing an instance of 'tndm::petsc_error'
what(): GPU error
srun: error: nid007976: task 0: Aborted
srun: Terminating StepId=8033350.5
I need to test with rocm 6 (instead of 5.6.1)
ulrichth@uan01:/project/project_465000831/spack_tandem> spack spec -I tandem@hip%gcc@13+rocm amdgpu_target=gfx90a domain_dimension=3 polynomial_degree=2
Input spec
--------------------------------
- tandem@hip%gcc@13+rocm amdgpu_target=gfx90a domain_dimension=3 polynomial_degree=2
Concretized
--------------------------------
[+] tandem@hip%gcc@13.2.1~cuda~ipo~libxsmm~python+rocm amdgpu_target=gfx90a build_system=cmake build_type=Release domain_dimension=3 generator=make min_quadrature_order=0 polynomial_degree=2 arch=linux-sles15-zen2
[e] ^cmake@3.20.4%gcc@13.2.1~doc+ncurses+ownlibs build_system=generic build_type=Release arch=linux-sles15-zen2
[e] ^cray-mpich@8.1.27%gcc@13.2.1+wrappers build_system=generic arch=linux-sles15-zen2
[+] ^eigen@3.4.0%gcc@13.2.1~ipo build_system=cmake build_type=RelWithDebInfo generator=make arch=linux-sles15-zen2
[+] ^gmake@4.4.1%gcc@13.2.1~guile build_system=generic arch=linux-sles15-zen2
[e] ^hip@5.6.1%gcc@13.2.1~cuda+rocm build_system=cmake build_type=Release generator=make patches=aee7249,c2ee21c,e73e91b arch=linux-sles15-zen2
[e] ^hsa-rocr-dev@5.6.1%gcc@13.2.1+image+shared build_system=cmake build_type=Release generator=make patches=9267179 arch=linux-sles15-zen2
[e] ^llvm-amdgpu@5.6.1%gcc@13.2.1~link_llvm_dylib~llvm_dylib~openmp+rocm-device-libs build_system=cmake build_type=Release generator=ninja patches=a08bbe1,b66529f,d35aec9 arch=linux-sles15-zen2
[+] ^lua@5.4.4%gcc@13.2.1~pcfile+shared build_system=makefile fetcher=curl arch=linux-sles15-zen2
[+] ^curl@8.4.0%gcc@13.2.1~gssapi~ldap~libidn2~librtmp~libssh~libssh2+nghttp2 build_system=autotools libs=shared,static tls=openssl arch=linux-sles15-zen2
[+] ^nghttp2@1.57.0%gcc@13.2.1 build_system=autotools arch=linux-sles15-zen2
[+] ^openssl@3.1.3%gcc@13.2.1~docs+shared build_system=generic certs=mozilla arch=linux-sles15-zen2
[+] ^ca-certificates-mozilla@2023-05-30%gcc@13.2.1 build_system=generic arch=linux-sles15-zen2
[+] ^perl@5.38.0%gcc@13.2.1+cpanm+opcode+open+shared+threads build_system=generic patches=714e4d1 arch=linux-sles15-zen2
[+] ^berkeley-db@18.1.40%gcc@13.2.1+cxx~docs+stl build_system=autotools patches=26090f4,b231fcc arch=linux-sles15-zen2
[+] ^bzip2@1.0.8%gcc@13.2.1~debug~pic+shared build_system=generic arch=linux-sles15-zen2
[+] ^gdbm@1.23%gcc@13.2.1 build_system=autotools arch=linux-sles15-zen2
[+] ^pkgconf@1.9.5%gcc@13.2.1 build_system=autotools arch=linux-sles15-zen2
[+] ^ncurses@6.4%gcc@13.2.1~symlinks+termlib abi=none build_system=autotools arch=linux-sles15-zen2
[+] ^readline@8.2%gcc@13.2.1 build_system=autotools patches=bbf97f1 arch=linux-sles15-zen2
[+] ^unzip@6.0%gcc@13.2.1 build_system=makefile arch=linux-sles15-zen2
[+] ^metis@5.1.0%gcc@13.2.1~gdb+int64~ipo~real64+shared build_system=cmake build_type=Release generator=make patches=4991da9,93a7903,b1225da arch=linux-sles15-zen2
[+] ^parmetis@4.0.3%gcc@13.2.1~gdb+int64~ipo+shared build_system=cmake build_type=Release generator=make patches=4f89253,50ed208,704b84f arch=linux-sles15-zen2
[+] ^petsc@3.20.1%gcc@13.2.1~X~batch~cgns~complex~cuda~debug+double~exodusii~fftw+fortran~giflib+hdf5~hpddm~hwloc+hypre+int64~jpeg~knl~kokkos~libpng~libyaml~memkind+metis~mkl-pardiso~mmg~moab~mpfr+mpi+mumps~openmp~p4est~parmmg~ptscotch~random123+rocm~saws+scalapack+shared~strumpack~suite-sparse+superlu-dist~sycl~tetgen~trilinos~valgrind amdgpu_target=gfx90a build_system=generic clanguage=C memalign=32 arch=linux-sles15-zen2
[e] ^cray-libsci@23.09.1.1%gcc@13.2.1~mpi~openmp+shared build_system=generic arch=linux-sles15-zen2
[+] ^diffutils@3.9%gcc@13.2.1 build_system=autotools arch=linux-sles15-zen2
[+] ^libiconv@1.17%gcc@13.2.1 build_system=autotools libs=shared,static arch=linux-sles15-zen2
[+] ^hdf5@1.14.3%gcc@13.2.1~cxx~fortran~hl~ipo~java~map+mpi+shared~szip~threadsafe+tools api=default build_system=cmake build_type=Release generator=make arch=linux-sles15-zen2
[e] ^hipblas@5.6.1%gcc@13.2.1~cuda+rocm amdgpu_target=auto build_system=cmake build_type=Release generator=make arch=linux-sles15-zen2
[+] ^hipsolver@5.6.1%gcc@13.2.1~cuda~ipo+rocm amdgpu_target=auto build_system=cmake build_type=Release generator=make patches=cfbe3d1 arch=linux-sles15-zen2
[e] ^rocm-cmake@5.6.1%gcc@13.2.1 build_system=cmake build_type=Release generator=make arch=linux-sles15-zen2
[e] ^hipsparse@5.6.1%gcc@13.2.1~cuda+rocm amdgpu_target=auto build_system=cmake build_type=Release generator=make arch=linux-sles15-zen2
[+] ^hypre@2.29.0%gcc@13.2.1~caliper~complex~cuda~debug+fortran~gptune+int64~internal-superlu~magma~mixedint+mpi~openmp~rocm+shared~superlu-dist~sycl~umpire~unified-memory build_system=autotools arch=linux-sles15-zen2
[+] ^mumps@5.5.1%gcc@13.2.1~blr_mt+complex+double+float~incfort~int64+metis+mpi~openmp+parmetis~ptscotch~scotch+shared build_system=generic patches=373d736 arch=linux-sles15-zen2
[+] ^netlib-scalapack@2.2.0%gcc@13.2.1~ipo~pic+shared build_system=cmake build_type=Release generator=make patches=072b006,1c9ce5f,244a9aa arch=linux-sles15-zen2
[e] ^python@3.11.7%gcc@13.2.1+bz2+crypt+ctypes+dbm~debug+libxml2+lzma+nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tkinter+uuid+zlib build_system=generic patches=13fa8bf,b0615b2,ebdca64,f2fd060 arch=linux-sles15-zen2
[e] ^rocblas@5.6.1%gcc@13.2.1+tensile amdgpu_target=auto build_system=cmake build_type=Release generator=make arch=linux-sles15-zen2
[e] ^rocprim@5.6.1%gcc@13.2.1 amdgpu_target=auto build_system=cmake build_type=Release generator=make arch=linux-sles15-zen2
[e] ^rocrand@5.6.1%gcc@13.2.1+hiprand amdgpu_target=auto build_system=cmake build_type=Release generator=make arch=linux-sles15-zen2
[e] ^rocsolver@5.6.1%gcc@13.2.1+optimal amdgpu_target=auto build_system=cmake build_type=Release generator=make arch=linux-sles15-zen2
[e] ^rocsparse@5.6.1%gcc@13.2.1~test amdgpu_target=auto build_system=cmake build_type=Release generator=make arch=linux-sles15-zen2
[e] ^rocthrust@5.6.1%gcc@13.2.1 amdgpu_target=auto build_system=cmake build_type=Release generator=make arch=linux-sles15-zen2
[+] ^superlu-dist@8.1.2%gcc@13.2.1~cuda+int64~ipo~openmp~rocm+shared build_system=cmake build_type=Release generator=make arch=linux-sles15-zen2
[+] ^zlib-ng@2.1.4%gcc@13.2.1+compat+opt build_system=autotools arch=linux-sles15-zen2
@Thomas-Ulrich You should try with the latest PETSc version (3.21.5).
HIP Mat implementations are currently tagged as "under development". The team is constantly adding support for HIP AIJ matrices (even as of v3.21.5). Hence I suspect what you see is due to a less complete implementation compared to v3.21.5 (which I used).
Thanks, I tried with v3.21.5 and got the same problem. I guess the problem is rocm-6.0.3 vs rocm-5.6.1, but that version is not in the Spack installation on LUMI (just updated, yet still outdated), only available as a module.
If you could share how you built PETSc and tandem, that would be useful.
ulrichth@uan01:/project/project_465000831/petsc> module list
Currently Loaded Modules:
1) craype-x86-rome 3) craype-network-ofi 5) xpmem/2.8.2-1.0_5.1__g84a27a5.shasta 7) craype/2.7.31.11 9) cray-mpich/8.1.29 11) PrgEnv-cray/8.5.0 13) lumi-tools/24.05 (S) 15) spack/23.09
2) libfabric/1.15.2.0 4) perftools-base/24.03.0 6) cce/17.0.1 8) cray-dsmml/0.3.0 10) cray-libsci/24.03.0 12) ModuleLabel/label (S) 14) init-lumi/0.2 (S) 16) rocm/6.0.3
Where:
S: Module is Sticky, requires --force to unload or purge
Here is what I tried (based on your log):
export CPATH=/opt/rocm-6.0.3/include/rocm-core:$CPATH
git clone -b release https://gitlab.com/petsc/petsc.git petsc
./configure --with-mpi-dir=/opt/cray/pe/mpich/8.1.29/ofi/crayclang/17.0 --download-c2html=0 --download-fblaslapack=1 --download-hwloc=0 --download-sowing=0 --with-x=0 --with-hip-dir=/opt/rocm-6.0.3 --with-hipc=hipcc --with-hip-arch=gfx90a --download-kokkos --download-kokkos-kernels --download-metis --download-parmetis --with-memalign=32 --with-64-bit-indices --with-fortran-bindings=0
make PETSC_DIR=/pfs/lustrep4/projappl/project_465000831/petsc PETSC_ARCH=arch-linux-c-debug -j 32
and I get:
/pfs/lustrep4/projappl/project_465000831/petsc/src/sys/objects/device/util/memory.c:55:18: error: no member named 'memoryType' in 'struct hipPointerAttribute_t'
55 | mtype = attr.memoryType;
| ~~~~ ^
1 error generated.
gmake[3]: *** [gmakefile:197: arch-linux-c-debug/obj/src/sys/objects/device/util/memory.o] Error 1
gmake[3]: *** Waiting for unfinished jobs....
@hpc4geo: I tested the exact same setup you are running, but on heisenbug, and got the same results as you (CPU == GPU, to some extent). So the problem is still there, and my Spack installation is not at fault. Here is the exact setup I'm running: problematic_setup.tar.gz
You can (probably) check on LUMI that the convergence issue is not solved:
mpiexec --bind-to core -n 1 static circular_hole.toml --resolution 0.8 --matrix_free yes --mg_strategy twolevel --mg_coarse_level 1 --mesh_file circular_hole_1_005.msh --petsc -options_file option4.cfg > cpu_new.txt
mpiexec --bind-to core -n 1 static circular_hole.toml --resolution 0.8 --matrix_free yes --mg_strategy twolevel --mg_coarse_level 1 --mesh_file circular_hole_1_005.msh --petsc -options_file option4cuda.cfg > gpu_new.txt
Edit: added the logs; tandem was compiled with:
spack install -j 32 tandem@main polynomial_degree=2 domain_dimension=2 +cuda cuda_arch=86 %gcc@12
@Thomas-Ulrich The software stack is pretty complicated. It mixes many different packages. We are also mixing how we build / assemble the stack, and we are mixing and matching devices (AMD vs NVIDIA).
To try and make sense of all this, I've put everything we have found into a table. Please review the entries with your name and let me know if I have misreported something related to those builds.
user | machine | petsc version | build process | device arch | device libs | outcome |
---|---|---|---|---|---|---|
dave | lumi-g | 3.21.5 release + EasyBuild patch | self installed | AMD MI250x | rocm6 + hip | success |
dave | lumi-g | petsc devel. repo, main branch | self installed | AMD MI250x | rocm6 + hip | success |
thomas | heisenbug | 3.21.5 release | spack | NVIDIA GeForce RTX 3090 | cuda-12.5.0 | success |
thomas | lumi-g | 3.20.1 | spack | AMD MI250x | rocm5 + hip | fail - hipSPARSE errorcode 3 (HIPSPARSE_STATUS_INVALID_VALUE) |
thomas | lumi-g | 3.21.5 | module | AMD MI250x | rocm5 + hip | fail - hipSPARSE errorcode 3 (HIPSPARSE_STATUS_INVALID_VALUE) |
Note that "success" implies the convergence history of the solver using MG is nearly identical whether you use the CPU or the GPU.
> If you could share how you built PETSc and tandem, that would be useful.
I've added a description here https://github.com/TEAR-ERC/tandem/issues/76
> Here is what I tried (based on your log): [...] error: no member named 'memoryType' in 'struct hipPointerAttribute_t'
Right. The choices are
As for tandem, I had to hack a few things. I didn't do the job properly, hence I didn't commit these changes. The changes were as follows:
diff --git a/app/CMakeLists.txt b/app/CMakeLists.txt
index bc48092..3a44816 100644
--- a/app/CMakeLists.txt
+++ b/app/CMakeLists.txt
@@ -190,12 +190,12 @@ set(APP_COMMON_SRCS
#pc/lspoly.c
pc/register.cpp
)
-if(${LAPACK_FOUND})
- list(APPEND APP_COMMON_SRCS
- pc/eigdeflate.c
- pc/reig_aux.c
- )
-endif()
+#if(${LAPACK_FOUND})
+# list(APPEND APP_COMMON_SRCS
+# pc/eigdeflate.c
+# pc/reig_aux.c
+# )
+#endif()
add_library(app-common ${APP_COMMON_SRCS})
target_compile_definitions(app-common PUBLIC "ALIGNMENT=${ALIGNMENT}")
target_link_libraries(app-common PUBLIC
diff --git a/app/pc/register.cpp b/app/pc/register.cpp
index e75b08b..60a9716 100644
--- a/app/pc/register.cpp
+++ b/app/pc/register.cpp
@@ -14,7 +14,7 @@ namespace tndm {
PetscErrorCode register_PCs() {
PetscFunctionBegin;
#ifdef HAVE_LAPACK
- CHKERRQ(PCRegister("eigdeflate", PCCreate_eigdeflate));
+// CHKERRQ(PCRegister("eigdeflate", PCCreate_eigdeflate));
#endif
PetscFunctionReturn(0);
}
Several tandem source files containing pure PETSc code need updating for PETSc 3.21.5. We rarely use this functionality, hence I just stopped compiling them out of laziness.
diff --git a/src/mesh/GlobalSimplexMesh.cpp b/src/mesh/GlobalSimplexMesh.cpp
index 260d1a0..8856919 100644
--- a/src/mesh/GlobalSimplexMesh.cpp
+++ b/src/mesh/GlobalSimplexMesh.cpp
@@ -392,6 +392,7 @@ void GlobalSimplexMesh<D>::deleteDomainBoundaryFaces(facet_set_t& boundaryFaces)
}
}
+template class GlobalSimplexMesh<1u>;
template class GlobalSimplexMesh<2u>;
template class GlobalSimplexMesh<3u>;
I am unclear why the 1D instance was suddenly required.
@Thomas-Ulrich I re-ran your problematic setup on LUMI.
Please note that the arg --matrix_free yes was omitted.
CPU
srun ./app/static circular_hole.toml --resolution 0.8 --mg_strategy twolevel --mg_coarse_level 1 --mesh_file circular_hole_1_005.msh --petsc -options_file mg.opts > lumi_circhole_cpu.txt
GPU
srun ./app/static circular_hole.toml --resolution 0.8 --mg_strategy twolevel --mg_coarse_level 1 --mesh_file circular_hole_1_005.msh --petsc -options_file mg.opts -vec_type hip -mat_type aijhipsparse > lumi_circhole_gpu.txt
The output files are attached - the results are nearly identical.
Now, if you use the option --matrix_free yes with HIP, I get the following error. I am trying to track this down and fix it.
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Invalid argument
[0]PETSC ERROR: Object (seq) is not seqhip or mpihip
[0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!
[0]PETSC ERROR: Option left: name:-options_left (no value) source: file
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Development GIT revision: v3.21.5-594-gd4cfec0c9b8 GIT Date: 2024-09-20 03:11:45 +0000
[0]PETSC ERROR: --petsc with 1 MPI process(es) and PETSC_ARCH arch-cray-c-debug-rocm-hip-tandem-vanil on nid005014 by maydave2 Thu Sep 26 21:31:51 2024
[0]PETSC ERROR: Configure options: --download-c2html=0 --download-fblaslapack=1 --download-hwloc=0 --download-cmake --download-metis --download-parmetis --download-sowing=0 --with-64-bit-indices --with-fortran-bindings=0 --with-hip --with-hip-arch=gfx90a --with-hipc=hipcc --with-memalign=32 --with-mpi-dir=/opt/cray/pe/mpich/8.1.29/ofi/crayclang/17.0 --with-x=0 PETSC_ARCH=arch-cray-c-debug-rocm-hip-tandem-vanil
[0]PETSC ERROR: #1 GetArray() at /projappl/project_465001082/dmay/software/petsc-dev-git/include/petsc/private/veccupmimpl.h:581
[0]PETSC ERROR: #2 VecCUPMGetArrayAsync_Private() at /pfs/lustrep3/projappl/project_465001082/dmay/software/petsc-dev-git/src/vec/vec/impls/seq/cupm/hip/../vecseqcupm.hpp:206
[0]PETSC ERROR: #3 VecCUPMGetArrayReadAsync() at /pfs/lustrep3/projappl/project_465001082/dmay/software/petsc-dev-git/src/vec/vec/impls/seq/cupm/hip/../vecseqcupm.hpp:240
[0]PETSC ERROR: #4 VecHIPGetArrayRead() at /pfs/lustrep3/projappl/project_465001082/dmay/software/petsc-dev-git/src/vec/vec/impls/seq/cupm/hip/vecseqcupm.hip.cpp:251
[0]PETSC ERROR: #5 MatMultAddKernel_SeqAIJHIPSPARSE() at /pfs/lustrep3/projappl/project_465001082/dmay/software/petsc-dev-git/src/mat/impls/aij/seq/seqhipsparse/aijhipsparse.hip.cpp:3069
[0]PETSC ERROR: #6 MatMultTranspose_SeqAIJHIPSPARSE() at /pfs/lustrep3/projappl/project_465001082/dmay/software/petsc-dev-git/src/mat/impls/aij/seq/seqhipsparse/aijhipsparse.hip.cpp:3024
[0]PETSC ERROR: #7 MatMultTranspose() at /pfs/lustrep3/projappl/project_465001082/dmay/software/petsc-dev-git/src/mat/interface/matrix.c:2724
[0]PETSC ERROR: #8 MatRestrict() at /pfs/lustrep3/projappl/project_465001082/dmay/software/petsc-dev-git/src/mat/interface/matrix.c:8841
[0]PETSC ERROR: #9 PCMGMCycle_Private() at /pfs/lustrep3/projappl/project_465001082/dmay/software/petsc-dev-git/src/ksp/pc/impls/mg/mg.c:68
[0]PETSC ERROR: #10 PCApply_MG_Internal() at /pfs/lustrep3/projappl/project_465001082/dmay/software/petsc-dev-git/src/ksp/pc/impls/mg/mg.c:626
[0]PETSC ERROR: #11 PCApply_MG() at /pfs/lustrep3/projappl/project_465001082/dmay/software/petsc-dev-git/src/ksp/pc/impls/mg/mg.c:648
[0]PETSC ERROR: #12 PCApply() at /pfs/lustrep3/projappl/project_465001082/dmay/software/petsc-dev-git/src/ksp/pc/interface/precon.c:522
[0]PETSC ERROR: #13 KSP_PCApply() at /projappl/project_465001082/dmay/software/petsc-dev-git/include/petsc/private/kspimpl.h:411
[0]PETSC ERROR: #14 KSPFGMRESCycle() at /pfs/lustrep3/projappl/project_465001082/dmay/software/petsc-dev-git/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:123
[0]PETSC ERROR: #15 KSPSolve_FGMRES() at /pfs/lustrep3/projappl/project_465001082/dmay/software/petsc-dev-git/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:235
[0]PETSC ERROR: #16 KSPSolve_Private() at /pfs/lustrep3/projappl/project_465001082/dmay/software/petsc-dev-git/src/ksp/ksp/interface/itfunc.c:900
[0]PETSC ERROR: #17 KSPSolve() at /pfs/lustrep3/projappl/project_465001082/dmay/software/petsc-dev-git/src/ksp/ksp/interface/itfunc.c:1075
[0]PETSC ERROR: #18 solve() at /users/maydave2/codes/tandem/app/common/PetscLinearSolver.h:42
terminate called after throwing an instance of 'tndm::petsc_error'
what(): Invalid argument
srun: error: nid005014: task 0: Aborted
OK, I see: on heisenbug, GPU == CPU for the problematic setup if I remove the matrix-free option. But if you look at the first message of this issue, you can see that the Ridgecrest setup was run without the matrix-free option too, so maybe that is not the core problem.
Looking backwards may not be super helpful, as so many things have changed in both the software stack and the solver options being used. The test from 1 year ago used these smoother / coarse solver options: -mg_coarse_pc_type gamg -mg_levels_pc_type bjacobi. My suggestion a year ago was to investigate whether the differences arose from these specific options; this wasn't pursued, so we didn't learn anything new. We cannot go back in time and re-create this software stack on machines we care about today. Even if we could, it's not a productive use of anyone's time.
Let's look forwards and work towards resolving any issues that we have today with: the latest PETSc; current, up-to-date software stacks; and, importantly, the machines LMU promised to make tandem run on in exchange for EU funding.
@Thomas-Ulrich I've put all the LUMI-G mods (including yours) into dmay/petsc_dev_hip. I've also resolved the error encountered when using --matrix_free yes (https://github.com/TEAR-ERC/tandem/commit/b063726226bc34113f14847bc3c3eb71b8d478b3), which is also part of this branch.
Collectively these changes enable me to run your example with a bjacobi/ilu smoother and GAMG as the coarse grid PC and get identical convergence on the CPU and GPU.
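For reference, a sketch of that kind of configuration (the exact option file used is not shown here; only the bjacobi/ilu smoother and GAMG coarse PC are taken from the description above, the rest is assumed):
-pc_type mg
# smoother: block Jacobi (its per-block default PC is ILU)
-mg_levels_pc_type bjacobi
# coarse grid: PETSc's algebraic multigrid
-mg_coarse_pc_type gamg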
Hi @hpc4geo, Using dmay/petsc_dev_hip, on heisenbug, GPU==CPU for the problematic setup also with the matrix-free option, and GPU==CPU for the Ridgecrest setup. So we can close the issue when the branch is merged. Thank you, and sorry for not following up on your initial suggestions right away when you created the issue.
Issue #50 identified some unexpected behavior when comparing CPU results with GPU results. The convergence history is different when the same PETSc option set is provided for an MG configuration.
Attached are the logs Thomas generated. tandem_GPU.log tandem_CPU.log
The things most likely causing differences in the residual history are probably associated with the ILU and LU solvers. Suggest confirming this by re-running CPU and GPU variants with the following additional options (placed at the end of any existing options).