AMReX-Codes / amrex

AMReX: Software Framework for Block Structured AMR
https://amrex-codes.github.io/amrex

Errors about compiling and running hypre gpu example of amrex tutorials #3854

Closed ruohai0925 closed 1 month ago

ruohai0925 commented 4 months ago

Hi, I plan to run the hypre gpu example of amrex-tutorials/ExampleCodes/LinearSolvers/ABecLaplacian_C. I use the latest AMReX, amrex-tutorials, and hypre. Here are my detailed steps, followed by some questions.

  1. I build the hypre library without unified memory, MPI, or OpenMP, using CUDA and SINGLE precision:
cmake -DHYPRE_WITH_MPI=OFF -DHYPRE_WITH_OPENMP=OFF -DHYPRE_BUILD_EXAMPLES=OFF -DHYPRE_ENABLE_UNIFIED_MEMORY=OFF -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DCMAKE_CXX_STANDARD=17 -DHYPRE_ENABLE_SINGLE=ON -DHYPRE_WITH_CUDA=ON -DCUDA_HOME="/usr/local/cuda" -DHYPRE_CUDA_SM=86 -DHYPRE_BUILD_TYPE=Release -DCMAKE_BUILD_TYPE=Release -DHYPRE_ENABLE_DEVICE_POOL=ON ..

cmake --build . --config Release 

cmake --install . --prefix /home/zengx372/Codes/hypre/src/hypre_gpu_wounifiedmemory --config Release
  2. Then I do:

export HYPRE_DIR=/home/zengx372/Codes/hypre/src/hypre_gpu_wounifiedmemory

  3. Next I go directly to the third step of this page https://amrex-codes.github.io/amrex/tutorials_html/Hypre_Install.html#building-with-hypre-via-gnumake and compile the example using GNUMake with the following script.
DEBUG = FALSE

USE_MPI  = FALSE
USE_OMP  = FALSE
USE_CUDA = TRUE
USE_PRECISION = FLOAT
USE_HYPRE = TRUE

USE_PETSC = FALSE

COMP = gnu

DIM = 3

AMREX_HOME ?= ../../../../amrex

include $(AMREX_HOME)/Tools/GNUMake/Make.defs

include ./Make.package

Pdirs   := Base Boundary LinearSolvers/MLMG

Ppack   += $(foreach dir, $(Pdirs), $(AMREX_HOME)/Src/$(dir)/Make.package)

include $(Ppack)

include $(AMREX_HOME)/Tools/GNUMake/Make.rules

The compiling error is:

nvcc -MMD -MP -ccbin=g++ -Xcompiler=' -Werror=return-type -g -O3 -std=c++17  -pthread  -std=c++17' --std=c++17 -Wno-deprecated-gpu-targets -m64 --generate-code arch=compute_80,code=sm_80 -maxrregcount=255 --expt-relaxed-constexpr --expt-extended-lambda --forward-unknown-to-host-compiler -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -Xcudafe --diag_suppress=implicit_return_from_non_void_function --Werror cross-execution-space-call -lineinfo --ptxas-options=-O3 --ptxas-options=-v --use_fast_math  --Werror ext-lambda-captures-this --display-error-number --diag-error 20092 -x cu -dc   -DAMREX_USE_CUDA -DAMREX_USE_GPU -DBL_COALESCE_FABS -DAMREX_GPU_MAX_THREADS=256 -DAMREX_USE_GPU_RDC -DBL_SPACEDIM=3 -DAMREX_SPACEDIM=3 -DBL_FORT_USE_UNDERSCORE -DAMREX_FORT_USE_UNDERSCORE -DBL_Linux -DAMREX_Linux -DNDEBUG -DAMREX_USE_HYPRE -Itmp_build_dir/s/3d.gnu.CUDA.EXE -I. -I../../../../amrex/Src/Extern/HYPRE -I../../../../amrex/Src/Base -I../../../../amrex/Src/Base/Parser -I../../../../amrex/Src/Boundary -I../../../../amrex/Src/LinearSolvers/MLMG -isystem /home/zengx372/Codes/hypre/src/hypre_gpu_wounifiedmemory/include -isystem /usr/local/cuda/include  -c MyTest.cpp -o tmp_build_dir/o/3d.gnu.CUDA.EXE/MyTest.o
../../../../amrex/Src/Extern/HYPRE/AMReX_HypreIJIface.cpp(153): error: argument of type "amrex::HypreIJIface::HypreRealType *" is incompatible with parameter of type "amrex::Real *"
      m_solverFinalResidualNormPtr(m_solver, &m_final_res_norm);
                                             ^

../../../../amrex/Src/Extern/HYPRE/AMReX_HypreIJIface.cpp(392): error: a value of type "HYPRE_Int (*)(HYPRE_Solver, HYPRE_Real)" cannot be assigned to an entity of type "amrex::HypreIJIface::HypreIntType (*)(HYPRE_Solver, amrex::Real)"
      m_solverSetTolPtr = &HYPRE_BoomerAMGSetTol;
                        ^

../../../../amrex/Src/Extern/HYPRE/AMReX_HypreIJIface.cpp(396): error: a value of type "HYPRE_Int (*)(HYPRE_Solver, HYPRE_Real *)" cannot be assigned to an entity of type "amrex::HypreIJIface::HypreIntType (*)(HYPRE_Solver, amrex::Real *)"
      m_solverFinalResidualNormPtr = &HYPRE_BoomerAMGGetFinalRelativeResidualNorm;

It seems there are some datatype mismatches in the above output.

  4. Then I also try to use CMake to compile this example, following https://amrex-codes.github.io/amrex/tutorials_html/Hypre_Install.html#building-with-hypre-via-cmake. The commands are:
cmake .. -DAMReX_HYPRE=ON -DHYPRE_LIBRARIES=${HYPRE_DIR}/lib64/libHYPRE.a -DHYPRE_INCLUDE_DIRS=${HYPRE_DIR}/include -DAMReX_LINEAR_SOLVERS=ON -DAMReX_FORTRAN=ON -DAMReX_PRECISION=SINGLE -DAMReX_MPI=NO -DAMReX_OMP=OFF -DAMReX_CUDA_DEBUG=NO -DAMReX_GPU_BACKEND=CUDA -DAMReX_CUDA_ARCH=86 -DAMReX_AMRLEVEL=NO

cmake --build . -j8

The build succeeds.

But when I run the example with:

./ABecLaplacian_C inputs.hypre

It shows the error:

Initializing AMReX (24.03-37-g2a9e992fce9b)...
Initializing CUDA...
CUDA initialized with 1 device.
AMReX (24.03-37-g2a9e992fce9b) initialized
MLMG: # of AMR levels: 1
      # of MG levels on the coarsest AMR level: 1
MLMG: Initial rhs               = 2347.400635
MLMG: Initial residual (resid0) = 2347.400635
Erroneous arithmetic operation
See Backtrace.0 file for details

And here is the Backtrace file:

0: ./ABecLaplacian_C() [0x4e50e6]
    amrex::BLBackTrace::print_backtrace_info(_IO_FILE*) at ??:?

 1: ./ABecLaplacian_C() [0x4e77c2]
    amrex::BLBackTrace::handler(int) at ??:?

 2: /lib64/libc.so.6(+0x4ad50) [0x7fe7a0419d50]

 3: ./ABecLaplacian_C() [0x94a3bd]
    hypre_BoomerAMGSolve at ??:?

 4: ./ABecLaplacian_C() [0x8de441]
    amrex::HypreIJIface::solve(float, float, int) at ??:?

 5: ./ABecLaplacian_C() [0x8bfd1e]
    amrex::HypreABecLap3::solve(amrex::MultiFab&, amrex::MultiFab const&, float, float, int, amrex::BndryDataT<amrex::MultiFab> const&, int) at ??:?

 6: ./ABecLaplacian_C() [0x753df6]
    void amrex::MLMGT<amrex::MultiFab>::bottomSolveWithHypre<amrex::MultiFab, 0>(amrex::MultiFab&, amrex::MultiFab const&) at ??:?

 7: ./ABecLaplacian_C() [0x75467e]
    amrex::MLMGT<amrex::MultiFab>::actualBottomSolve() at ??:?

 8: ./ABecLaplacian_C() [0x754c50]
    amrex::MLMGT<amrex::MultiFab>::mgVcycle(int, int) at ??:?

 9: ./ABecLaplacian_C() [0x75660c]
    amrex::MLMGT<amrex::MultiFab>::oneIter(int) at ??:?

10: ./ABecLaplacian_C() [0x45a636]
    float amrex::MLMGT<amrex::MultiFab>::solve<amrex::MultiFab>(amrex::Vector<amrex::MultiFab*, std::allocator<amrex::MultiFab*> > const&, amrex::Vector<amrex::MultiFab const*, std::allocator<amrex::MultiFab const*> > const&, float, float, char const*) at ??:?

11: ./ABecLaplacian_C() [0x45b724]
    float amrex::MLMGT<amrex::MultiFab>::solve<amrex::MultiFab>(std::initializer_list<amrex::MultiFab*>, std::initializer_list<amrex::MultiFab const*>, float, float, char const*) at ??:?

12: ./ABecLaplacian_C() [0x44ce3e]
    MyTest::solveABecLaplacian() at ??:?

13: ./ABecLaplacian_C() [0x426f59]
    main at ??:?

14: /lib64/libc.so.6(__libc_start_main+0xef) [0x7fe7a040429d]

15: ./ABecLaplacian_C() [0x44868a]
    _start
../sysdeps/x86_64/start.S:122

Can someone give me some suggestions? I am happy to provide more info if needed.

ruohai0925 commented 4 months ago

The above errors still appear even if I compile and build hypre with -DHYPRE_ENABLE_UNIFIED_MEMORY=ON and add amrex.the_arena_is_managed = 1 to the inputs file.

WeiqunZhang commented 4 months ago

Does it work if you use double precision?

WeiqunZhang commented 4 months ago

I can reproduce the failure. It might be that this particular problem is simply too hard in single precision. The relative tolerance in the setup is hard-wired to 1.e-10; I changed them all to 1.e-4. Then if I run with prob_type=1 (i.e., Poisson), it seems to work. If I run with prob_type=2 n_cell=32, it runs for a number of iterations and then the residual blows up.

WeiqunZhang commented 4 months ago

If you make problem type 2 easier by making it more diagonally dominant,

index 97c928a88d..f3d7bb697d 100644
--- a/Tests/LinearSolvers/ABecLaplacian_C/MyTest.H
+++ b/Tests/LinearSolvers/ABecLaplacian_C/MyTest.H
@@ -75,7 +75,7 @@ private:
     amrex::Vector<amrex::MultiFab> acoef;
     amrex::Vector<amrex::MultiFab> bcoef;

-    amrex::Real ascalar = 1.e-3;
+    amrex::Real ascalar = 1.e-1;
     amrex::Real bscalar = 1.0;
 };

it also works. So I don't think there is a real issue here.

asalmgren commented 1 month ago

@ruohai0925 - Closing this issue due to lack of activity. Please re-open it if there is still a problem.