AMReX-Fluids / IAMR

A parallel, adaptive mesh refinement (AMR) code that solves the variable-density incompressible Navier-Stokes equations.
https://amrex-fluids.github.io/IAMR/

CUDA version reports errors on certain inputs #121

Closed: lingda-li closed this issue 2 years ago

lingda-li commented 2 years ago

Hi,

I compiled the CUDA versions of the IAMR examples, but found that they report errors on certain inputs. For example, in IAMR/Exec/eb_run2d, "./amr2d.gnu.MPI.CUDA.ex inputs.2d.double_shear_layer-rotate" runs correctly, while "./amr2d.gnu.MPI.CUDA.ex inputs.2d.flow_past_cylinder-x" reports the following errors:

No protocol specified
Initializing CUDA...
CUDA initialized with 1 GPU per MPI rank; 1 GPU(s) used in total
MPI initialized with 1 MPI processes
MPI initialized with thread support level 0
AMReX (21.12-dirty) initialized
xlo set to mass inflow.
xhi set to pressure outflow.
Warning: both amr.plot_int and amr.plot_per are > 0.!
NavierStokesBase::init_additional_state_types()::have_divu = 0
NavierStokesBase::init_additional_state_types()::have_dsdt = 0
NavierStokesBase::init_additional_state_types: num_state_type = 3
Initializing EB2 structs
Creating projector
Installing projector level 0
amrex::Abort::0::GPU last error detected in file ../../../amrex/Src/Base/AMReX_GpuLaunchFunctsG.H line 834: invalid device function !!!
SIGABRT
See Backtrace.0 file for details

MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 6.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.

Backtrace.0 is as follows: === If no file names and line numbers are shown below, one can run addr2line -Cpfie my_exefile my_line_address to convert my_line_address (e.g., 0x4a6b) into file name and line number. Or one can use amrex/Tools/Backtrace/parse_bt.py.

=== Please note that the line number reported by addr2line may not be accurate. One can use readelf -wl my_exefile | grep my_line_address to find out the offset for that line.

0: ./amr2d.gnu.MPI.CUDA.ex(+0x2f20b5) [0x561c797640b5] amrex::BLBackTrace::print_backtrace_info(_IO_FILE*) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Base/AMReX_BLBackTrace.cpp:179

1: ./amr2d.gnu.MPI.CUDA.ex(+0x2f3e35) [0x561c79765e35] amrex::BLBackTrace::handler(int) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Base/AMReX_BLBackTrace.cpp:85

2: ./amr2d.gnu.MPI.CUDA.ex(+0x62265) [0x561c794d4265] std::cxx11::basic_string<char, std::char_traits, std::allocator >::_M_is_local() const at /usr/include/c++/9/bits/basic_string.h:222 (inlined by) std::cxx11::basic_string<char, std::char_traits, std::allocator >::_M_dispose() at /usr/include/c++/9/bits/basic_string.h:231 (inlined by) std::__cxx11::basic_string<char, std::char_traits, std::allocator >::~basic_string() at /usr/include/c++/9/bits/basic_string.h:658 (inlined by) amrex::Gpu::ErrorCheck(char const*, int) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Base/AMReX_GpuError.H:54

3: ./amr2d.gnu.MPI.CUDA.ex(+0x7e41c) [0x561c794f041c] amrex::Gpu::AsyncArray<amrex::Box, 0>::~AsyncArray() at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Base/AMReX_GpuAsyncArray.H:64 (inlined by) void amrex::GpuBndryFuncFab::ccfcdoit(amrex::Box const&, amrex::FArrayBox&, int, int, amrex::Geometry const&, double, amrex::Vector<amrex::BCRec, std::allocator > const&, int, int, amrex::FilccCell&&) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Base/AMReX_PhysBCFunct.H:393

4: ./amr2d.gnu.MPI.CUDA.ex(+0x71dc5) [0x561c794e3dc5] amrex::GpuBndryFuncFab::operator()(amrex::Box const&, amrex::FArrayBox&, int, int, amrex::Geometry const&, double, amrex::Vector<amrex::BCRec, std::allocator > const&, int, int) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Base/AMReX_PhysBCFunct.H:204 (inlined by) dummy_fill(amrex::Box const&, amrex::FArrayBox&, int, int, amrex::Geometry const&, double, amrex::Vector<amrex::BCRec, std::allocator > const&, int, int) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../Source/NS_bcfill.H:272

5: ./amr2d.gnu.MPI.CUDA.ex(+0x3b94fa) [0x561c7982b4fa] amrex::StateData::FillBoundary(amrex::Box const&, amrex::FArrayBox&, double, amrex::Geometry const&, int, int, int) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Amr/AMReX_StateData.cpp:556

6: ./amr2d.gnu.MPI.CUDA.ex(+0x3bb61d) [0x561c7982d61d] amrex::StateDataPhysBCFunct::operator()(amrex::MultiFab&, int, int, amrex::IntVect const&, double, int) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Amr/AMReX_StateData.cpp:909

7: ./amr2d.gnu.MPI.CUDA.ex(+0x3b2795) [0x561c79824795] std::enable_if<amrex::IsFabArray<amrex::MultiFab, void>::value, void>::type amrex::FillPatchSingleLevel<amrex::MultiFab, amrex::StateDataPhysBCFunct>(amrex::MultiFab&, amrex::IntVect const&, double, amrex::Vector<amrex::MultiFab, std::allocator<amrex::MultiFab> > const&, amrex::Vector<double, std::allocator > const&, int, int, int, amrex::Geometry const&, amrex::StateDataPhysBCFunct&, int) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/AmrCore/AMReX_FillPatchUtil_I.H:159

8: ./amr2d.gnu.MPI.CUDA.ex(+0x3a9c82) [0x561c7981bc82] std::vector<double, std::allocator >::~vector() at /usr/include/c++/9/bits/stl_vector.h:677 (inlined by) amrex::Vector<double, std::allocator >::~Vector() at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Base/AMReX_Vector.H:25 (inlined by) amrex::FillPatchIterator::FillFromLevel0(double, int, int, int, int) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Amr/AMReX_AmrLevel.cpp:1102

9: ./amr2d.gnu.MPI.CUDA.ex(+0x3aa29d) [0x561c7981c29d] amrex::FillPatchIterator::Initialize(int, double, int, int, int) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Amr/AMReX_AmrLevel.cpp:1016

10: ./amr2d.gnu.MPI.CUDA.ex(+0x3ab441) [0x561c7981d441] amrex::AmrLevel::FillPatch(amrex::AmrLevel&, amrex::MultiFab&, int, double, int, int, int, int) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Amr/AMReX_AmrLevel.cpp:2113

11: ./amr2d.gnu.MPI.CUDA.ex(+0xc69af) [0x561c795389af] NavierStokesBase::computeGradP(double) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../Source/NavierStokesBase.cpp:4291

12: ./amr2d.gnu.MPI.CUDA.ex(+0x840dd) [0x561c794f60dd] NavierStokes::initData() at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../Source/NavierStokes.cpp:371

13: ./amr2d.gnu.MPI.CUDA.ex(+0x390a41) [0x561c79802a41] std::shared_count<(__gnu_cxx::_Lock_policy)2>::~shared_count() at /usr/include/c++/9/bits/shared_ptr_base.h:729 (inlined by) std::shared_ptr<amrex::BoxList, (__gnu_cxx::_Lock_policy)2>::~shared_ptr() at /usr/include/c++/9/bits/shared_ptr_base.h:1169 (inlined by) std::shared_ptr::~shared_ptr() at /usr/include/c++/9/bits/shared_ptr.h:103 (inlined by) amrex::BoxArray::~BoxArray() at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Base/AMReX_BoxArray.H:556 (inlined by) amrex::Amr::defBaseLevel(double, amrex::BoxArray const, amrex::Vector<int, std::allocator > const) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Amr/AMReX_Amr.cpp:2504

14: ./amr2d.gnu.MPI.CUDA.ex(+0x39bc32) [0x561c7980dc32] amrex::Amr::initialInit(double, double, amrex::BoxArray const, amrex::Vector<int, std::allocator > const) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Amr/AMReX_Amr.cpp:1274 (inlined by) amrex::Amr::init(double, double) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Amr/AMReX_Amr.cpp:1142

15: ./amr2d.gnu.MPI.CUDA.ex(+0x437bb) [0x561c794b57bb] main at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../Source/main.cpp:96

16: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fa97ec3a0b3]

17: ./amr2d.gnu.MPI.CUDA.ex(+0x4d5de) [0x561c794bf5de] ?? ??:0
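Following the note at the top of Backtrace.0, any of the offsets above can also be decoded by hand. A minimal sketch, run from IAMR/Exec/eb_run2d next to the executable, using frame 0's offset (the working directory is an assumption):

    # Decode frame 0's offset (+0x2f20b5) into a file name and line number.
    addr2line -Cpfie ./amr2d.gnu.MPI.CUDA.ex 0x2f20b5

    # AMReX also ships a helper that walks an entire backtrace file; its exact
    # command-line usage is documented in the script itself:
    #   amrex/Tools/Backtrace/parse_bt.py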

Could you please help with this?

WeiqunZhang commented 2 years ago

Are you using CUDA 11.6? If so, the issue is probably similar to https://github.com/AMReX-Codes/amrex/issues/2607, which we are still investigating. For now, you may want to use a different version of CUDA.

lingda-li commented 2 years ago

Thanks for the quick response! I'm using CUDA 11.4, so maybe it's not the same issue.

WeiqunZhang commented 2 years ago

Could you try amrex/Tests/Amr/Advection_AmrCore/Exec to see if it works? What GPU are you using? Could you provide the stdout of make so that we can see if the CUDA_ARCH provided to the compiler is consistent with your GPU?
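For reference, a sketch of how one might capture the make stdout and check the architecture flags with AMReX's GNU Make build. CUDA_ARCH as the make variable and the grep pattern are assumptions here, and CUDA_ARCH=86 is only an example value; set it to match your GPU and adjust the pattern if the flag spelling in your log differs.

    # Rebuild from scratch in the example directory and keep the build output.
    make realclean
    make -j USE_CUDA=TRUE CUDA_ARCH=86 2>&1 | tee make.log

    # The nvcc invocations in the log should mention the GPU's compute
    # capability, e.g. compute_86 / sm_86 for an Ampere consumer card.
    grep -Eo "compute_[0-9]+|sm_[0-9]+" make.log | sort -u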

lingda-li commented 2 years ago

I compiled amrex/Tests/Amr/Advection_AmrCore/Exec with CUDA, and it runs fine with its inputs. I am using an RTX 3090 with sm_86, which should be correct based on NVIDIA's docs. Since the CUDA version works with some inputs, I guess the problem is not CUDA_ARCH. The make output for eb_run2d is attached as tmp.log.

cgilet commented 2 years ago

I noticed that you must have local modifications in AMReX (as indicated by "AMReX (21.12-dirty) initialized" in the run output). Could you please try with clean versions of AMReX, AMReX-Hydro, and IAMR? Also note that a newer version of IAMR is not guaranteed to work with an older version of AMReX. I recommend checking out the most recent releases of both (22.02).

lingda-li commented 2 years ago

> I noticed that you must have local modifications in AMReX (as indicated by "AMReX (21.12-dirty) initialized" in the run output). Could you please try with clean versions of AMReX, AMReX-Hydro, and IAMR? Also note that a newer version of IAMR is not guaranteed to work with an older version of AMReX. I recommend checking out the most recent releases of both (22.02).

Thanks for the advice. I updated AMReX, AMReX-Hydro, and IAMR to clean copies of the latest upstream versions. However, the same problem persists. Is there a particular setting in these inputs that could cause this error?

cgilet commented 2 years ago

I am unable to reproduce this error with CUDA 11.5. Would it be possible for you to switch to this version?

cgilet commented 2 years ago

I'm also unable to reproduce the error with CUDA 11.4. Could you try with completely new clones of the repos? git can do unexpected things sometimes.
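A sketch of what completely fresh clones pinned to matching 22.02 releases could look like. The 22.02 tag names for AMReX-Hydro and IAMR are assumptions based on the YY.MM release scheme, so adjust them if those repositories tag releases differently.

    # Fresh, side-by-side clones of the three repositories at matching releases.
    git clone --branch 22.02 https://github.com/AMReX-Codes/amrex.git
    git clone --branch 22.02 https://github.com/AMReX-Fluids/AMReX-Hydro.git
    git clone --branch 22.02 https://github.com/AMReX-Fluids/IAMR.git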

cgilet commented 2 years ago

Closing this since it's been over 3 months since the last comment. @lingda-li please open a new issue if you're still having problems.