Errors using OpenMP on Sedov test

joehellmers commented 2 years ago

Hello,

I'm getting this error when trying to run the Sedov 3D MHD test with OpenMP:

amrex::Error::0::Couldn't open file: sedov_3d_plt00000.temp/Level_0/Cell_D_00000 !!!
SIGABRT
amrex::Error::0::Couldn't open file: sedov_3d_plt00000.temp/Level_0/Cell_D_00000 !!!
SIGABRT
amrex::Error::0::Couldn't open file: sedov_3d_plt00000.temp/Level_0/Cell_D_00000 !!!
SIGABRT
See Backtrace.0.0 file for details
See Backtrace.0.0 file for details
See Backtrace.0.0 file for details
amrex::Error::0::Couldn't open file: sedov_3d_plt00050.temp/Level_0/Cell_D_00000 !!!
SIGABRT
amrex::Error::0::Couldn't open file: sedov_3d_plt00050.temp/Level_0/Cell_D_00000 !!!
SIGABRT
See Backtrace.0.0 file for details
See Backtrace.0.0 file for details
amrex::Error::0::Couldn't open file: sedov_3d_plt00062.temp/Level_0/Cell_D_00000 !!!
SIGABRT
amrex::Error::0::Couldn't open file: sedov_3d_plt00062.temp/Level_0/Cell_D_00000 !!!
SIGABRT
amrex::Error::0::Couldn't open file: sedov_3d_plt00062.temp/Level_0/Cell_D_00000 !!!
SIGABRT
See Backtrace.0.0 file for details
See Backtrace.0.0 file for details
See Backtrace.0.0 file for details

In Backtrace.0.0 I see this:

=== If no file names and line numbers are shown below, one can run
            addr2line -Cpfie my_exefile my_line_address
    to convert `my_line_address` (e.g., 0x4a6b) into file name and line number.
    Or one can use amrex/Tools/Backtrace/parse_bt.py.

=== Please note that the line number reported by addr2line may not be accurate.
    One can use
            readelf -wl my_exefile | grep my_line_address'
    to find out the offset for that line.

 0: ./Castro3d.gnu.OMP.ex(+0x1e2d65) [0x55e884552d65]
    amrex::BLBackTrace::print_backtrace_info(_IO_FILE*) at /shared/castro-21.11/Castro/Exec/hydro_tests/Sedov/../../../external/amrex/Src/Base/AMReX_BLBackTrace.cpp:179

 1: ./Castro3d.gnu.OMP.ex(+0x1e4b08) [0x55e884554b08]
    amrex::BLBackTrace::handler(int) at /shared/castro-21.11/Castro/Exec/hydro_tests/Sedov/../../../external/amrex/Src/Base/AMReX_BLBackTrace.cpp:85

 2: ./Castro3d.gnu.OMP.ex(+0xe22a7) [0x55e8844522a7]
    amrex::Error_host(char const*) at /shared/castro-21.11/Castro/Exec/hydro_tests/Sedov/../../../external/amrex/Src/Base/AMReX.cpp:221

 3: ./Castro3d.gnu.OMP.ex(+0x1119e3) [0x55e8844819e3]
    std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_is_local() const at /usr/include/c++/9/bits/basic_string.h:222
 (inlined by) std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_dispose() at /usr/include/c++/9/bits/basic_string.h:231
 (inlined by) std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() at /usr/include/c++/9/bits/basic_string.h:658
 (inlined by) amrex::FileOpenFailed(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) at /shared/castro-21.11/Castro/Exec/hydro_tests/Sedov/../../../external/amrex/Src/Base/AMReX_Utility.cpp:167

 4: ./Castro3d.gnu.OMP.ex(+0x14e80d) [0x55e8844be80d]
    amrex::NFilesIter::ReadyToWrite(bool) at /shared/castro-21.11/Castro/Exec/hydro_tests/Sedov/../../../external/amrex/Src/Base/AMReX_NFiles.cpp:321

 5: ./Castro3d.gnu.OMP.ex(+0x1418bf) [0x55e8844b18bf]
    amrex::VisMF::Write(amrex::FabArray<amrex::FArrayBox> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, amrex::VisMF::How, bool) at /shared/castro-21.11/Castro/Exec/hydro_tests/Sedov/../../../external/amrex/Src/Base/AMReX_VisMF.cpp:1007

 6: ./Castro3d.gnu.OMP.ex(+0x5a8fe) [0x55e8843ca8fe]
    Castro::plotFileOutput(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::ostream&, amrex::VisMF::How, int) at /shared/castro-21.11/Castro/Exec/hydro_tests/Sedov/../../../Source/driver/Castro_io.cpp:1156

 7: ./Castro3d.gnu.OMP.ex(+0x2523ac) [0x55e8845c23ac]
    amrex::Amr::writePlotFileDoit(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) at /shared/castro-21.11/Castro/Exec/hydro_tests/Sedov/../../../external/amrex/Src/Amr/AMReX_Amr.cpp:995 (discriminator 2)

 8: ./Castro3d.gnu.OMP.ex(+0x252d82) [0x55e8845c2d82]
    std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_is_local() const at /usr/include/c++/9/bits/basic_string.h:222
 (inlined by) std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_dispose() at /usr/include/c++/9/bits/basic_string.h:231
 (inlined by) std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() at /usr/include/c++/9/bits/basic_string.h:658
 (inlined by) amrex::Amr::writePlotFile() at /shared/castro-21.11/Castro/Exec/hydro_tests/Sedov/../../../external/amrex/Src/Amr/AMReX_Amr.cpp:880

 9: ./Castro3d.gnu.OMP.ex(+0x23256) [0x55e884393256]
    main at /shared/castro-21.11/Castro/Exec/hydro_tests/Sedov/../../../Source/driver/main.cpp:160

10: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f1bd3bca0b3]

11: ./Castro3d.gnu.OMP.ex(+0x2ae6e) [0x55e88439ae6e]
    ?? ??:0

The only change I made to inputs.3d.mhd was adding castro.max_subcycles = 16.

My GNUmakefile is:

PRECISION  = DOUBLE
PROFILE    = FALSE

DEBUG      = FALSE

DIM        = 3

COMP       = gnu

USE_MPI    = FALSE
USE_OMP    = TRUE

USE_MHD    = TRUE

USE_FORT_MICROPHYSICS := FALSE
BL_NO_FORT := TRUE

# define the location of the CASTRO top directory
CASTRO_HOME  := ../../..

# This sets the EOS directory in $(MICROPHYSICS_HOME)/EOS
EOS_DIR     := gamma_law

# This sets the network directory in $(MICROPHYSICS_HOME)/Networks
NETWORK_DIR := general_null
NETWORK_INPUTS = gammalaw.net

Bpack   := ./Make.package
Blocs   := .

include $(CASTRO_HOME)/Exec/Make.Castro

zingale commented 2 years ago

I can't seem to reproduce this. There might be a race condition somewhere or a filesystem issue, but when I run with 16 OpenMP threads, I can output the plotfile without issue.

Can you tell me what machine you ran on, how many MPI tasks and how many OpenMP threads?

zingale commented 2 years ago

Oh, I just noticed you are running without MPI, only OpenMP. I tried that and also have no problem writing the plotfile.

joehellmers commented 2 years ago

Ah, the problem was that I was launching it with mpirun instead of just running the executable. After fixing that, the simulation works, but I'm not seeing any improvement in wall time for 1, 2, 4, 8, and 12 OpenMP threads. It's not a big deal for me right now; I'm just trying to assess how Castro scales for an XSEDE allocation. MPI scales well.

zingale commented 2 years ago

It looks like we never tiled the MHD algorithm, since we were mostly interested in running it on GPUs. I'll look at adding tiling tonight.

joehellmers commented 2 years ago

Is that easy to do? Perhaps I could do it.

zingale commented 2 years ago

It may be as simple as adding TilingIfNotGPU() to the single MFIter loop in Castro_mhd.cpp, but I am not 100% certain.

joehellmers commented 2 years ago

OK. I'll set up a fork and give it a shot.

joehellmers commented 2 years ago

That seems to have helped quite a bit, although on my system scaling does seem to stall beyond 8 cores.

castro-sedovdev-1x1.omp.out:3727:Run time without initialization = 199.2570635
castro-sedovdev-1x2.omp.out:3727:Run time without initialization = 110.7894975
castro-sedovdev-1x4.omp.out:3727:Run time without initialization = 61.3167444
castro-sedovdev-1x8.omp.out:3727:Run time without initialization = 35.07138379
castro-sedovdev-1x12.omp.out:3727:Run time without initialization = 33.61117978

Should I create a pull request related to this issue, or should I make a new issue?

zingale commented 2 years ago

The way tiling works is that it divides a box into logical tiles and then distributes those tiles across the OpenMP threads. If you are running the default inputs.3d.mhd, then you have a single 32^3 box. The default tile size is 1024x8x8, so that would give you 16 tiles for the OpenMP threads to share, and 16 tiles don't spread evenly over 12 cores. Not sure if your chip has 16 cores, but OpenMP might work better there.

You could try setting

fabarray.mfiter_tile_size = 1024 4 4

which would give you 64 tiles, but at some point the tiles might simply be too small to scale effectively.

joehellmers commented 2 years ago

I tried the setting, but it actually made things worse (43 seconds instead of 33). The node I'm running on has only 12 non-hyperthreaded cores, so maybe there is some contention with other processes running on that node (e.g., Slurm or the NFS client). I think the pull request for the fix is #2039.

zingale commented 1 month ago

Closing this since I think the initial issue was resolved.

AMReX-Astro / Castro

Errors using OpenMP on Sedov test #2036