Closed ax3l closed 2 years ago
I have tried in debug mode locally but couldn't see a crash. :'( I have also tried with valgrind but nothing popped up.
Backtrace printing added in https://github.com/ECP-WarpX/regression_testing/pull/16; let's see if we catch it next time.
Here is another backtrace for this from a CI raw log in #2429: BackTrace.txt.
Saw the same backtrace again in #2479: Backtrace.txt
Saw the same backtrace again in #2530: CI_Backtrace.txt
It does not crash locally for me
A potential fix: #2543
Closed PR #2543 since BTD was not called at initialization anyway. Looking into Source/Diagnostics/BTD_Plotfile_Header_Impl.cpp:40
Seen again in https://github.com/ECP-WarpX/WarpX/pull/2574#issuecomment-974506256 with the following backtrace:
5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911) [0x7fd579967911]
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_data(char*) at /usr/include/c++/9/bits/basic_string.h:179
(inlined by) void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*>(char const*, char const*, std::forward_iterator_tag) at /usr/include/c++/9/bits/basic_string.tcc:219
(inlined by) void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct_aux<char const*>(char const*, char const*, std::__false_type) at /usr/include/c++/9/bits/basic_string.h:247
6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c) [0x7fd57997338c]
amrex::Box::coarsen(amrex::IntVect const&) at /tmp/ci-oFCLwlOx0A/amrex//Src/Base/AMReX_Box.H:801
(inlined by) PEC::ApplyPECtoBfield(std::array<amrex::MultiFab*, 3ul>, int, PatchType) at /tmp/ci-oFCLwlOx0A/warpx/./Source/BoundaryConditions/WarpX_PEC.cpp:131
7: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7) [0x7fd5799733f7]
amrex::coarsen(int, int) at /tmp/ci-oFCLwlOx0A/amrex//Src/Base/AMReX_IntVect.H:28
(inlined by) amrex::IntVect::coarsen(amrex::IntVect const&) at /tmp/ci-oFCLwlOx0A/amrex//Src/Base/AMReX_IntVect.H:591
(inlined by) amrex::Box::coarsen(amrex::IntVect const&) at /tmp/ci-oFCLwlOx0A/amrex//Src/Base/AMReX_Box.H:804
(inlined by) PEC::ApplyPECtoBfield(std::array<amrex::MultiFab*, 3ul>, int, PatchType) at /tmp/ci-oFCLwlOx0A/warpx/./Source/BoundaryConditions/WarpX_PEC.cpp:131
8: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9) [0x7fd5799736a9]
amrex::coarsen(int, int) at /tmp/ci-oFCLwlOx0A/amrex//Src/Base/AMReX_IntVect.H:32
(inlined by) amrex::IntVect::coarsen(amrex::IntVect const&) at /tmp/ci-oFCLwlOx0A/amrex//Src/Base/AMReX_IntVect.H:591
(inlined by) amrex::Box::coarsen(amrex::IntVect const&) at /tmp/ci-oFCLwlOx0A/amrex//Src/Base/AMReX_Box.H:804
(inlined by) PEC::ApplyPECtoBfield(std::array<amrex::MultiFab*, 3ul>, int, PatchType) at /tmp/ci-oFCLwlOx0A/warpx/./Source/BoundaryConditions/WarpX_PEC.cpp:131
9: /lib/x86_64-linux-gnu/libstdc++.so.6(_ZSt19__throw_ios_failurePKc+0x91) [0x7fd57996ac23]
?? ??:0
10: /lib/x86_64-linux-gnu/libstdc++.so.6(+0x114b72) [0x7fd5799ddb72]
amrex::PODVector<int, std::allocator<int> >::GetNewCapacity(unsigned long) const at /tmp/ci-oFCLwlOx0A/amrex//Src/Base/AMReX_PODVector.H:504
(inlined by) amrex::PODVector<int, std::allocator<int> >::resize(unsigned long) at /tmp/ci-oFCLwlOx0A/amrex//Src/Base/AMReX_PODVector.H:445
(inlined by) amrex::ParticleContainer<0, 0, 4, 0, amrex::PinnedArenaAllocator>::SetParticleSize() at /tmp/ci-oFCLwlOx0A/amrex//Src/Particle/AMReX_ParticleContainerI.H:9
11: ./main3d.gnu.TEST.TPROF.MTMPI.OMP.QED.OPMD.PSATD.GPUCLOCK.ex(+0xfb4a8) [0x55f88d9534a8]
std::basic_ios<char, std::char_traits<char> >::setstate(std::_Ios_Iostate) at /usr/include/c++/9/bits/basic_ios.h:158
(inlined by) std::basic_ifstream<char, std::char_traits<char> >::open(char const*, std::_Ios_Openmode) at /usr/include/c++/9/fstream:661
(inlined by) std::basic_ifstream<char, std::char_traits<char> >::open(char const*, std::_Ios_Openmode) at /usr/include/c++/9/fstream:658
(inlined by) BTDPlotfileHeaderImpl::ReadHeaderData() at /tmp/ci-oFCLwlOx0A/warpx/./Source/Diagnostics/BTD_Plotfile_Header_Impl.cpp:32
12: ./main3d.gnu.TEST.TPROF.MTMPI.OMP.QED.OPMD.PSATD.GPUCLOCK.ex(+0xef930) [0x55f88d947930]
BTDPlotfileHeaderImpl::set_timestep(int) at /tmp/ci-oFCLwlOx0A/warpx/./Source/Diagnostics/BTD_Plotfile_Header_Impl.H:83
(inlined by) BTDiagnostics::InterleaveBufferAndSnapshotHeader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) at /tmp/ci-oFCLwlOx0A/warpx/./Source/Diagnostics/BTDiagnostics.cpp:739
13: ./main3d.gnu.TEST.TPROF.MTMPI.OMP.QED.OPMD.PSATD.GPUCLOCK.ex(+0xf21b2) [0x55f88d94a1b2]
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_is_local() const at /usr/include/c++/9/bits/basic_string.h:222
(inlined by) std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_dispose() at /usr/include/c++/9/bits/basic_string.h:231
(inlined by) std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() at /usr/include/c++/9/bits/basic_string.h:658
(inlined by) BTDiagnostics::MergeBuffersForPlotfile(int) at /tmp/ci-oFCLwlOx0A/warpx/./Source/Diagnostics/BTDiagnostics.cpp:712
14: ./main3d.gnu.TEST.TPROF.MTMPI.OMP.QED.OPMD.PSATD.GPUCLOCK.ex(+0xf2c26) [0x55f88d94ac26]
BTDiagnostics::Flush(int) at /tmp/ci-oFCLwlOx0A/warpx/./Source/Diagnostics/BTDiagnostics.cpp:649
15: ./main3d.gnu.TEST.TPROF.MTMPI.OMP.QED.OPMD.PSATD.GPUCLOCK.ex(+0xae888) [0x55f88d906888]
Diagnostics::FilterComputePackFlush(int, bool) at /tmp/ci-oFCLwlOx0A/warpx/./Source/Diagnostics/Diagnostics.cpp:341
16: ./main3d.gnu.TEST.TPROF.MTMPI.OMP.QED.OPMD.PSATD.GPUCLOCK.ex(+0xad201) [0x55f88d905201]
MultiDiagnostics::FilterComputePackFlush(int, bool) at /tmp/ci-oFCLwlOx0A/warpx/./Source/Diagnostics/MultiDiagnostics.cpp:74 (discriminator 2)
17: ./main3d.gnu.TEST.TPROF.MTMPI.OMP.QED.OPMD.PSATD.GPUCLOCK.ex(+0x437584) [0x55f88dc8f584]
WarpX::Evolve(int) at /tmp/ci-oFCLwlOx0A/warpx/./Source/Evolve/WarpXEvolve.cpp:341
18: ./main3d.gnu.TEST.TPROF.MTMPI.OMP.QED.OPMD.PSATD.GPUCLOCK.ex(+0x5e854) [0x55f88d8b6854]
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider::_Alloc_hider(char*, std::allocator<char> const&) at /usr/include/c++/9/bits/basic_string.h:157
(inlined by) std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) at /usr/include/c++/9/bits/basic_string.h:526
(inlined by) main at /tmp/ci-oFCLwlOx0A/warpx/./Source/main.cpp:69
19: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fd57952d0b3]
20: ./main3d.gnu.TEST.TPROF.MTMPI.OMP.QED.OPMD.PSATD.GPUCLOCK.ex(+0x6eb3e) [0x55f88d8c6b3e]
?? ??:0
===== TinyProfilers ======
main()
WarpX::Evolve()
WarpX::Evolve::step
Diagnostics::FilterComputePackFlush()
WARNING: +++ End of backtrace: BTD_ReducedSliceDiag.Backtrace.0.0 +++
BTD_ReducedSliceDiag CRASHED (backtraces produced)
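Reading the frames above: `BTDPlotfileHeaderImpl::ReadHeaderData()` calls `std::ifstream::open`, which ends in `setstate` and `__throw_ios_failure`. That is the pattern libstdc++ shows when a stream with exceptions enabled fails to open a file. Below is a minimal sketch that reproduces this behavior in isolation. The helper name `open_throws_when_missing` and any path passed to it are hypothetical and not taken from WarpX.

```cpp
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper (not WarpX code): returns true if opening `path`
// raised std::ios_base::failure, false if the open succeeded.
bool open_throws_when_missing (std::string const& path)
{
    std::ifstream is;
    // With failbit in the exception mask, a failed open() calls
    // setstate(failbit), which throws instead of failing silently.
    is.exceptions(std::ifstream::failbit | std::ifstream::badbit);
    try {
        is.open(path, std::ios::in);
    } catch (std::ios_base::failure const&) {
        return true;
    }
    return false;
}
```

If the header file is not yet visible to the reading rank, an open like this would throw in the same place as the backtrace, which is consistent with the crash being sporadic.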
I think this is a race condition in BTDiagnostics::MergeBuffersForPlotfile. @RevathiJambunathan @atmyers
Last seen in: https://github.com/ECP-WarpX/WarpX/pull/2300
Stack:
Is it possible that the file does not exist yet? I think before we start merging via BTDiagnostics::MergeBuffersForPlotfile, we must make sure that:
Only the first two steps would be ok for now but are not 100% ideal, because FS-sync != MPI context sync.
Fix for CI flakiness (race condition between writing MPI ranks and readers) in #2608.
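To illustrate the failure mode discussed above (a reader opening a file before the writer's output is visible), here is a minimal sketch of a bounded retry on the reader side. It is not the change made in #2608, and the helper name `open_with_retry`, its parameters, and their defaults are assumptions for illustration only. A retry like this only mitigates the symptom; it does not replace synchronizing the writing ranks with the reader.

```cpp
#include <chrono>
#include <fstream>
#include <string>
#include <thread>

// Hypothetical helper (not WarpX code): try to open `path` up to
// `max_attempts` times, sleeping `delay` between failed attempts.
// Returns true once the stream is open, false if every attempt failed.
bool open_with_retry (std::ifstream& is, std::string const& path,
                      int max_attempts = 10,
                      std::chrono::milliseconds delay = std::chrono::milliseconds(50))
{
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        is.open(path, std::ios::in);
        if (is.is_open()) { return true; }
        is.clear();  // reset failbit so the next open() starts from a clean state
        if (attempt + 1 < max_attempts) {
            std::this_thread::sleep_for(delay);
        }
    }
    return false;
}
```

The caller keeps the stream's default exception mask here, so a failed `open` sets `failbit` instead of throwing, and the function reports failure through its return value.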
In CI, we see that the cartesian3d test BTD_ReducedSliceDiag sporadically crashes. We may need a local build in debug mode to find out where it crashes. CC @RemiLehe @RevathiJambunathan