ECP-WarpX / WarpX

WarpX is an advanced electromagnetic & electrostatic Particle-In-Cell code.
https://ecp-warpx.github.io

error when using openPMD + BTD? #1915

Open MaxThevenet opened 3 years ago

MaxThevenet commented 3 years ago

I encountered an error with openPMD BTD with this 2D input file when executing e.g.

# compilation
cmake .. -DWarpX_DIMS=2 -DWarpX_OPENPMD=ON -DWarpX_QED=OFF -DWarpX_COMPUTE=NOACC
# Same result without -DWarpX_QED=OFF
# execution
mpirun -np 4 ~/warpx/build/bin/warpx inputs > output.txt

The simulation runs until the end and crashes at the finalize step with the error message:

libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: [Series] Detected illegal access to iteration that has been closed previously.
SIGABRT
See Backtrace.2.0 file for details
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI COMMUNICATOR 3 DUP FROM 0
with errorcode 6.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

Backtrace:

===== TinyProfilers ======
main()
WarpX::Evolve()
Diagnostics::FilterComputePackFlush()
FlushFormatOpenPMD::WriteToFile()
WarpXOpenPMDPlot::WriteOpenPMDFields()

I also ran in Debug mode. This reproducer then takes 30 min (instead of 1 min in production mode), but it does not provide any additional information. Sometimes, when changing the number of BTD snapshots or the resolution, the problem disappears. However, in this input file, the simulation runs long enough for each snapshot to be full (they all have the same size, and increasing the number of time steps from the automatically computed 3391 to 4000 doesn't remove the issue).

The output of the CMake command is here. This could also be a problem with my configuration. I deactivated OpenMP in case it was causing issues, but I get the same problem with OpenMP enabled. It would already be very useful if someone tried this reproducer.

ax3l commented 3 years ago

Updated inputs file for WarpX version 21.07-66-gbf7150fa8: inputs.txt. I can reproduce this locally on Ubuntu as well.

I wonder whether "Detected illegal access to iteration that has been closed previously." truly is the only error. We recently changed openPMD-api to not throw this (final) message when a previous error occurred: https://github.com/openPMD/openPMD-api/pull/1018

I quickly re-compiled with cmake -S . -B build -DWarpX_DIMS=2 -DWarpX_OPENPMD=ON -DWarpX_QED=OFF -DWarpX_COMPUTE=NOACC -DWarpX_openpmd_branch=dev, which uses the nearly released openPMD-api 0.14.0, and found that this is not related to that issue.

Will dig a bit more, sorry for the tremendous delay.

ax3l commented 3 years ago

The first two lab-frame snapshots are not flushed until we do the final close. Somehow, the lab-frame snapshots are written in the order 2, 4, 5, 9, ... and then, in FilterComputePackFlushLastTimestep, 0, 1, 2. This "jumping" of lab-frame snapshots also happens with the plotfile output.

RevathiJambunathan commented 3 years ago

[Listing things to check based on offline conversation : Axel and Reva]

ax3l commented 3 years ago

I think that problem comes from BTDiagnostics::PrepareFieldDataForOutput().

Then "underfull" buffers are dumped after the evolve loop via FilterComputePackFlushLastTimestep.

ax3l commented 3 years ago

We could skip m_buffer_counter[i_buffer] of size 0 in the last timestep dump, which would:

Fix for this aspect in #2148
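
To make the proposal above concrete, here is a minimal sketch of skipping empty buffers in the last-timestep dump. It is not the WarpX implementation and not the change in #2148: the function name buffers_to_flush_at_last_timestep is hypothetical, and the plain std::vector<int> only stands in for m_buffer_counter. The assumption is that a counter of 0 means nothing is left to write for that snapshot.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not WarpX code): return the indices of the snapshot
// buffers that still hold data and therefore need a final flush.
// `buffer_counter` plays the role of m_buffer_counter in the discussion.
std::vector<std::size_t> buffers_to_flush_at_last_timestep(
    const std::vector<int>& buffer_counter)
{
    std::vector<std::size_t> pending;
    for (std::size_t i_buffer = 0; i_buffer < buffer_counter.size(); ++i_buffer) {
        if (buffer_counter[i_buffer] == 0) {
            // Assumption: an empty buffer has nothing left to write,
            // so this snapshot is not touched again in the final dump.
            continue;
        }
        pending.push_back(i_buffer);
    }
    return pending;
}
```

With hypothetical counters {3, 1, 0} for snapshots 0, 1 and 2, only snapshots 0 and 1 would be flushed after the evolve loop; snapshot 2 would be left alone.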