ComputationalRadiationPhysics / picongpu

Performance-Portable Particle-in-Cell Simulations for the Exascale Era :sparkles:
https://picongpu.readthedocs.io

hemera v100: openPMD-api output to hdf5 - simulation hangs #4310

Open PrometheusPi opened 2 years ago

PrometheusPi commented 2 years ago

Running the default Laser Wakefield example on hemera V100 GPUs with the h5 backend of openPMD-api instead of the bp backend leads to a hanging simulation. With bp, the simulation ran without any problems; after switching to h5, it hung right after init.
The first h5 output file was written but never closed.

zwjlpi commented 3 months ago

I ran into this issue too; the h5 output works fine only when the code runs on a single GPU. The parallel output for HDF5 seems incorrect. I still haven't found a solution.

psychocoderHPC commented 3 months ago

This problem is often caused by broken chunking in HDF5.

This could be a solution: https://github.com/ComputationalRadiationPhysics/picongpu/issues/4845#issuecomment-2009408453
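
For readers who cannot follow the link, here is a minimal sketch of one way to switch off HDF5 chunking. It assumes the workaround is to disable chunking through openPMD-api's backend options; the thread itself does not confirm that this is what the linked comment proposes. The helper name `hdf5_chunking_off_config` is hypothetical, and the key layout `hdf5.dataset.chunks` follows my reading of the openPMD-api backend configuration docs, so please check it against the version you have installed.

```python
import json


def hdf5_chunking_off_config() -> str:
    """Build a JSON option string asking openPMD-api's HDF5 backend not to chunk datasets.

    Assumption: the installed openPMD-api understands the key
    hdf5.dataset.chunks with the value "none".
    """
    options = {"hdf5": {"dataset": {"chunks": "none"}}}
    return json.dumps(options)


# Print the string so it can be pasted into a run configuration.
config = hdf5_chunking_off_config()
print(config)
```

The resulting string would then be handed to the openPMD plugin, for example via its `--openPMD.json` option in the `.cfg` file. As far as I know, setting the environment variable `OPENPMD_HDF5_CHUNKS=none` in the batch script before launching has the same effect, but treat both as things to verify rather than a confirmed fix for this hang.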