ComputationalRadiationPhysics / picongpu

Performance-Portable Particle-in-Cell Simulations for the Exascale Era :sparkles:
https://picongpu.readthedocs.io

Erroneous checkpoint restart of radiation plug-in #3959

Open Anton-Le opened 2 years ago

Anton-Le commented 2 years ago

I have recently observed that a restart from a checkpoint appears to zero out the amplitudes accumulated in the radiation simulations (or the plug-in restores them incorrectly).

Specifically:

Over the weekend I ran a simulation (electron bunch impacting a uniform background) using commit 59e9b53605f9a5c1bf271eeb055bc74370a99052 with the radiation module enabled and with checkpoint restarts at steps 36k, 48k, and 60k. The following figure shows, on the left, the squares of the amplitude vectors.

[Figure: CumulativeIntensity_detectorNr_45]

At the given time steps one observes clear "cuts" in the spectrum. I did not expect this, since older data evaluated with the same routines did not display such cuts.

Integrating over the frequency yields the following sawtooth-like curve:

[Figure: IntegratedIntensityVsTimeStep_detectorNr_45]

I'm guessing that the load of checkpointed radiation data is erroneous.
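The frequency integration and the search for sawtooth-like drops described above could be sketched roughly as follows. This is only an illustration: the function names, the `rel_drop` threshold, and the sample numbers are hypothetical and are not taken from the actual evaluation routines or simulation data.

```python
def integrate_over_frequency(freqs, intensity):
    """Trapezoidal integral of one spectrum over frequency."""
    total = 0.0
    for i in range(1, len(freqs)):
        total += 0.5 * (intensity[i] + intensity[i - 1]) * (freqs[i] - freqs[i - 1])
    return total


def find_drops(steps, integrated, rel_drop=0.5):
    """Return the steps at which the integrated intensity falls by more than
    rel_drop (a hypothetical threshold) relative to the previous output."""
    drops = []
    for prev, cur, step in zip(integrated, integrated[1:], steps[1:]):
        if prev > 0.0 and cur < (1.0 - rel_drop) * prev:
            drops.append(step)
    return drops


# Synthetic stand-in data (NOT from the simulation in this issue).
freqs = [0.0, 1.0, 2.0, 3.0]
steps = [12000, 24000, 36000, 48000]
spectra = [
    [1.0, 1.0, 1.0, 1.0],
    [2.0, 2.0, 2.0, 2.0],
    [0.1, 0.1, 0.1, 0.1],  # mimics an accumulation that was reset
    [1.1, 1.1, 1.1, 1.1],
]

integrated = [integrate_over_frequency(freqs, s) for s in spectra]
drop_steps = find_drops(steps, integrated)
print(drop_steps)
```

If the flagged steps coincide with the restart steps, that would support the suspicion that the accumulated amplitudes are lost when the checkpoint is loaded.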

PrometheusPi commented 2 years ago

@Anton-Le Which version of PIConGPU are you using, and what did you adjust in the code? A restart bug could have been introduced with the switch to the openPMD-api shortly before the 0.6.0 release. Could you also please paste your stdout and stderr into this issue? The radiation plugin is quite verbose regarding restarts and tells you what it found or did not find as a restart file. Could you also list all files in your simOutput/checkpoints/ directory?

Anton-Le commented 2 years ago

The PIConGPU version is the commit noted in the opening post, i.e. the 0.6.0 release as it is in master (https://github.com/ComputationalRadiationPhysics/picongpu/commit/59e9b53605f9a5c1bf271eeb055bc74370a99052).

No further changes to the code were made. Since I did not change the verbosity level of the radiation plug-in from its default setting, there is nothing in stdout/stderr that looks out of the ordinary:

 new grid size (global|local|offset): {560,2304,552}|{280,144,276}|{0,1584,276}
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Field solver condition: c * dt <= 1.1285 ? (c * dt = 1)
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
   Estimates are based on DensityRatio to BASE_DENSITY of each species
   (see: density.param, speciesDefinition.param).
   It and does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species b: omega_p * dt <= 0.1 ? (omega_p * dt = 0.0438178)
PIConGPUVerbose PHYSICS(1) | species e1: omega_p * dt <= 0.1 ? (omega_p * dt = 0.004)
PIConGPUVerbose PHYSICS(1) | species e2: omega_p * dt <= 0.1 ? (omega_p * dt = 0.004)
PIConGPUVerbose PHYSICS(1) | species e3: omega_p * dt <= 0.1 ? (omega_p * dt = 0.004)
PIConGPUVerbose PHYSICS(1) | species e4: omega_p * dt <= 0.1 ? (omega_p * dt = 0.004)
PIConGPUVerbose PHYSICS(1) | macro particles per device: 111283200
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 211.69
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 7.09036e-17
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 2.12564e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 1.92837e-28
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 3.39165e-17
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 2.40398e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 80188.2
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 1.73313e-11
PIConGPUVerboseRadiation SIMULATION_STATE(2) | Radiation (b): restart finished
PIConGPUVerboseRadiation SIMULATION_STATE(2) | Radiation (e1): restart finished
initialization time:  1min  4sec 291msec = 64.291 sec

[.. further standard output of a normal rad-module iteration ..]
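To pick the radiation plug-in's restart messages out of a long stdout file, a small filter along these lines might help. It is a sketch that assumes the messages follow the `Radiation (<species>): ... restart ...` shape of the two lines shown above; the wording of other restart messages (for example when a file is not found) is not known from this log, and the function name is hypothetical.

```python
import re

# Pattern modelled on the two "restart finished" lines in the log above.
# Any other message wording is an assumption.
RESTART_LINE = re.compile(
    r"Radiation \((?P<species>[^)]+)\): (?P<message>.*restart.*)"
)


def radiation_restart_messages(log_text):
    """Collect restart-related radiation messages, grouped by species."""
    found = {}
    for line in log_text.splitlines():
        match = RESTART_LINE.search(line)
        if match:
            species = match.group("species")
            found.setdefault(species, []).append(match.group("message").strip())
    return found


sample = """\
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 1.73313e-11
PIConGPUVerboseRadiation SIMULATION_STATE(2) | Radiation (b): restart finished
PIConGPUVerboseRadiation SIMULATION_STATE(2) | Radiation (e1): restart finished
initialization time:  1min  4sec 291msec = 64.291 sec
"""

messages = radiation_restart_messages(sample)
for species, lines in messages.items():
    print(species, lines)
```

For the excerpt above this reports one "restart finished" message each for species b and e1, so the default verbosity gives no hint that anything went wrong during the load.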

As for the contents of the folder, here you go: Bugreport_ContentsOfCheckpointFolder.txt

I have requeued the simulation, since I messed up the detector distribution in the first one. Once it is done, I can check the continuation to see whether the problem is reproducible.

PrometheusPi commented 2 years ago

I can reproduce the problem using the current master PIConGPU 0.6.0. Based on the Bunch example case 5 (single particle case) I ran the 32.cfg with restart at iteration 2900.

[Figure: radiation energy evolution with a restart at iteration 2900]

The radiation energy evolution of the openPMD and the text-based output agree, and both show the loss of data after the restart. Without a restart, the expected energy evolution (without a drop to zero) is observed.

This is a (now confirmed) bug in the restart capability of the radiation plugin. A back-port will be needed as soon as a fix is out.

PrometheusPi commented 2 years ago

The checkpoint is valid and contains all needed data. (Todo for myself: check the z-amplitude with a different case.)

EDIT: The z-polarization is stored fine in the openPMD output.

The error most likely occurs while reading the checkpoint.

PrometheusPi commented 2 years ago

@Anton-Le Please see the pull request above. There I explain how you can still use your data despite PIConGPU not finding your restart files.