UCLA-Plasma-Simulation-Group / QuickPIC-OpenSource

Open source repository for QuickPIC

Writing to multiple files when restarting from checkpoint #1

Closed: vkbo closed this issue 6 years ago

vkbo commented 7 years ago

I had to restart a simulation today due to a power interruption on the supercomputer I run QuickPIC on.

The simulation dumps data every 20 dT and writes restart files every 5000. After restarting from 5001 today, I noticed that when writing file 5001, QuickPIC also wrote to file 0001; then for 5021 it also wrote to 0021, and so on. Hashing the files shows that their content has changed. Still, the grid data itself does not seem to have changed, at least not in the few data points I managed to check, so it is probably some metadata being updated.

It is still a problem, as it causes the file sync tools to re-download some 60 GB of data in this case.
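For anyone who wants to check for the same symptom, the hash comparison described above could be done roughly as follows. This is only a sketch: the function names (`sha256_of`, `snapshot`, `changed_files`) and the choice of SHA-256 are assumptions for illustration, not part of QuickPIC or of the original report.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large dump files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map each file under root (as a relative path) to its digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(before, after):
    """List files present in both snapshots whose digest differs."""
    return sorted(
        name for name in before
        if name in after and before[name] != after[name]
    )
```

Taking one `snapshot()` of the output directory before the restart and another afterwards, then passing both to `changed_files()`, would list the earlier dump files that were rewritten. It only reports that the bytes differ, not which part of the file (grid data or metadata) changed.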

caozigao commented 7 years ago

Hi Veronica,

Could you show me how you set the parameters for the restart in your input deck? Thanks.

Best wishes,

Weiming An

On Jun 22, 2017, at 2:37 AM, Veronica Berglyd Olsen wrote the report quoted above.

vkbo commented 7 years ago

Hi Weiming,

I also noticed that some of the modified dump 5001 files were corrupt, so I just re-ran the whole simulation, but I believe the restart options I used were:

&Restart_File
 READ_RST_FILE = .true.,
 RST_TIMESTEP  = 5001,
 DUMP_RST_FILE = .true.,
 DFRST         = 5000,
/

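With these settings, the two pairs reported earlier in the thread (5001 with 0001, 5021 with 0021) are consistent with the extra write going to the step number minus `RST_TIMESTEP - 1`. The sketch below lists which earlier files would be touched under that assumption, for example to back them up or exclude them from syncing. The helper `shadowed_dumps` and the offset rule are hypothetical, inferred only from those two data points and not from the QuickPIC source.

```python
def shadowed_dumps(restart_step, dump_interval, last_step, offset=None):
    """Pair each dump written after a restart with the earlier file
    index that would also be touched, under the ASSUMED offset rule."""
    if offset is None:
        # Assumption: the extra write lands at step - (restart_step - 1).
        offset = restart_step - 1
    pairs = []
    for step in range(restart_step, last_step + 1, dump_interval):
        pairs.append((f"{step:04d}", f"{step - offset:04d}"))
    return pairs


# Values from the thread: RST_TIMESTEP = 5001, data dumps every 20 steps.
for written, also_touched in shadowed_dumps(5001, 20, 5021):
    print(written, "->", also_touched)
# prints:
# 5001 -> 0001
# 5021 -> 0021
```

Other rules, such as subtracting `DFRST` or taking the step modulo 5000, would give the same two pairs, so the `offset` argument is left overridable.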
caozigao commented 6 years ago

Hi Veronica,

I fixed the bug for simulation restart. Thank you very much for pointing that out.

Best wishes,

Weiming An