elcorto / psweep

Loop like a pro, make parameter studies fun.
https://elcorto.github.io/psweep
BSD 3-Clause "New" or "Revised" License

Checkpoint/backup in case of crash #1

Closed tuanpham96 closed 4 years ago

tuanpham96 commented 5 years ago

This is a great package! I was wondering whether the package supports checkpoints. Sometimes a run crashes or raises an error at a certain parameter set. It would be nice to have a checkpoint file, so that re-running the parameter sweep skips the parameter sets that have already been run.

elcorto commented 4 years ago

Hi, thanks for your interest in the project.

I'd see this as a 2-step post-processing task. You could use a bool field named "crashed" to mark crashed runs. There are two ways to generate that.

I'd use the latter, since there you have full control over post-processing your data (i.e. the criterion for what counts as crashed, or how to detect a crashed state, might change). You can re-run post-processing and set the "crashed" flag as often as needed.
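To illustrate why re-running post-processing is cheap, here is a minimal sketch. It does not use psweep itself: a plain list of dicts stands in for the database, and the field name `result`, the two criterion functions and `mark_crashed` are hypothetical names chosen for this example.

```python
import math

def no_result(row):
    """First (hypothetical) criterion: a run crashed if it produced no result."""
    return row["result"] is None

def no_result_or_nan(row):
    """Stricter (hypothetical) criterion: a NaN result also counts as crashed."""
    res = row["result"]
    return res is None or (isinstance(res, float) and math.isnan(res))

def mark_crashed(rows, criterion):
    """(Re)compute the bool 'crashed' field of every row in place."""
    for row in rows:
        row["crashed"] = criterion(row)
    return rows

# Stand-in for the database: one row per pset
rows = [
    {"a": 1, "result": 0.5},
    {"a": 2, "result": None},
    {"a": 3, "result": float("nan")},
]

mark_crashed(rows, no_result)
first = [row["crashed"] for row in rows]

# The criterion changed: just run post-processing again
mark_crashed(rows, no_result_or_nan)
second = [row["crashed"] for row in rows]
```

Because the flag is derived from data that is already stored, changing the criterion only means calling the post-processing step again, without repeating any run.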

Then have a second driver script (see this part of the README) that selects only the crashed=True runs from the old database and re-runs those. Repeat that until no crashed runs remain.
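The re-run loop could look roughly like the sketch below. It is not psweep code: `func`, `run`, `mark_crashed` and `psets_to_rerun` are hypothetical helpers, a list of dicts stands in for the database, and the failure of the pset `a == 2` on its first attempt is simulated.

```python
import uuid

attempts = {}  # bookkeeping for the simulated failure only

def func(pset):
    """Hypothetical worker: pset a == 2 yields no result on its first attempt."""
    n = attempts[pset["a"]] = attempts.get(pset["a"], 0) + 1
    failed = pset["a"] == 2 and n == 1
    return {"result": None if failed else pset["a"] * 10}

def run(params, db):
    """Run all psets under one fresh _run_id, appending one row per pset."""
    run_id = str(uuid.uuid4())
    for pset in params:
        db.append({"_run_id": run_id, **pset, **func(pset)})
    return db

def mark_crashed(db):
    """Post-processing: (re)compute the 'crashed' flag for every row."""
    for row in db:
        row["crashed"] = row["result"] is None
    return db

def psets_to_rerun(db, keys=("a",)):
    """Return psets that have a crashed row but no good row yet."""
    good = {tuple(r[k] for k in keys) for r in db if not r["crashed"]}
    bad = {tuple(r[k] for k in keys) for r in db if r["crashed"]}
    return [dict(zip(keys, vals)) for vals in sorted(bad - good)]

# First driver: the initial sweep, followed by post-processing
db = mark_crashed(run([{"a": 1}, {"a": 2}, {"a": 3}], []))

# Second driver: re-run crashed psets until none remain (capped at 5 rounds)
for _ in range(5):
    todo = psets_to_rerun(db)
    if not todo:
        break
    mark_crashed(run(todo, db))
```

The cap on the number of rounds is a design choice for the sketch: a pset that crashes every time would otherwise keep the loop running forever.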

See also this run script and this data analysis script for worked-out examples (adding a column, parsing output, updating field values).

Your final database will then contain, for each pset, all crashed runs plus the final good run, each with a different _run_id. In the final data evaluation script, simply use only the crashed=False runs.
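As a small sketch of that last step, again with a list of dicts standing in for the database and with made-up `_run_id` values and a hypothetical `result` field:

```python
# Stand-in for the final database: pset a == 2 crashed once, then succeeded
db = [
    {"_run_id": "run-1", "a": 1, "result": 10, "crashed": False},
    {"_run_id": "run-1", "a": 2, "result": None, "crashed": True},
    {"_run_id": "run-2", "a": 2, "result": 20, "crashed": False},
]

# Keep only the good runs for the evaluation
good = [row for row in db if not row["crashed"]]

# One result per pset, keyed by the parameter value
results = {row["a"]: row["result"] for row in good}
```

With a pandas DataFrame that has a bool "crashed" column, the same selection is a boolean mask such as `df[~df["crashed"]]`.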