LLNL / scr

SCR caches checkpoint data in storage on the compute nodes of a Linux cluster to provide a fast, scalable checkpoint / restart capability for MPI codes.
http://computing.llnl.gov/projects/scalable-checkpoint-restart-for-mpi

Support naming checkpoint to load after SCR_Init but before SCR_Start_restart #475

Closed: adammoody closed this issue 2 years ago

adammoody commented 2 years ago

Some applications would prefer to call SCR_Init before they have identified the name of the checkpoint that SCR should load. Because SCR currently fetches the checkpoint during SCR_Init, those applications cannot set SCR_CURRENT in time.

Rather than executing the fetch during SCR_Init, we could delay it until SCR_Have_restart. This would enable users to specify the checkpoint name after SCR_Init but still before SCR_Have_restart. SCR_Init could still detect, rebuild, and optionally flush any cached checkpoints.

For naming the checkpoint, there are several options to look at:

Related issues: