SCR caches checkpoint data in storage on the compute nodes of a Linux cluster to provide a fast, scalable checkpoint / restart capability for MPI codes.
Some applications would prefer to call SCR_Init before they have identified the name of the checkpoint that SCR should load. Those applications cannot set SCR_CURRENT before calling SCR_Init.
Rather than executing the fetch during SCR_Init, we could delay it until SCR_Have_restart. This would enable users to specify the checkpoint name after SCR_Init but still before SCR_Have_restart. SCR_Init could still detect, rebuild, and optionally flush any cached checkpoints.
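The proposed ordering can be sketched with a small stand-alone model. This is not SCR code: `mock_scr`, `mock_init`, `mock_set_current`, `mock_have_restart`, and the `"latest"` placeholder are hypothetical names invented for illustration, and the sketch assumes the only behavior that changes is when the fetch happens.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MOCK_MAX_NAME 256

/* Hypothetical stand-in for library state; not SCR's real data structures. */
typedef struct {
    int  initialized;               /* set once mock_init has run */
    int  fetched;                   /* set once the deferred fetch has run */
    char requested[MOCK_MAX_NAME];  /* name requested by the app, "" if none */
    char loaded[MOCK_MAX_NAME];     /* name of the checkpoint actually loaded */
} mock_scr;

/* Models SCR_Init under the proposal: cached checkpoints could still be
 * detected, rebuilt, and flushed here, but no fetch is executed yet. */
void mock_init(mock_scr* s)
{
    memset(s, 0, sizeof(*s));
    s->initialized = 1;
}

/* Models the application naming a checkpoint after init but before the
 * restart query. How the name is passed in is the open question. */
void mock_set_current(mock_scr* s, const char* name)
{
    assert(s->initialized && !s->fetched);
    snprintf(s->requested, sizeof(s->requested), "%s", name);
}

/* Models SCR_Have_restart: the fetch is deferred to the first call here.
 * "latest" is a placeholder for whatever the library would pick itself. */
int mock_have_restart(mock_scr* s, char* name_out, size_t n)
{
    assert(s->initialized && n > 0);
    if (!s->fetched) {
        const char* pick = s->requested[0] ? s->requested : "latest";
        snprintf(s->loaded, sizeof(s->loaded), "%s", pick);
        s->fetched = 1;
    }
    snprintf(name_out, n, "%s", s->loaded);
    return 1;
}
```

In this model, a call sequence of `mock_init`, then `mock_set_current`, then `mock_have_restart` loads the requested name, while skipping `mock_set_current` falls back to the library's own choice.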
For naming the checkpoint, there are several options to look at:
Stick with SCR_Config("SCR_CURRENT=foo"). In that case, SCR_CURRENT becomes a special case, since essentially all other settings must still be set before SCR_Init.
Use the SCR_Current(const char* name) call. This call is used in apps that restart from an SCR checkpoint but do so without using SCR's restart API. It provides a way for the application to tell SCR which checkpoint was loaded so that SCR can initialize its state to match. We could extend this API to allow a user to specify a checkpoint name before SCR_Have_restart.
Define the name parameter in SCR_Have_restart to be an input/output parameter. Right now, the name argument is output only. SCR fills it in with the name of the checkpoint that it loaded. We could change it to be an input/output parameter allowing the user to request a particular name as part of the SCR_Have_restart call. We'd have some backwards compatibility items to think about, though, since users don't currently set that parameter before calling SCR_Have_restart.
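To make the backwards compatibility concern in the third option concrete, here is a minimal stand-alone sketch of an input/output name parameter. It is not SCR code: `mock_have_restart_inout`, `mock_ckpts`, and the rule that an empty string means "no preference" are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MOCK_MAX_NAME 256

/* Hypothetical list of available checkpoints, ordered oldest to newest. */
static const char* const mock_ckpts[] = { "ckpt.10", "ckpt.20", "ckpt.30" };
static const size_t mock_nckpts = sizeof(mock_ckpts) / sizeof(mock_ckpts[0]);

/* Hypothetical in/out variant of the restart query.
 * On input:  name is "" for no preference, or the requested checkpoint.
 * On output: name holds the checkpoint that was selected, or "" if the
 *            requested one does not exist.
 * Returns 1 if a checkpoint was selected, 0 otherwise. */
int mock_have_restart_inout(char* name, size_t n)
{
    assert(name != NULL && n > 0);

    /* Default to the newest checkpoint when the caller states no preference. */
    const char* pick = mock_ckpts[mock_nckpts - 1];

    if (name[0] != '\0') {
        /* The caller asked for a specific checkpoint; look it up. */
        pick = NULL;
        for (size_t i = 0; i < mock_nckpts; i++) {
            if (strcmp(name, mock_ckpts[i]) == 0) {
                pick = mock_ckpts[i];
            }
        }
    }

    if (pick == NULL) {
        name[0] = '\0';
        return 0;
    }
    snprintf(name, n, "%s", pick);
    return 1;
}
```

The sketch only behaves correctly when the caller initializes the buffer. Existing callers that pass an uninitialized buffer, which is harmless while the argument is output only, would have arbitrary bytes interpreted as a requested name under this convention.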