ECP-VeloC / VELOC

Very-Low Overhead Checkpointing System
http://veloc.rtfd.io
MIT License

SCR-to-VELOC differences #4

Open adammoody opened 5 years ago

adammoody commented 5 years ago

The VELOC API is missing some semantics needed for SCR. Most of these can be worked around, but I'll build a list to record where we stand:

1) No support for non-checkpoint output sets, e.g., SCR_Start_output. VELOC assumes each output set is a checkpoint.
2) No ability for app to ask when to checkpoint, i.e., SCR_Need_checkpoint
3) No ability for app to ask whether it should exit, i.e., SCR_Should_exit
4) Route_file also renames file whereas SCR keeps the same file name and only changes the path
5) Because VELOC does not return checkpoint name to application, app must track a name-to-id map in an external file, so this map may become out of sync with checkpoints that are actually available
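
To make item 4 concrete, here is a rough sketch of the "same file name, different path" behavior I mean. This is not SCR or VELOC code: route_keep_name and the paths in the usage note are hypothetical names made up for illustration, and a real implementation would also need to handle directory creation and any per-rank layout.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of SCR or VELOC: build a routed path that
 * keeps the caller's file name and only swaps the directory.
 * Returns 0 on success, -1 on invalid arguments or if out is too small. */
int route_keep_name(const char *orig, const char *cache_dir,
                    char *out, size_t out_len)
{
    if (orig == NULL || cache_dir == NULL || out == NULL || out_len == 0)
        return -1;

    /* The file name is whatever follows the last '/', or the whole string. */
    const char *slash = strrchr(orig, '/');
    const char *name = (slash != NULL) ? slash + 1 : orig;
    if (*name == '\0')
        return -1; /* path ends in '/', so there is no file name to keep */

    /* Join the new directory with the unchanged file name. */
    int n = snprintf(out, out_len, "%s/%s", cache_dir, name);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

With this, routing /p/lustre/run1/ckpt_10.dat into /dev/shm/cache gives /dev/shm/cache/ckpt_10.dat, so the application still sees the file name it chose and only the directory differs.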

adammoody commented 5 years ago

Looks like we had and lost some of these functions in the change over to master: https://github.com/ECP-VeloC/VELOC/blob/275567e14f67abf258585eb94ebf63b96745a314/src/veloc.h#L1

bnicolae commented 4 years ago

I've added preliminary support for the SCR wrapper. Specifically, VELOC now flushes files to stable storage under their original file names (but still keeps its own names for local storage). Also, restart works as long as the user specifies the original file name that goes with a specific checkpoint name and version. The user should not need to remember this; we could provide it automatically. However, the semantics of route_file on restart are not well defined for now (should the original file name be NULL on restart and get ignored?)

bnicolae commented 1 year ago

This issue has been inactive for a long time. However, it is still relevant and will be revisited eventually.