LLNL / scr

SCR caches checkpoint data in storage on the compute nodes of a Linux cluster to provide a fast, scalable checkpoint / restart capability for MPI codes.
http://computing.llnl.gov/projects/scalable-checkpoint-restart-for-mpi

define SCR_Start_input API #485

Open adammoody opened 2 years ago

adammoody commented 2 years ago

For applications that write output datasets (SCR_FLAG_OUTPUT), it would be useful to define a start/complete interface for reading those files back as input. Input datasets could also be checkpoints.

Similar to the restart interface, this might look something like:

SCR_Have_input(int* have_input, char* dset)
SCR_Start_input(char* dset)
SCR_Complete_input(int valid)
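To show how an application might drive the proposed calls, here is a minimal C sketch. The three SCR_*_input functions below are stand-in stubs, not SCR code. The int return values, the INPUT_OK/INPUT_ERR codes, the const qualifiers, the dataset names, and the read_dataset helper are all assumptions made for illustration.

```c
#include <string.h>

/* Hypothetical status codes; a real implementation would likely reuse SCR's own. */
#define INPUT_OK  0
#define INPUT_ERR 1

/* --- Stand-ins for the proposed calls (NOT part of SCR today) --- */
static const char* cached_dsets[] = { "ckpt.10", "output.20" };  /* hypothetical */
static const char* active_dset = NULL;  /* dataset currently between start/complete */

/* Report whether the named dataset is available. */
int SCR_Have_input(int* have_input, const char* dset)
{
    *have_input = 0;
    for (size_t i = 0; i < sizeof(cached_dsets) / sizeof(cached_dsets[0]); i++) {
        if (strcmp(cached_dsets[i], dset) == 0) {
            *have_input = 1;
        }
    }
    return INPUT_OK;
}

/* Open an input phase; this sketch does not allow nesting. */
int SCR_Start_input(const char* dset)
{
    if (active_dset != NULL) {
        return INPUT_ERR;
    }
    active_dset = dset;
    return INPUT_OK;
}

/* Close the input phase; `valid` says whether the read succeeded (unused here). */
int SCR_Complete_input(int valid)
{
    (void)valid;
    if (active_dset == NULL) {
        return INPUT_ERR;
    }
    active_dset = NULL;
    return INPUT_OK;
}

/* Application side: read a named dataset if SCR has it. Returns 1 on success. */
int read_dataset(const char* dset)
{
    int have = 0;
    SCR_Have_input(&have, dset);
    if (!have) {
        return 0;
    }
    if (SCR_Start_input(dset) != INPUT_OK) {
        return 0;
    }
    int valid = 1;
    /* ... open and read each file here, using paths from route_file ... */
    SCR_Complete_input(valid);
    return valid;
}
```

The pattern mirrors the restart flow: ask first, then read only inside the start/complete window, and report validity on the way out.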

The caller would provide the name of the dataset it intends to open, and SCR would report whether that dataset is available. This check could be done in the Have_input call. Those semantics differ somewhat from Have_restart, in which SCR chooses the checkpoint and reports its name back to the caller. If we adopt this new interface, we might deprecate Have_restart or redefine its behavior.
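The difference in lookup direction can be sketched against a toy dataset index. This is a hypothetical model only: the dset_entry table, the ids, the names, and both *_style functions are invented for illustration, and it assumes Have_restart-style lookup picks the newest checkpoint while Have_input-style lookup takes the name from the caller and matches any dataset kind.

```c
#include <string.h>

/* Hypothetical record of datasets SCR might know about; ids increase over time. */
struct dset_entry { int id; const char* name; int is_ckpt; };

static const struct dset_entry index_tbl[] = {
    { 3, "ckpt.30", 1 },
    { 4, "viz.40",  0 },   /* output-only dataset, not a checkpoint */
    { 5, "ckpt.50", 1 },
};
#define N_ENTRIES (sizeof(index_tbl) / sizeof(index_tbl[0]))

/* Have_restart-style: SCR selects the newest checkpoint and reports its name. */
int have_restart_style(int* flag, const char** name_out)
{
    *flag = 0;
    int best = -1;
    for (size_t i = 0; i < N_ENTRIES; i++) {
        if (index_tbl[i].is_ckpt && index_tbl[i].id > best) {
            best = index_tbl[i].id;
            *name_out = index_tbl[i].name;
            *flag = 1;
        }
    }
    return 0;
}

/* Have_input-style: the caller names the dataset and SCR answers yes/no. */
int have_input_style(int* flag, const char* name_in)
{
    *flag = 0;
    for (size_t i = 0; i < N_ENTRIES; i++) {
        if (strcmp(index_tbl[i].name, name_in) == 0) {
            *flag = 1;
        }
    }
    return 0;
}
```

In the first style the name flows out of SCR; in the second it flows in, which is why reusing Have_restart unchanged would not fit.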

SCR would locate the dataset and load it if needed (which may involve extraction, decompression, rebuild, etc.). Between the start and complete calls, route_file would return the proper location of each file. The start/complete calls are needed because SCR may move or delete the dataset outside of those bookends.
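Why the bookends matter can be shown with a small model. This is a sketch under stated assumptions, not SCR internals: start_input, complete_input, relocate_dataset, route_input_file, and the directory paths are all hypothetical. SCR housekeeping may move the dataset only while no input phase is open, and path routing is only answered inside one.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model: where the dataset currently lives, and whether an
   application input phase is open. */
static char dset_dir[256] = "/dev/shm/scr/ckpt.10";  /* hypothetical path */
static int input_open = 0;

int start_input(void)    { input_open = 1; return 0; }
int complete_input(void) { input_open = 0; return 0; }

/* SCR-side housekeeping (e.g. flush or evict): refused inside the bookends. */
int relocate_dataset(const char* new_dir)
{
    if (input_open) {
        return 1;  /* application may be reading: do not move files */
    }
    snprintf(dset_dir, sizeof(dset_dir), "%s", new_dir);
    return 0;
}

/* route_file analogue: a path is only handed out between start and complete. */
int route_input_file(const char* name, char* out, size_t outlen)
{
    if (!input_open) {
        return 1;  /* outside the bookends the location may change at any time */
    }
    snprintf(out, outlen, "%s/%s", dset_dir, name);
    return 0;
}
```

A returned path is therefore stable only until the matching complete call; after that, SCR is free to relocate or delete the files.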