HenrikBengtsson / CBI-software

A Scientific Software Stack for HPC (CentOS oriented)
https://wynton.ucsf.edu/hpc/software/software-repositories.html

SOFTWARE: DMTCP - checkpointing processes #32

Closed HenrikBengtsson closed 2 years ago

HenrikBengtsson commented 2 years ago

DMTCP is a tool to transparently checkpoint the state of multiple simultaneous applications, including multi-threaded and distributed applications. It operates directly on the user binary executable, without any Linux kernel modules or other kernel modifications.

A demo using Singularity 2.6 and DMTCP: https://github.com/mmore500/mwe-singularity-checkpoint


HenrikBengtsson commented 2 years ago

See https://docs.nersc.gov/development/checkpoint-restart/dmtcp/ for examples of how to use DMTCP.
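
For orientation, a typical DMTCP session has three steps: launch the application under `dmtcp_launch`, request a checkpoint with `dmtcp_command`, and later resume from the saved images with `dmtcp_restart`. The sketch below only builds and prints those command lines; it does not run them. The application name `./my_app`, the directory `ckpt`, the image file name, and the Python helper functions are hypothetical, and the flags shown (`--interval`, `--ckptdir`, `--checkpoint`) should be checked against `dmtcp_launch --help` and `dmtcp_command --help` for the installed version.

```python
import shlex
import shutil


def launch_cmd(app_argv, interval_s=None, ckpt_dir=None):
    """Build a dmtcp_launch command that runs app_argv under checkpoint control."""
    cmd = ["dmtcp_launch"]
    if interval_s is not None:
        # Assumed flag: checkpoint automatically every interval_s seconds.
        cmd += ["--interval", str(interval_s)]
    if ckpt_dir is not None:
        # Assumed flag: directory where checkpoint images are written.
        cmd += ["--ckptdir", ckpt_dir]
    return cmd + list(app_argv)


def checkpoint_cmd():
    """Build a command asking the coordinator to checkpoint all attached processes."""
    return ["dmtcp_command", "--checkpoint"]


def restart_cmd(image_paths):
    """Build a command that restarts the processes saved in the given images."""
    return ["dmtcp_restart"] + list(image_paths)


if __name__ == "__main__":
    # Hypothetical application and paths, for illustration only.
    steps = [
        launch_cmd(["./my_app", "--steps", "1000"], interval_s=60, ckpt_dir="ckpt"),
        checkpoint_cmd(),
        restart_cmd(["ckpt/ckpt_my_app.dmtcp"]),
    ]
    for argv in steps:
        print(shlex.join(argv))
    # Report whether DMTCP appears to be installed, without invoking it.
    print("dmtcp_launch on PATH:", shutil.which("dmtcp_launch") is not None)
```

On a cluster, these commands would normally go into the job script, with the restart step used when a job is resubmitted after hitting its run-time limit.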

HenrikBengtsson commented 2 years ago

http://wiki.orc.gmu.edu/mkdocs/Creating_Checkpoints_%28DMTCP%29/

https://wiki.york.ac.uk/display/RCS/VK21%29+Checkpointing+with+DMTCP