Closed mhovd closed 1 year ago
That is a good suggestion. I think the minimum required would simply be the current support points; a run could be restarted from those.
Michael Neely, MD, MSc, FCP | Chief, Division of Infectious Diseases
From: Markus. Sent: Wednesday, December 14, 2022 5:03:55 AM. To: LAPKB/Pmetrics. Subject: [LAPKB/Pmetrics] Checkpointing (Issue #128)
If the terminal or R process is closed for whatever reason during a run, there is currently no way to resume that run. All progress is lost, and the run has to be restarted from the beginning.
Would it be possible to add some level of checkpointing, so that runs may be restored if they are exited unexpectedly? This would be a huge improvement to the user experience and stability, and would also be advantageous in terms of running on high-performance clusters, which will favor jobs that support checkpointing.
I have seen @masyamada stopping and continuing runs, but I'm not sure whether that can only be done if you're starting the Fortran process manually.
What do you think, Walter? Can we use that to store the state of the run each cycle, so we can continue if the process is closed or the power goes off?
Maybe this should rather be implemented in the Rust engine? Especially if it is as simple as starting from the support points of the previous working cycle.
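To make the idea concrete, here is a rough sketch of per-cycle checkpointing, written in Python purely for illustration. It is not how Pmetrics or the Rust engine does it. The function names, the JSON file layout, and the `update` callback (standing in for one estimation cycle) are all assumptions. The only state saved is the cycle number and the current support points, as suggested above.

```python
import json
import os
import tempfile


def save_checkpoint(path, cycle, support_points):
    """Write the cycle number and current support points to `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first, so a crash mid-write cannot
    # corrupt the checkpoint from the previous cycle.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump({"cycle": cycle, "support_points": support_points}, handle)
        handle.flush()
        os.fsync(handle.fileno())
    # Swap the finished file into place in a single rename.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return (cycle, support_points), or None if no checkpoint exists."""
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        state = json.load(handle)
    return state["cycle"], state["support_points"]


def run(initial_points, n_cycles, path, update):
    """Run up to `n_cycles` cycles, resuming from `path` if it exists."""
    restored = load_checkpoint(path)
    start, points = restored if restored else (0, initial_points)
    for cycle in range(start + 1, n_cycles + 1):
        points = update(points)               # hypothetical: one estimation cycle
        save_checkpoint(path, cycle, points)  # persist state after every cycle
    return points
```

Calling `run` again with the same checkpoint path after an interruption picks up at the cycle after the last one saved, so at most one cycle of work is lost. A real implementation would probably also need to store whatever else the algorithm carries between cycles, which this sketch ignores.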