LLNL / hiop

HPC solver for nonlinear optimization problems

Add advanced checkpoint/restart capabilities to Hiop #686

Closed tepperly closed 1 month ago

tepperly commented 5 months ago

Many HPC applications need to implement a checkpoint/restart capability to address either:

To address these concerns, typical HPC applications periodically write their internal state to disk (checkpointing) and can later restart and resume from the most recent checkpoint file.
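
To make the pattern concrete, here is a minimal sketch in plain C++ of periodic checkpointing plus resume. It does not use Hiop or sidre: `SolverState`, `save_checkpoint`, `load_checkpoint`, `run`, and the raw binary file layout are all hypothetical names and choices made up for illustration, and the "solver step" is a placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical solver state; NOT Hiop's actual internal data layout.
struct SolverState {
  int iteration = 0;
  double mu = 0.0;        // stand-in for an algorithmic parameter
  std::vector<double> x;  // stand-in for the current iterate
};

// Write to a temporary file first, then rename it over the target, so a
// crash mid-write does not destroy the last good checkpoint. (On POSIX,
// rename replaces an existing file; other platforms may behave differently.)
bool save_checkpoint(const SolverState& s, const std::string& path) {
  const std::string tmp = path + ".tmp";
  {
    std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    const std::size_t n = s.x.size();
    out.write(reinterpret_cast<const char*>(&s.iteration), sizeof s.iteration);
    out.write(reinterpret_cast<const char*>(&s.mu), sizeof s.mu);
    out.write(reinterpret_cast<const char*>(&n), sizeof n);
    out.write(reinterpret_cast<const char*>(s.x.data()), n * sizeof(double));
    out.flush();
    if (!out) return false;
  }
  return std::rename(tmp.c_str(), path.c_str()) == 0;
}

// Read a checkpoint written by save_checkpoint. `s` is modified only if
// the whole file was read successfully.
bool load_checkpoint(SolverState& s, const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) return false;
  SolverState tmp;
  std::size_t n = 0;
  in.read(reinterpret_cast<char*>(&tmp.iteration), sizeof tmp.iteration);
  in.read(reinterpret_cast<char*>(&tmp.mu), sizeof tmp.mu);
  in.read(reinterpret_cast<char*>(&n), sizeof n);
  if (!in) return false;
  tmp.x.resize(n);
  in.read(reinterpret_cast<char*>(tmp.x.data()), n * sizeof(double));
  if (!in) return false;
  s = tmp;
  return true;
}

// Iterate up to max_iter, checkpointing every `interval` iterations.
// If a checkpoint already exists at `path`, resume from it.
SolverState run(int max_iter, int interval, const std::string& path) {
  assert(interval > 0);
  SolverState s;
  if (!load_checkpoint(s, path)) {
    s.mu = 1.0;          // fresh start
    s.x.assign(3, 1.0);
  }
  while (s.iteration < max_iter) {
    for (double& xi : s.x) xi *= 0.5;  // placeholder for a real solver step
    s.mu *= 0.1;
    ++s.iteration;
    if (s.iteration % interval == 0) save_checkpoint(s, path);
  }
  return s;
}
```

Calling `run(100, 10, "ckpt.bin")` a second time after an interrupted first run picks up at the last saved iteration instead of iteration 0. A real implementation would also need to handle parallel (MPI) state, format versioning, and portability across machines, which is where a library such as sidre would come in.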

Hiop has some ability to warm start through get_starting_point() or get_warm_start(). However, it would be better if Hiop could save more of its internal state, so that a restart resumes closer to where the solver left off.

The ::axom::sidre package provides a flexible checkpoint/restart API and implementation. Multiple LLNL packages use ::axom::sidre. The repository is here. It's buildable with Spack.