Closed cnpetra closed 1 month ago
I thought about it some more over lunch, and I think load_state_from_data_store()
and save_state_to_data_store()
should take a ::axom::sidre::Group &
as it's argument rather than a ::axom::sidre::DataStore &
. The Group
argument should specify the group where the save/load should take place from rather than it being a fixed location relative to the root DataStore
.
You can get the DataStore
from the Group
in case there is a need for that.
IMHO, this is an improvement because if HiOp is part of a complex program, the outer program may want to encapsulate the HiOp part of the checkpoint in some group that's potentially several levels down from the DataStore root group. It also leaves the door open for their to be multiple instances of HiOp active in the running program that all need to be checkpointed. The current design can only support checkpointing of one HiOp instance in a given DataStore.
I thought about it some more over lunch, and I think
load_state_from_data_store()
andsave_state_to_data_store()
should take a::axom::sidre::Group &
as it's argument rather than a::axom::sidre::DataStore &
. TheGroup
argument should specify the group where the save/load should take place from rather than it being a fixed location relative to the rootDataStore
.You can get the
DataStore
from theGroup
in case there is a need for that.IMHO, this is an improvement because if HiOp is part of a complex program, the outer program may want to encapsulate the HiOp part of the checkpoint in some group that's potentially several levels down from the DataStore root group. It also leaves the door open for their to be multiple instances of HiOp active in the running program that all need to be checkpointed. The current design can only support checkpointing of one HiOp instance in a given DataStore.
I take it that load_state_from_data_store()
and save_state_to_data_store()
"APIs" would do the job on your end. I see the flexibility of working with the sidre::group
and will make the change. Will have you as a reviewer to have a final look when the PR ready.
I thought about it some more over lunch, and I think
load_state_from_data_store()
andsave_state_to_data_store()
should take a::axom::sidre::Group &
as it's argument rather than a::axom::sidre::DataStore &
. TheGroup
argument should specify the group where the save/load should take place from rather than it being a fixed location relative to the rootDataStore
. You can get theDataStore
from theGroup
in case there is a need for that. IMHO, this is an improvement because if HiOp is part of a complex program, the outer program may want to encapsulate the HiOp part of the checkpoint in some group that's potentially several levels down from the DataStore root group. It also leaves the door open for their to be multiple instances of HiOp active in the running program that all need to be checkpointed. The current design can only support checkpointing of one HiOp instance in a given DataStore.I take it that
load_state_from_data_store()
andsave_state_to_data_store()
"APIs" would do the job on your end. I see the flexibility of working with thesidre::group
and will make the change. Will have you as a reviewer to have a final look when the PR ready.
The load/save methods offer the most flexibility for LiDO and its clients. I haven't double checked that we have the right HiOp object handle to call them.
I thought about it some more over lunch, and I think
load_state_from_data_store()
andsave_state_to_data_store()
should take a::axom::sidre::Group &
as it's argument rather than a::axom::sidre::DataStore &
. TheGroup
argument should specify the group where the save/load should take place from rather than it being a fixed location relative to the rootDataStore
. You can get theDataStore
from theGroup
in case there is a need for that. IMHO, this is an improvement because if HiOp is part of a complex program, the outer program may want to encapsulate the HiOp part of the checkpoint in some group that's potentially several levels down from the DataStore root group. It also leaves the door open for their to be multiple instances of HiOp active in the running program that all need to be checkpointed. The current design can only support checkpointing of one HiOp instance in a given DataStore.I take it that
load_state_from_data_store()
andsave_state_to_data_store()
"APIs" would do the job on your end. I see the flexibility of working with thesidre::group
and will make the change. Will have you as a reviewer to have a final look when the PR ready.The load/save methods offer the most flexibility for LiDO and its clients. I haven't double checked that we have the right HiOp object handle to call them.
LiDO for sure instantiates the algorithm class somewhere in there. That "algorithm" object should be used. For example, attach it to the interface. Will provide an example in this PR.
I'll try to look at it early this week. It's PA season.
On Sun, Sep 22, 2024, 5:14 PM Cosmin G Petra @.***> wrote:
@cnpetra https://github.com/cnpetra requested your review on: #693 https://github.com/LLNL/hiop/pull/693 Checkpointing (quasi-Newton solver).
— Reply to this email directly, view it on GitHub https://github.com/LLNL/hiop/pull/693#event-14355934431, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACM3KAWWTXVWUBXXPWCVWLZX5MNFAVCNFSM6AAAAABNI3FQ5WVHI2DSMVQWIX3LMV45UABCJFZXG5LFIV3GK3TUJZXXI2LGNFRWC5DJN5XDWMJUGM2TKOJTGQ2DGMI . You are receiving this because your review was requested.Message ID: @.***>
Adds API for checkpointing and support for restart via user options.
closes #686