ICB-DCM / pyPESTO

python Parameter EStimation TOolbox
https://pypesto.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Storage of intermediate results #720

Open elbaraim opened 3 years ago

elbaraim commented 3 years ago

Feature description Allow pyPESTO to store intermediate results before the whole process is finished (e.g. optimization, sampling).

Motivation/Application This is important especially when working with more computationally demanding models. E.g., one may want to assess parameter uncertainty using a large number of samples, and due to time constraints (e.g., running on a server) the process can get killed close to the finishing point, thereby losing all the samples generated in the meantime.

E.g., recently I got this painful message:

99%|█████████▉| 992972/1000000 [167:59:43<49:23,  2.37it/s]slurmstepd: error: *** JOB 1338355 ON node43 CANCELLED AT 2021-08-19T11:59:07 DUE TO TIME LIMIT ***

of a process that took 7 days (and now all is lost) :(

This occurred in the context of sampling.
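A generic way to avoid losing such a run is to checkpoint the chain to disk at a fixed interval and resume from the checkpoint after a kill. The following is only a minimal sketch of the pattern, not pyPESTO's API: `draw_sample` is a hypothetical stand-in for one sampler step, and the temp-file-plus-rename write keeps the checkpoint readable even if the job dies mid-write.

```python
import os
import pickle
import random


def draw_sample(current):
    # Hypothetical stand-in for one MCMC step; replace with the real sampler call.
    return current + random.gauss(0.0, 1.0)


def run_chain(n_samples, checkpoint_file, checkpoint_every=1000):
    samples, current = [], 0.0
    # Resume from an earlier checkpoint if the previous job was killed.
    if os.path.exists(checkpoint_file):
        with open(checkpoint_file, "rb") as f:
            samples = pickle.load(f)
        current = samples[-1]
    while len(samples) < n_samples:
        current = draw_sample(current)
        samples.append(current)
        if len(samples) % checkpoint_every == 0:
            # Write to a temp file and rename atomically, so the
            # checkpoint on disk is never half-written.
            tmp = checkpoint_file + ".tmp"
            with open(tmp, "wb") as f:
                pickle.dump(samples, f)
            os.replace(tmp, checkpoint_file)
    return samples
```

With `checkpoint_every=1000`, a job killed at 992,972 of 1,000,000 samples would lose at most the last partial interval instead of the whole week of sampling.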

elbaraim commented 3 years ago

And -- as a remark -- this is not an isolated case :(

EDIT: Maybe the people involved in the storage development can have a look?

jvanhoefer commented 3 years ago

Looks like something @PaulJonasJost could be up to?

stephanmg commented 3 years ago

I second the suggestion by @elbaraim - perhaps the right idea would be to modify the tqdm decorator (by adding a parameter for the write-out interval) or to implement another dedicated decorator.

A related issue (or the same) is that in my multi-start optimizations I periodically need to save my .h5 results file (collecting the individual runs). If a job on some compute infrastructure runs into the wall-time limit, no .h5 results file is generated at all. It would also be good to have a periodic write-out of these updated .h5 files.
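The periodic write-out described above can be approximated outside of pyPESTO by rewriting the aggregate results file after every finished start. A minimal sketch, with `run_single_start` as a hypothetical stand-in for one local optimization (this is not pyPESTO's API):

```python
import json
import os
import random


def run_single_start(seed):
    # Hypothetical stand-in for one local optimization run.
    random.seed(seed)
    return {"start": seed, "fval": random.random()}


def multistart_with_writeout(n_starts, results_file):
    results = []
    for seed in range(n_starts):
        results.append(run_single_start(seed))
        # Rewrite the aggregate file after every finished start, via a
        # temp file plus atomic rename, so a wall-time kill leaves a
        # valid file containing all starts completed so far.
        tmp = results_file + ".tmp"
        with open(tmp, "w") as f:
            json.dump(results, f)
        os.replace(tmp, results_file)
    return results
```

If 100 starts are scheduled and the job is killed after 73, the file on disk still holds those 73 results.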

FFroehlich commented 3 years ago

> I second the suggestion by @elbaraim - perhaps the right idea would be to modify the tqdm decorator (by adding a parameter for the write-out interval) or to implement another dedicated decorator.
>
> A related issue (or the same) is that in my multi-start optimizations I periodically need to save my .h5 results file (collecting the individual runs). If a job on some compute infrastructure runs into the wall-time limit, no .h5 results file is generated at all. It would also be good to have a periodic write-out of these updated .h5 files.

It should already be possible to store intermediate results for optimization using the objective history.

stephanmg commented 3 years ago

trace_save_iter from class pypesto.HistoryOptions? (I think this is wrong, but maybe it isn't.)

FFroehlich commented 3 years ago

> trace_save_iter from class pypesto.HistoryOptions? (I think this is wrong, but maybe it isn't.)

That attribute controls how frequently results are stored, but it needs to be activated in the first place.

stephanmg commented 3 years ago

@FFroehlich okay. Concerning my related problem: I presume that saving a results.h5 file collecting all already finished optimization runs (let's say I'm doing 100 runs in total and want to periodically save/update my results.h5 file) isn't available, right? I hope I'm not getting this wrong.

yannikschaelte commented 3 years ago

> I second the suggestion by @elbaraim - perhaps the right idea would be to modify the tqdm decorator (by adding a parameter for the write-out interval) or to implement another dedicated decorator.
>
> A related issue (or the same) is that in my multi-start optimizations I periodically need to save my .h5 results file (collecting the individual runs). If a job on some compute infrastructure runs into the wall-time limit, no .h5 results file is generated at all. It would also be good to have a periodic write-out of these updated .h5 files.

> It should already be possible to store intermediate results for optimization using the objective history.

Yes, for optimization all of this should already be possible via the history class and the optional trace_save_iter. Essentially, for optimization we are only interested in single optimal values, which can easily be managed and extracted from the history object (except if the optimizer also evaluates points violating constraints). For sampling, this is different.

FFroehlich commented 3 years ago

> @FFroehlich okay. Concerning my related problem: I presume that saving a results.h5 file collecting all already finished optimization runs (let's say I'm doing 100 runs in total and want to periodically save/update my results.h5 file) isn't available, right? I hope I'm not getting this wrong.

Correct, see #517.

stephanmg commented 3 years ago

@yannikschaelte so the following code should update my results.csv or results.h5 after the completion of each optimization run?

import pypesto
from datetime import date

history_name = f"results_{date.today()}.csv"  # or .h5
history_options = pypesto.HistoryOptions(
    trace_record=True, trace_save_iter=1, storage_file=history_name
)

PaulJonasJost commented 3 years ago

As far as I know and have tested, with the HDF5 history that should happen automatically. The only thing I would not be sure about is whether the interrupted run is saved cleanly. But this is only for optimization; I am not sure how this works with sampling, I will have a look.

stephanmg commented 3 years ago

Hello @PaulJonasJost, yes, the interrupted run might be in a "dirty" state, so the file isn't readable afterwards, which is okay (tested).

My concern is now the following: when I specify an .h5 history file (see the above post) via the .h5 suffix, the output folder needs to exist already, whereas it is created automatically when using the CSV history via the .csv suffix. I assumed it would be handled the same way as with the CSV history. Of course I can easily work around this by creating the folders manually.

I am not sure which behaviour is expected, but I guess consistency across both history writers would be desirable?
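Until the two history backends behave consistently, the manual workaround mentioned above can be sketched like this; the path is illustrative, and only the mkdir call matters:

```python
from pathlib import Path

# Illustrative output path for the HDF5 history file.
history_name = Path("results") / "history.h5"

# Create the output folder up front; the CSV history creates it
# automatically, but (at the time of this thread) the HDF5 history
# expected it to already exist.
history_name.parent.mkdir(parents=True, exist_ok=True)
```

`str(history_name)` can then be passed as `storage_file` to the history options.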

PaulJonasJost commented 3 years ago

It does not create the directory? That is weird, as it should do that automatically... (in pypesto.optimize.util, lines 36-41)

stephanmg commented 3 years ago

Yes, it won't create the folder; I can share a screencast if required.