elbaraim opened this issue 3 years ago (Open)
And -- as a remark -- this is not an isolated case :(
EDIT: Maybe the group of persons involved in the storage development can have a look?
Looks like something @PaulJonasJost could be up to?
I second the suggestion by @elbaraim - perhaps the right idea would be to modify the tqdm decorator (by adding a parameter for the write-out interval) or implement another dedicated decorator.
A related issue (or the same) is that in my multi-start optimizations I periodically need to save my .h5 results file (collecting the individual runs). If a job on some compute infrastructure runs into the wall-time limit, no .h5 results file is generated. It would also be good to have a periodic write-out of these updated .h5 files.
It should already be possible to store intermediate results for optimization using the objective history.
Do you mean trace_save_iter from the class pypesto.HistoryOptions? (I think this is wrong, but maybe it isn't.)
That attribute controls how frequently results are stored, but it needs to be activated in the first place.
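For context, activating it might look roughly like this (a minimal sketch; the interval value is only an example):

import pypesto

# trace_record switches the history recording on in the first place;
# trace_save_iter then controls how often the recorded trace is written out.
history_options = pypesto.HistoryOptions(trace_record=True, trace_save_iter=10)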
@FFroehlich okay -> Concerning my related problem, I presume saving a results.h5 file collecting all already finished optimization runs (let's say I'm doing 100 total runs and I want to periodically save/update my results.h5 file) isn't available, right? I hope I'm not getting this wrong.
Yes, for optimization everything should already be possible via the history class and its optional trace_save_iter. Essentially, for optimization, we are only interested in single optimal values, which can easily be managed and extracted from that history object (except if the optimizer also evaluates points violating constraints). For sampling, this is different.
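For illustration, extracting the best value seen so far from such a history could look like the sketch below (not from the thread; it assumes a finished or interrupted multistart result `result` and that the history object exposes get_fval_trace()/get_x_trace()):

import numpy as np

history = result.optimize_result.list[0].history  # history of the first start
fvals = np.asarray(history.get_fval_trace())
best = int(np.nanargmin(fvals))  # ignore failed evaluations recorded as NaN
x_best = history.get_x_trace()[best]
print(f"best fval so far: {fvals[best]} at x = {x_best}")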
Correct, see #517.
@yannikschaelte so the following code should update my results.csv or results.h5 after the completion of each optimization run?
from datetime import date
import pypesto
history_name = f"results_{date.today()}.csv"  # or .h5
history_options = pypesto.HistoryOptions(trace_record=True, trace_save_iter=1, storage_file=history_name)
As far as I know and have tested, with the HDF5 history that should happen automatically; the only thing I am not sure about is whether the interrupted run is saved nicely. But this is only for optimization; not sure how this is with sampling, will have a look.
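Putting the pieces together, a self-contained toy run might look like the sketch below (not from the thread; the quadratic objective and all file/parameter names are only illustrative, and the .h5 suffix is assumed to select the HDF5 history):

import numpy as np
import pypesto
import pypesto.optimize as optimize

# toy objective, just to have something to optimize
objective = pypesto.Objective(fun=lambda x: np.sum(x**2), grad=lambda x: 2 * x)
problem = pypesto.Problem(objective=objective, lb=-5 * np.ones(3), ub=5 * np.ones(3))

history_options = pypesto.HistoryOptions(
    trace_record=True,         # must be enabled, otherwise nothing is traced
    trace_save_iter=1,         # write the trace out after every iteration
    storage_file="results.h5"  # assumed to select the HDF5 history (".csv" for CSV)
)

# the options only take effect once they are passed to the multistart call
result = optimize.minimize(problem=problem, n_starts=10, history_options=history_options)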
Hello @PaulJonasJost, yes the interrupted run might be in a "dirty" state, so the file isn't readable afterwards, which is okay (tested).
My concern is now the following: When I specify a .h5 file history (see above post) by adding the suffix .h5, then the output folder needs to exist - whereas it was created automatically if one uses the CSV history by adding the suffix .csv. I assumed it would be handled the same way as when using the CSV history. Of course I can easily remedy this issue by creating the folders manually. I am not sure which behaviour is expected, but I guess consistency across both history writers might be desired?
It does not create the directory? That is weird, as it should be done automatically... (in pypesto.optimize.util, lines 36-41)
Yes, it won't create the folder, I can share a screencast if required.
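As a stop-gap until both history writers behave the same, the output folder can be created up front; a plain standard-library sketch (folder and file names are only examples):

from datetime import date
from pathlib import Path

# create the output folder ourselves, so that an .h5 history path inside it
# works the same way the .csv variant already does
out_dir = Path("output")
out_dir.mkdir(parents=True, exist_ok=True)
history_file = str(out_dir / f"results_{date.today()}.h5")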
Feature description Allow pyPESTO to store intermediate results before the whole process is finished (e.g. optimization, sampling).
Motivation/Application This is especially important when working with more computationally demanding models, e.g. one may want to assess parameter uncertainty using a large number of samples, and due to time constraints (e.g. running on a server) the process can get killed almost at its finishing point, thereby losing all the samples generated in the meantime.
e.g. recently I got this painful message of a process that took 7 days (and now all is lost) :(
This occurred in the context of sampling.