adjtomo / seisflows

An automated workflow tool for full waveform inversion and adjoint tomography
http://seisflows.readthedocs.org
BSD 2-Clause "Simplified" License

Checkpoint source level #178

Closed evcano closed 4 months ago

evcano commented 10 months ago

Hi @bch0w,

This PR contains changes to checkpoint a workflow at an event/source level.

The way I see it, a workflow consists of X tasks, each executed in parallel across N events. Each task consists of Y functions, so each of the N events needs to complete X * Y functions during one iteration of the workflow.

With that in mind, I create one file for each of the N events that tracks the status of all X * Y functions that the event needs to complete. Before an event runs a function, the code checks that event's state file. If the event has already completed the function, it is skipped. Otherwise, the function is run and then marked as completed in the event's state file. This way, if a task fails, each event only reruns the functions it had not yet completed. For instance, this avoids repeating all the forward simulations if only one fails.
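To make the idea concrete, here is a minimal sketch of the per-event state file, not the actual code in this PR. The names `EventState` and `run_once`, the JSON file format, and the one-file-per-event naming scheme are all assumptions made for illustration; SeisFlows itself may organize this differently.

```python
import json
from pathlib import Path


class EventState:
    """Tracks which functions a single event has completed (hypothetical helper)."""

    def __init__(self, state_dir, event_id):
        # One state file per event, e.g. <state_dir>/<event_id>.json (assumed layout)
        self.path = Path(state_dir) / f"{event_id}.json"
        self.path.parent.mkdir(parents=True, exist_ok=True)
        # Reload earlier progress if a state file is already on disk
        if self.path.exists():
            self.completed = set(json.loads(self.path.read_text()))
        else:
            self.completed = set()

    def run_once(self, name, func, *args, **kwargs):
        """Run `func` unless this event already completed it.

        Returns True if the function was run, False if it was skipped.
        """
        if name in self.completed:
            return False
        # If func raises, nothing is marked, so a rerun will retry it
        func(*args, **kwargs)
        self.completed.add(name)
        self.path.write_text(json.dumps(sorted(self.completed)))
        return True
```

Under these assumptions, each event would wrap its functions as `state.run_once("forward_simulation", run_forward, event_id)`, where `run_forward` stands in for whatever the workflow actually calls. After a failure, restarting the workflow skips every function already recorded in that event's file.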

So far I have added the source-level checkpoint only to the forward workflow, but it would be easy to extend to the other ones. Feel free to comment on this approach and I can make the corresponding changes.

I hope the explanation and code are easy to understand, and sorry for the delay. I will be more active again :) Eduardo.

bch0w commented 4 months ago

Hi @evcano, sorry that I never got around to this PR! The devel branch has moved quite a way past this PR version, so I wonder if you are okay with closing this?

I really like your idea, however, and I think we can try to incorporate it into a future source-level checkpointing system, as it is still a really useful feature to have!

evcano commented 4 months ago

Hi @bch0w, no problem! I will close this PR.