christopherwharrop / rocoto

Rocoto Workflow Management System
Apache License 2.0

rocoto <complete> dependency #24

Open samtrahan opened 6 years ago

samtrahan commented 6 years ago

This requests that the ecFlow "complete" directive be added to Rocoto. It would look something like this:

<task ...>
  ...
  <complete>
    <or>
      <taskdep>...</taskdep>
      <datadep>...</datadep>
    </or>
  </complete>
</task>

If the <complete> directive is met, the job is considered SUCCEEDED with the job id set to "completed." This would be inserted in the Rocoto implementation just before submit_new_jobs is called.

This is a cleaner feature than final="T" because it eliminates the problematic aspects of "drained" cycles.

christopherwharrop commented 6 years ago

What is the use case for this feature?

samtrahan commented 6 years ago

Two examples come to mind:

  1. HWRF has multiple types of forecasts. Which one runs is not known until part of the way through the workflow.
  2. The FV3 GFS workflow skips some jobs depending on the configuration for that cycle. The 00z, 06z, 12z, and 18z cycles sometimes vary, and the first two cycles are special.

Such things can be dealt with by using final="T" tasks, but that leads to incredibly complex code. This is especially problematic when generating a workflow automatically. Look at this file on Jet for an example:

/lfs3/projects/hfv3gfs/glopara/noscrub/expdir/fv3q2fy19retro5_dell_restart/workflow.xml

It is generated from this file:

/lfs3/projects/hfv3gfs/glopara/noscrub/expdir/fv3q2fy19retro5_dell_restart/workflow.yaml

christopherwharrop commented 6 years ago

Ok. So, for example, if a workflow will run either A or B, but you don't know which one's dependencies will be satisfied, you'll add a <complete> dependency in both A and B such that the one that isn't run is marked as complete if the one that is run is completed?

samtrahan commented 6 years ago

Chris,

Depending on the specifics of the workflow, A and B may not be able to have direct dependencies on one another. I would have to see an example. A simpler example is if A and B depend on a prior event:

A <depends> on file X existing when job C is done and is <complete> if file X does not exist when C is done
B <depends> on file X NOT existing when job C is done and is <complete> if file X does exist when C is done

You can see examples of this in the HWRF workflow where there are two branches of the data assimilation parts of the workflow depending on whether TDR data is available.
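
As a rough sketch, the A/B case above might be written like this. The <complete> element is the syntax proposed in this issue, not something Rocoto supports today; the task names, the "default" cycledef name, and the path to file X are placeholders, and the usual command and resource elements are left out.

```xml
<!-- Hypothetical sketch of the proposed <complete> element.
     <complete> does not exist in Rocoto; names and paths are placeholders. -->

<!-- A runs if X exists once C is done; otherwise A is marked complete. -->
<task name="A" cycledefs="default">
  <!-- command, account, queue, walltime, etc. omitted -->
  <dependency>
    <and>
      <taskdep task="C"/>
      <datadep><cyclestr>/path/to/X.@Y@m@d@H</cyclestr></datadep>
    </and>
  </dependency>
  <complete>
    <and>
      <taskdep task="C"/>
      <not><datadep><cyclestr>/path/to/X.@Y@m@d@H</cyclestr></datadep></not>
    </and>
  </complete>
</task>

<!-- B is the mirror image: it runs if X is absent, and is complete if X exists. -->
<task name="B" cycledefs="default">
  <!-- command, account, queue, walltime, etc. omitted -->
  <dependency>
    <and>
      <taskdep task="C"/>
      <not><datadep><cyclestr>/path/to/X.@Y@m@d@H</cyclestr></datadep></not>
    </and>
  </dependency>
  <complete>
    <and>
      <taskdep task="C"/>
      <datadep><cyclestr>/path/to/X.@Y@m@d@H</cyclestr></datadep>
    </and>
  </complete>
</task>
```

Including the taskdep on C in both blocks is what expresses "when job C is done": neither the dependency nor the complete condition can be met before C has finished.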

Compared to final="T", a <complete> directive is a far more direct method: it is easier to understand when reading the XML, easier to apply, and avoids the complexities that result from final="T".

christopherwharrop commented 6 years ago

Ok. You've convinced me.

I can see the utility of this in providing a way to have branching in a workflow (something that DAGs do a very bad job of representing). I agree that the "final" task attribute introduces some issues with booting and rewind. But I see these two things as having distinct purposes. The "final" attribute is really a way to "complete" an entire cycle, whereas <complete> is a way to "complete" individual tasks that are "done" because their counterpart is to be run instead.

samtrahan commented 6 years ago

Chris,

I like that explanation. It would make sense for the Rocoto documentation to have <complete> and final="T" in the same section and start with that sentence about how one is for a cycle and the other is for a task.