Open TomekTrzeciak opened 1 year ago
We could add a new top-level workflow status for "completed" workflows. Currently this state can effectively be detected by querying the task-pool table in the database: if there are no entries, the workflow has completed.
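A rough sketch of that database check, using the `sqlite3` command-line client to count rows in the task pool table. The table name `task_pool`, the database path in the usage comment, and the helper name `workflow_completed` are assumptions for illustration and are not confirmed in this thread, so check them against your Cylc version.

```shell
#!/usr/bin/env bash
# Sketch only: the table name and DB location are assumptions, not confirmed here.

# Return 0 if the task pool table is empty (workflow looks completed),
# 1 if tasks remain, 2 if the database could not be queried.
workflow_completed() {
    local db_file="$1"
    local table="${2:-task_pool}"   # assumed table name
    local count
    [ -f "$db_file" ] || return 2
    # Open read-only so the check cannot modify the workflow database.
    count=$(sqlite3 -readonly "$db_file" "SELECT COUNT(*) FROM ${table};") || return 2
    [ "$count" -eq 0 ]
}

# Hypothetical usage (workflow name and path are placeholders):
#   workflow_completed "$HOME/cylc-run/my-workflow/log/db" && echo "completed"
```

Unlike the exit code, this separates "stopped with tasks still in the pool" from "nothing left to run", but it still needs a database connection.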
For sub-workflows, we can currently use the workflow's exit code, which kinda works; however, with this it is hard to tell the difference between a stopped workflow and a completed workflow.
My sub-workflow example notes this, and addresses it by having the sub-workflow launch script (for the launcher task in the main workflow) check the DB for completion of a known final task in the sub-workflow:
# sub-workflow stopped, but did it succeed?
cylc workflow-state \
--max-polls=1 \
--task=${SUBWF_END_TASK#*/} \
--point=${SUBWF_END_TASK%/*} \
--status=succeeded \
$SUBWF_ID
However, your suggestion to use the task pool table is an improvement 🎉 I'll amend my example and alert the couple of NIWA teams with sub-workflow use-cases.
Also, a new top-level workflow status for "completed" is a good idea.
It would be a good idea to make accessing the "complete" status as easy as possible, as this is something that tools like cylc scan will need to do.
Ideally we wouldn't need to go to the database at all (managing database connections is a hassle); perhaps a .service file, or a field thereof?
Problem
Cylc has a clear concept of task and job states, but less so when it comes to the overall workflow state. For example, once the workflow has stopped, there is no easy way to tell the underlying reason without digging through the logs or database. In particular, for non-cycling workflows or ones with a finite number of cycles, it would be useful to easily tell apart normal termination (the workflow reached and completed the final cycle) from abnormal termination (stalled, server crash, ...). Chatting to @oliver-sanders about it, this also seems to be a prerequisite for proper support for running a sub-workflow as a task in the future (I couldn't find a specific issue for it).
Proposed Solution
A possible solution could be to add a workflow-wide status file, akin to job.status, that can be scanned for and interrogated for information.