cylc / cylc-flow

Cylc: a workflow engine for cycling systems.
https://cylc.github.io
GNU General Public License v3.0

Command task matching requirements #5827

Open hjoliver opened 8 months ago

hjoliver commented 8 months ago

Supersedes: #5763, #5695

See also: #5752, #5416, #5677

TODO:

target task population

Different commands need to select tasks from different populations:

| command | population |
| --- | --- |
| hold, set, trigger | pool, future, past |
| release | held-tasks list only (#5752) |
| kill, poll | pool only |
| remove | pool, past |
| show | pool only (but maybe future, past too, #5677) |
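As a rough illustration only, the table above could be captured as a lookup that a command handler consults before matching. The `Population` enum, the `COMMAND_POPULATIONS` table and the `populations_for` helper are hypothetical names for this sketch, not existing cylc-flow code.

```python
from enum import Enum


class Population(Enum):
    """Hypothetical labels for the places a command may select tasks from."""
    POOL = "pool"
    FUTURE = "future"
    PAST = "past"
    HELD_LIST = "held-tasks list"


# Transcribed from the table above. "show" is listed as pool only, because
# extending it to future and past tasks is still an open question (#5677).
COMMAND_POPULATIONS = {
    "hold": frozenset({Population.POOL, Population.FUTURE, Population.PAST}),
    "set": frozenset({Population.POOL, Population.FUTURE, Population.PAST}),
    "trigger": frozenset({Population.POOL, Population.FUTURE, Population.PAST}),
    "release": frozenset({Population.HELD_LIST}),
    "kill": frozenset({Population.POOL}),
    "poll": frozenset({Population.POOL}),
    "remove": frozenset({Population.POOL, Population.PAST}),
    "show": frozenset({Population.POOL}),
}


def populations_for(command):
    """Return the populations that `command` may select tasks from."""
    try:
        return COMMAND_POPULATIONS[command]
    except KeyError:
        raise ValueError(f"unknown command: {command}") from None
```

Keeping this in one table would make the per-command differences explicit, rather than scattering them through the matching code.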

task name wildcards

We need to support matching task names by glob (or regex) and family name in all cases (not just in the pool).

For future and past tasks this means:

cycle point wildcards

We need to support matching cycle points by glob in the pool, e.g. to target all incomplete tasks.

We should not support cycle point wildcards outside of the pool. `*` is certainly bad; a range such as `[5-8]` may be OK in principle, but it is still dangerous and not that much of a plus to users anyway.
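A minimal sketch of that rule, assuming cycle points are plain strings and that glob matching follows Python's `fnmatch` semantics. The function names and the `allow_outside_pool` flag are hypothetical, not cylc-flow API.

```python
from fnmatch import fnmatchcase

GLOB_CHARS = frozenset("*?[")


def has_glob(pattern):
    """Return True if the pattern contains any glob metacharacter."""
    return any(char in GLOB_CHARS for char in pattern)


def select_cycle_points(pattern, pool_points, allow_outside_pool):
    """Resolve a cycle point pattern to concrete cycle points.

    A glob is only ever expanded against the points currently in the pool.
    An explicit point that is not in the pool is accepted only for commands
    that may target future or past tasks.
    """
    if has_glob(pattern):
        return [point for point in pool_points if fnmatchcase(point, pattern)]
    if pattern in pool_points or allow_outside_pool:
        return [pattern]
    return []
```

With this shape, `*` can never expand to an unbounded set of future or past cycle points, because the candidate list is always the finite pool.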

task qualifiers

Use of some task qualifiers (i.e. :output) necessarily restricts matching to the pool. E.g. cylc set --out=succeeded "*:failed" should set all n=0 incomplete failed tasks to succeeded.

We can probably decide that use of any qualifier means pool only (is there any need to distinguish between e.g. completed-succeeded and completed-failed past tasks?)
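If any qualifier does imply pool only, the decision could be made at parse time. The sketch below assumes the qualifier is whatever follows the last `:` in the selector and that cycle points themselves contain no `:`. `Selector` and `parse_selector` are hypothetical names for illustration.

```python
from typing import NamedTuple, Optional


class Selector(NamedTuple):
    pattern: str              # the part before the qualifier
    qualifier: Optional[str]  # e.g. "failed", or None if absent
    pool_only: bool           # True whenever a qualifier is present


def parse_selector(text):
    """Split 'pattern:qualifier' and flag qualified selectors as pool only."""
    pattern, sep, qualifier = text.rpartition(":")
    if not sep:
        # No ':' present, so the whole string is the pattern.
        return Selector(text, None, False)
    if not pattern or not qualifier:
        raise ValueError(f"malformed selector: {text!r}")
    return Selector(pattern, qualifier, True)
```

Under these assumptions, `parse_selector("*:failed")` gives a pool-only selector, so a command such as the `cylc set` example above would never look outside the pool.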


Terminology:

hjoliver commented 1 month ago

Update to the above, regarding cycle point wildcards:

So that's another difference in task-matching requirements for different commands.

hjoliver commented 3 weeks ago

task name wildcards

We need to support matching task names by glob (or regex) and family name in all cases (not just in the pool). For future and past tasks this means:

  • find all matching task def names (or member tasks of matching family names)
  • check which are valid for the given cycle point(s)

Note we already have the machinery to determine if given tasks (by name) are valid at a particular cycle point.
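The two steps could look roughly like the sketch below. It assumes glob matching via `fnmatch`, a simple mapping of family name to member task names, and an `is_valid_at(name, point)` callable standing in for the existing validity machinery. All names here are hypothetical, not cylc-flow API.

```python
from fnmatch import fnmatchcase


def expand_names(pattern, task_names, families):
    """Step 1: find matching task names, plus members of matching families."""
    matched = {name for name in task_names if fnmatchcase(name, pattern)}
    for family, members in families.items():
        if fnmatchcase(family, pattern):
            matched.update(members)
    return matched


def match_outside_pool(pattern, points, task_names, families, is_valid_at):
    """Step 2: keep only the (point, name) pairs valid at the given points."""
    names = expand_names(pattern, task_names, families)
    return sorted(
        (point, name)
        for point in points
        for name in names
        if is_valid_at(name, point)
    )
```

For example, with family `FAM` containing `foo` and `bar`, and `bar` not valid at point `2`, matching `FAM` at points `1` and `2` would select `1/bar`, `1/foo` and `2/foo`.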

There is still potentially an issue as to whether the task would actually end up running there automatically due to optional branching (or even manual interventions) upstream of it.

However, I don't think that really matters, because:

Finally, for task names, this "problem" (if it is one) applies equally to individual future tasks - which we already allow. For a family name or glob, we just end up with more tasks at the target point.