cylc / cylc-flow

Cylc: a workflow engine for cycling systems.
https://cylc.github.io
GNU General Public License v3.0
329 stars 93 forks source link

remove: extend beyond the task pool #5643

Open oliver-sanders opened 1 year ago

oliver-sanders commented 1 year ago

Implement the cylc remove proposal:

MetRonnie commented 3 months ago
# task_states table
name        cycle       flow_nums   time_created    ... status  
----------  ----------  ----------  ---------------     ----------
a           1           [1]         <10 mins ago>   ... failed
a           1           [1, 2]      <2 mins ago>    ... succeeded   <---

How should we handle the case where the user removes task 1/a from flow 2 (a.k.a. remove flow 2 from task 1/a)?

Bearing in mind PRIMARY KEY(name, cycle, flow_nums), we have 2 choices:

  1. Delete row 1 and set flow_nums to [1] in row 2
  2. Delete row 2
oliver-sanders commented 3 months ago

If a task is removed from the pool as completed, then subsequently re-triggered by the user, then the most recent entry will override the older one.

This is the same case by a different mechanism, though somewhat awkward. The job state info will be preserved (tied to the submit number).

MetRonnie commented 3 months ago

So option 1

oliver-sanders commented 3 months ago

The behaviour for re-triggering might be closer to merge than replace (can't remember what else is in the table), otherwise yes.

Note that this will probably happen automatically after the new task has been added to the pool?

MetRonnie commented 3 months ago

The full schema for task_states looks like

name   cycle  flow_nums  time_created   time_updated   submit_num   status   flow_wait   is_manual_submit
------ ------ ---------- -------------- -------------- ------------ -------- ----------- ----------------
TEXT   TEXT   TEXT       TEXT           TEXT           INTEGER      TEXT     INTEGER     INTEGER

Note that this will probably happen automatically after the new task has been added to the pool?

Not sure what you mean; I am talking about removing a flow from a historical task

oliver-sanders commented 3 months ago

(1) is good

MetRonnie commented 2 months ago
# task_states table
name        cycle       flow_nums   time_created    ... status  
----------  ----------  ----------  ---------------     ----------
a           1           [1]         <10 mins ago>   ... failed
a           1           [1, 2]      <2 mins ago>    ... succeeded

How should we handle cylc remove 1/a (which is the same as cylc remove 1/a --flow all if I have read the proposal correctly)?

Is it: delete all rows WHERE name = "a" AND cycle = "1", except the last such row, which we update so that flow_nums = "[]"?

oliver-sanders commented 2 months ago

Yes, option (1) is good.

MetRonnie commented 1 month ago
  1. When outputs are removed, any corresponding prerequisites on downstream tasks in the pool shall be unset providing they were "naturally satisfied".

    This means prerequisites which have been satisfied by cylc trigger or cylc set will not be unset as the result of cylc remove as these have been manually satisfied by separate interventions.

Actually thinking about this, are we sure this is what we want?

If you accidentally cylc set/trigger the wrong task surely you would want cylc remove to undo any prereqs it satisfied?

There is also the snag that whether the prereqs were satisfied by cylc set/trigger is not recorded permanently, as the task_prerequisites table in the DB only keeps track of what is in the task pool at any given moment.

oliver-sanders commented 1 month ago

Actually thinking about this, are we sure this is what we want?

Yes.

the task_prerequisites table in the DB only keeps track of what is in the task pool at any given moment.

Does this get stored on the task_outputs table?

MetRonnie commented 1 month ago

Actually thinking about this, are we sure this is what we want?

Yes.

Why? Could you elaborate in response to my question? 👇

If you accidentally cylc set/trigger the wrong task surely you would want cylc remove to undo any prereqs it satisfied?


the task_prerequisites table in the DB only keeps track of what is in the task pool at any given moment.

Does this get stored on the task_outputs table?

I've tested this, cylc set --output are recorded in task_ouputs as e.g.

cycle       name        flow_nums   outputs                                                                   
----------  ----------  ----------  --------------------------------------------------------------------------
2           foo         [1]         {"submitted": "(manually completed)", "started": "(manually completed)"}

However, there is no record of cylc set --prerequisite or cylc trigger

oliver-sanders commented 1 month ago

Why? Could you elaborate in response to my question?

We have demoted a task and all of the outputs it generated from their flow. As a result these outputs no longer logically exist in the flow they once belonged to (because they have been removed) so should not affect the evolution of that flow any more. Any downstream prerequisites that this task satisfied should be wiped as a result.

If downstream tasks have already run, then it is too late to clean up via this method and the user will need to remove these downstream tasks too.

If you're unsure, lets take this offline.

MetRonnie commented 4 weeks ago

Summary from offline discussion:

... prerequisites which have been satisfied by cylc trigger...

This is not relevant; cylc trigger doesn't set prereqs (at least not anymore)

...or cylc set...

This only applies to cylc set --prerequisite, not cylc set --output

hjoliver commented 4 weeks ago

This is not relevant; cylc trigger doesn't set prereqs (at least not anymore)

Warning - it's going to work by setting prerequisites soon - see the group trigger proposal. (Although I'm not sure that's relevant - I need to get back up to speed on this issue).