Closed oliver-sanders closed 2 years ago
@hjoliver it is not clear what you are proposing, please could you fill out the above examples with your desired behaviour and highlight where they differ.
You seem to be suggesting the rules for what flow numbers are provided by --flow=all differ depending on whether the task has run before or not in contradiction with:
Agreed. And n=0 flow numbers should do for --flow=all
You seem to be suggesting the rules for what flow numbers are provided by --flow=all differ depending on whether the task has run before or not in contradiction with:
Agreed. And n=0 flow numbers should do for --flow=all
Not really, I'm saying current active flows (i.e. those in n=0) should be sufficient, c.f. all flows recorded in the DB.
With the small caveat (which is probably what caused the confusion here, sorry) that we should exclude flow numbers of flows that have already passed through the triggered task. That is what allows the default trigger to re-run a sub-graph (say) behind a flow (because the triggered task will not take the flow number of the flow that we are re-running, even if that flow number still exists in n=0).
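The exclusion rule described above amounts to a set difference. A minimal sketch, assuming flow numbers are plain integers held in sets; the function and argument names are hypothetical and this is not Cylc's implementation:

```python
def default_trigger_flows(active_flows, flows_already_through_task):
    """Flow numbers a default trigger would give the task under this rule.

    active_flows: flow numbers currently present in n=0 (assumed input).
    flows_already_through_task: flow numbers the task has already run in
        (assumed input; empty if the task never ran).
    """
    return set(active_flows) - set(flows_already_through_task)


# Task never ran before: nothing to exclude, so it takes all active flows.
print(default_trigger_flows({1, 2}, set()))  # {1, 2}

# Flow 1 already passed through the task: only flow 2 is kept.
print(default_trigger_flows({1, 2}, {1}))  # {2}
```

The rule is the same in both calls; only the excluded set differs.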
please could you fill out the above examples with your desired behaviour and highlight where they differ.
OK, I'll try to do that now, since we desperately need to lay this one to rest. I wonder if this is gonna end up the longest single issue page on the project :-)
suggesting the rules for what flow numbers are provided by --flow=all differ depending on whether the task has run before or not
Also, I'd say the rules are exactly the same in both cases, it's just that in the never-ran-before case there is no previous flow number to exclude.
we should exclude flow numbers of flows that have already passed through the triggered task
So if there is only one flow in the workflow the task will not run at all.
If there are multiple flows in the workflow the "continue" trigger will result in a reflow irrespective of whether the other flow(s) are ahead of or behind the original?
Examples would be great.
So if there is only one flow in the workflow the task will not run at all.
No, see this comment:
and a new flow number (in case there are no existing flows that have not used the task already)
Ok, so this effectively changes the default to reflow for historical tasks.
I would much prefer for reflows to require users to opt in in all cases because the consequences of reflow on users' data are quite dangerous and reflow (and multiple flows in general) are way beyond what we can expect of the working knowledge of the vast majority of users.
If there are multiple flows in the workflow the "continue" trigger will result in a reflow irrespective of whether the other flow(s) are ahead of or behind the original?
(See my terminology comments above on what exactly "reflow" means)
So I think "the continue trigger" should, by definition, "continue", which means a flow should carry on from the triggered task.
The main thing, which we agreed on, is that by default that continuing flow should not get overrun by any existing flows (and I'm not arguing with that).
Ok, so this effectively changes the default to reflow for historical tasks.
Meh, sort of. My way is simpler from a consistency perspective (same behaviour on triggering a task, whether or not it ever ran before), and I think what matters and is easier to understand is whether the triggered task flows on or not. The fact that flowing on after triggering an n>0 task is not technically a "reflow" will be lost on most users. It will look like a new flow to them (now we have the original flow, and this new one from where I triggered a task) ... the fact that it happens to have the right flow numbers so that the original flow won't overrun it on catch-up, or that it is "not a reflow" because those tasks never ran before, is secondary.
And my other related point is that if you are triggering a past task to re-run it, you are just as likely to want it to flow on (the regenerate some products use case), as opposed to running a single task.
The re-run a single task case seems to me to be best expressed by the non-default --flow=none option. For two reasons: 1) you want to trigger a single task, not a flow; and 2) my "flow integrity" argument above: a flow is a self-perpetuating run through the graph, and the previous flow already passed by ... so why should the re-triggered task have the same flow number?
I would much prefer for reflows to require users to opt in in all cases because the consequences of reflow on users' data are quite dangerous and reflow (and multiple flows in general) are way beyond what we can expect of the working knowledge of the vast majority of users.
I don't disagree that "reflow is dangerous" in the sense that it re-runs tasks and that will probably overwrite existing data. However:
cylc insert-ing multiple waiting tasks in the right order.
At least I think we probably both understand where the other is coming from now.
Because I was focused more on consistent triggering behaviour, when you agreed to go back to the no-wait default I thought that applied equally to future and past tasks. i.e. no-wait in front of flow=1 means "flow on now" (with all current flow numbers that could catch up and merge); and no-wait behind flow=1 means exactly the same thing.
Both generate a new flow front. The fact that one case involves re-running past tasks should be blindingly obvious to users because they deliberately triggered a task that already ran.
If you're not coming around to my perspective (which again, makes for simpler, consistent triggering behaviour and does not treat flow=1 as magic [SPECIAL]) then I suppose one way out of this bind is to revert to "wait" as a default. I'd rather not do that because a) it artificially constrains the workflow; and b) if it behaves as you want for re-running tasks, it makes the "wait" concept harder to understand (easy: wait for existing flows to catch up before continuing; weird: if only flow=1 exists and we trigger behind it, what are we "waiting" for??)
n>0) (SAME RESULT in all cases)
n<0) SAME RESULT (A new flow is started which overruns the previous flow.)
DIFFERENT RESULT: same as 1) Reflow
The task "a" will get re-run by the trigger, and the graph WILL run on from there (that's what "continue" and "no wait" mean)
SAME RESULT
DIFFERENT RESULT: still same as type (2), but now that is the same as reflow rather than no-flow
If you're not coming around to my perspective (which again, makes for simpler, consistent triggering behaviour and does not treat flow=1 as magic)
Disagree on "simpler", "consistent" and "magic" 😁.
You're not winning me over I'm afraid. I see your points, but I don't agree with them. Since the start I've maintained that defaulting to reflow is dangerous and that all reflow functionality (and all its complex consequences e.g. no-flow) should be opt-in.
You are proposing that --flow=all can actually mean all flows, OR all flows and a new one minus an existing one, OR just a new flow, which isn't especially consistent.
If I understand correctly what you are proposing does not add any new functionality, it just changes the default. If so my interpretation covers all bases, but if you want a reflow you must manually say so.
You are proposing that --flow=all can actually mean, all flows OR all flows and a new one minus an existing one
That's kind of a misrepresentation because it ignores the definition of flow. A flow is a self-consistent self-perpetuating run through the graph. If a flow has passed by a task, retriggering it should be considered a new flow (or a one-off no-flow), because by definition that task has already run in that flow. You are saying, give the task the same flow number it had before but run it anyway, even though it has already run in that flow.
OR just a new flow, which isn't especially consistent.
My consistency is at the conceptual level. When you trigger a task, any task, does it flow on or not. This supposed inconsistency is down at the level of flow numbers which is really an implementation detail that we use to make the required behaviours work.
If I understand correctly what you are proposing does not add any new functionality, it just changes the default. If so my interpretation covers all bases, but if you want a reflow you must manually say so.
That's right, but we are coming from two different flow models (in a sense). By my conceptual model (which I'm claiming is simpler) your default is different behind the first flow than it is in front of it. (And it doesn't even seem to make sense with respect to the names that you gave the options: behind flow=1 the "continue" / no-wait default does not actually continue anything.)
I don't think we are going to get anywhere with this, suggest another call.
(otherwise it's going to be another ten pages of reply, quote and response)
Yep, can do :+1:
OK, meeting done. Result: I concede defeat. :boom: Reasons, for the record:
flow=1 special (i.e. once flow=1 passes by, a task instantly becomes "historical" and stays that way) ... BUT we will only consider active flows at trigger time
Also, on terminology:
n=0 ... (maybe "collision" is a good term for that).
--wait (or --no-wait) as an option name or concept because when re-triggering behind a flow you would almost never want to wait for an upcoming flow to merge and then continue. However:
--wait)
The final result then, for implementation.
(@oliver-sanders' explicit examples above are all valid and useful, and should be made into tests, but I think we can ditch the four-way categorization at this point).
cylc trigger [--wait]
The triggered task runs with the set of all active (n=0) flow numbers, A
--wait: flow on if/when members of A catch up and merge with it
--wait is meaningless)
(It gets a bit gnarly to list exactly what happens when triggering ahead of all flows, behind all flows, and between flows ... but we don't need to do that here as it's all derivable from the above).
cylc trigger --flow=1,2 [--wait]
The triggered task runs with the specified set of flow numbers, S = {1,2}
cylc trigger --flow=new
The triggered task runs with a new flow number, not in the set of active flows A (or any previous flow in fact).
--wait is meaningless)
cylc trigger --flow=none
The triggered task runs with a "none" flow number.
--wait is meaningless)
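The agreed rules above can be summarised as a small lookup from the --flow value to the set of flow numbers the triggered task runs with. This is only a sketch under stated assumptions: the function names, the use of an empty set to stand in for the "none" flow number, and picking the next unused integer for a new flow are all illustrative choices, not Cylc's actual code.

```python
def resolve_trigger_flows(flow, active_flows, all_known_flows):
    """Return the flow numbers a triggered task would run with.

    flow: "all" (the default), "new", "none", or an iterable of ints.
    active_flows: flow numbers currently in n=0 (the set A above).
    all_known_flows: every flow number ever used (assumed to be available).
    """
    active = set(active_flows)
    if flow == "all":
        # Default: the set of all active flow numbers, A.
        return active
    if flow == "new":
        # A number not in A, nor in any previous flow.
        return {max(set(all_known_flows) | active, default=0) + 1}
    if flow == "none":
        # Assumption: an empty set stands in for the "none" flow number.
        return set()
    # Explicit set of flow numbers, e.g. --flow=1,2 gives S = {1, 2}.
    return {int(n) for n in flow}


def wait_is_meaningful(flow):
    """--wait is described above as meaningless for "new" and "none"."""
    return flow not in ("new", "none")


print(resolve_trigger_flows("all", {1, 2}, {1, 2, 3}))   # {1, 2}
print(resolve_trigger_flows("new", {1, 2}, {1, 2, 3}))   # {4}
print(resolve_trigger_flows([1, 2], {1, 2}, {1, 2, 3}))  # {1, 2}
```

What happens ahead of, behind, or between flows then follows from comparing the returned set with the flow numbers of the tasks already in the pool.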
After a long chat with @dpmatthews (who proposed yet another triggering approach 😁) I think we can generalise the trigger problem into two dimensions:
Combining these we get four spaces:
Going through the four spaces in detail:
1) Reflow (implemented)
Equivalent to cylc trigger --flow=<new-flow-number>.
Continue: Yes Overrun: Yes
The use case is for re-running over tasks which have been previously run e.g. change configuration and re-run a sub-graph.
2) Continue (proposed)
Equivalent to cylc trigger --flow=<all-flow-numbers>,<new-flow-number>.
Continue: Yes Overrun: No
--flow=1 to be used for, but has been generalised to be reflow compatible.
This approach feels quite "natural". The use cases are setting off another bit of the same flow where you don't want tasks to be overrun.
3) No Flow (implemented)
Equivalent to cylc trigger --flow -1.
Continue: No Overrun: Yes
Useful for running one-off tasks that you do not want to impact the workflow in any way (i.e. cylc submit type uses).
4) No Flow (proposed)
Equivalent to cylc trigger --flow -2.
Continue: No Overrun: No
Use case is for manually intervening in graph execution by ignoring dependencies or runahead limit and skipping ahead to a task which you want to be considered a part of the approaching flow front.
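The four spaces can be laid out as a lookup keyed on the two dimensions. A minimal sketch: the names and --flow equivalents are copied from the descriptions above, while the `TriggerMode` type and `trigger_mode` helper are hypothetical and exist only to show the combinations side by side.

```python
from typing import NamedTuple


class TriggerMode(NamedTuple):
    name: str
    status: str
    flow_equivalent: str


# Keyed on (continue, overrun), as in the four spaces above.
TRIGGER_MODES = {
    (True, True): TriggerMode(
        "Reflow", "implemented", "--flow=<new-flow-number>"),
    (True, False): TriggerMode(
        "Continue", "proposed",
        "--flow=<all-flow-numbers>,<new-flow-number>"),
    (False, True): TriggerMode("No Flow", "implemented", "--flow -1"),
    (False, False): TriggerMode("No Flow", "proposed", "--flow -2"),
}


def trigger_mode(continue_on, overrun):
    """Look up which of the four spaces a (continue, overrun) pair selects."""
    return TRIGGER_MODES[(bool(continue_on), bool(overrun))]


for key, mode in TRIGGER_MODES.items():
    print(key, mode.name, mode.status, mode.flow_equivalent)
```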
Interface
The internals to handle the four cases are already in-place, flow_nums, DB lookups etc, so it mostly boils down to an interface / documentation issue.
I think all four methods could be exposed via a single --flow argument; however, it is sensible to provide defaults for the different behaviours. I think it would be good to document the --flow equivalents as they may help users to understand their function.
1) Enable behaviours explicitly
If we are happy with the continue/overrun model (after workshopping the terms) we could expose it directly, something like:
This is quite nice as you have to explicitly opt in to each behaviour separately reducing the scope for unintended results and accidents.
2) Single --flow argument
If we don't like the continue/overrun model we could move the presets into the flow argument, something like:
It's less behaviour driven so we would need to explain each option separately.
3) Separate flag for each approach
An alternative to (2) would be to come up with three/four different flags:
Default
I think no-continue & no-overrun is the safest, sanest default because:
But I'm biased. I think the default is less important than the clear separation of behaviours.