amendment to proposal amendment

oliver-sanders commented 6 months ago

Summary

Redefine task "incompletion" to avoid problem edge cases.
Permit pre-conditions in completion expressions (no longer any reason not to, this makes things clearer for users)
Remove redundant safety check that was only required for the previous proposal

Incompletion condition bugs

Closer inspection of the condition for task "incompletion" revealed some logical flaws, e.g:

1. Task cannot be completed by the expired output.

If a task is expired (and expiry is permitted) then it is complete, but subsequently setting a finished output would make it incomplete by satisfying the "it finished executing" condition.

Similarly, if the task succeeds, fails or submit-fails, then it will not be possible to subsequently complete it by setting the expired output.

2. Tasks with no required outputs

It is possible for a task to have no required outputs, e.g:

a:submitted?
a:succeeded?
a:expired?

In this case the completion condition is True. By the previous logic the task is complete before it has even been submitted.

Underlying issue

These flaws weren't intentional, they are just bugs. The issue here is the defining incompletion which results in a lot of double negatives.

For the same reason we cannot use not in completion conditions (task outputs accumulate over multiple executions / interventions), we cannot define task completion in terms of negatives.

The solution is to define task completion in terms of positives. This is also clearer and easier to implement.

Implementation detail: Exposing the completion condition

A bonus of the new definition of task completion is that we can define the default completion condition for a task in a single flat expression e.g:

( succeeded and x) or expired

(As opposed to the previous branched code logic)

This condition can then be exposed to the user clarifying how the optional outputs they define in the graph impact the completion of the task.

E.G:

a?
a:failed?

$ cylc config -i '[runtime][a]completion'
completion = succeeded or failed

Behaviour change

There has been one functional change in this amendment. Consider:

a?
a:failed?
a:x

Following the previous logic the task a could have been completed in one of two ways:

succeeded and x
failed and x

So the completion condition, effectively reduced to:

(succeeded and x) or (failed and x)

This isn't especially helpful as we probably wouldn't have expected x to be generated in the failure scenario.

I'm not sure whether this is a bug in the condition or purposeful. This amendment changes:

All required outputs must be completed if the task runs. The task must succeed unless stated otherwise.

To:

All required outputs must be completed if the task succeeds. The task must succeed unless stated otherwise.

Changing the default completion expression for the above example to:

(succeeded and x) or failed

Which is a tad more practical.

oliver-sanders commented 6 months ago

Playground:

I have an implementation of the proposed logic on this branch, it's not ready yet of course, but you can use it to see how the default completion condition is generated, e.g:

[scheduling]
    [[graph]]
        R1 = """
            a? & a:x
            a:failed?

            b? & b:x?
            b:failed?

            c? & c:x
            c:failed? & c:y

            d?
            d:submit?
            d:expire?
        """

[runtime]
    [[root]]
        [[[outputs]]]
            x = xxx
            y = yyy
            z = zzz

    [[a]]
    [[b]]
        # user defined completion expression
        # (this can be anything so long as it's consistent
        #  with the graph)
        completion = (succeeded and x) or (failed and y)
    [[c]]
    [[d]]

$ cylc config . -i '[runtime]'
[[root]]
    [[[outputs]]]
        x = xxx
        y = yyy
        z = zzz
[[a]]
    completion = (x and succeeded) or failed
    [[[outputs]]]
        x = xxx
        y = yyy
        z = zzz
[[b]]
    completion = (succeeded and x) or (failed and y)
    [[[outputs]]]
        x = xxx
        y = yyy
        z = zzz
[[c]]
    completion = ((x and y) and succeeded) or failed
    [[[outputs]]]
        x = xxx
        y = yyy
        z = zzz
[[d]]
    completion = succeeded or failed or submit_failed or expired
    [[[outputs]]]
        x = xxx
        y = yyy
        z = zzz

hjoliver commented 6 months ago

@oliver-sanders - I'll try to distill my two "definitional" issues further. This is important because we all have to agree on what the terms mean, and document the system accordingly.

1. complete vs incomplete tasks

Task completion should depend ONLY on outputs, not on task state original comment

a task is complete if its output completion expression is satisfied
a task is incomplete if its output completion expression is not satisfied #<--- NEW (no final status involved)

Then, a task can be removed from the active task pool only if it is finished (i.e., it has a final task status) AND complete.

Note, without this, as proposed complete and incomplete tasks are not logical opposites - you can have a task that is both not complete and not incomplete. 🤯

2. satisfying vs completing outputs

Individual outputs should be "satisfied", not "completed" original comment

this would disambiguate our overloaded use of the term "complete"
it can be spun as making good sense too: a task message "satisfies" a task output

Note, I thought, from your wording that you were proposing this; I was just trying to take it seriously 😁. Either way, it is less important than 1., but worth considering.

oliver-sanders commented 6 months ago

I'm ok with complete and incomplete not being literal opposites, like flammable and inflammable, hence it's a grammar argument, happy to accept suggestions but I don't have any myself. Moreover, it's not inconsistent, one defines when outputs are complete, the other when tasks are complete, note, is_complete is internal only and not even user facing!

Happy to fiddle the meaning of completed and satisfied, but I don't think I'm going to be changing terminology universally in this proposal/PR as it's more far reaching than task_outputs.py.

Trying to avoid getting drawn into one of Cylc's legendary grammar/terminology arguments right now.

dpmatthews commented 6 months ago

We're going to need to support an event (and indicate it in the GUI) if a task can not complete so we need to be able to describe this clearly. I agree that incomplete would not be a good name for this event. Maybe wont_complete?

hjoliver commented 6 months ago

This isn't one of those "legendary" arguments (the one you referenced was a semi-humorous suggestion for a collective noun where one didn't exist). We have to have an agreed meaning of important terms like this or we can't understand each other or expect users to. And it's not grammar (i.e., it not about the structure of language and sentences) it's just about consistent and logical definitions of specific terms.

But OK, since you don't seem to have strong feelings on the matter, except that you think it's grammar and I don't, how about we go with my proposed definitions which are both intuitive and logically consistent:

a task is complete if it's completion condition is satisfied
a task is incomplete (i.e. not complete) if it's completion is not satisfied
(and therefore a task can be removed from the active pool if it finished (i.e., it has a final status) AND it is complete)

hjoliver commented 6 months ago

My suggestion also handles @dpmatthews point - the event that we need to signal is that a task finished without completing. "Incomplete" is not a good name for that, because the "finished" bit is critical as well. "wont_complete" is a bit ungainly, but it makes sense because a finished task can no longer complete automatically. Maybe we can think of a better name for that. Maybe just finished_incomplete?

oliver-sanders commented 6 months ago

a task is incomplete (i.e. not complete) if it's completion is not satisfied

I don't think your suggestion works? E.G. if a task has no required outputs, then it will be completed before it has even been spawned. A task is not complete if its completion expression is satisfied unless it has a final task status.

This is what I'm proposing:

Outputs are complete if the completion expression is satisfied.
Tasks are complete if they have a final task status AND their outputs are complete.

This is not inconsistent, it is the required logic. Feel free to suggest different names if you feel strongly.

Note these definitions control internal logic and aren't user facing, we report this to the users as "task finished with incomplete outputs".

hjoliver commented 6 months ago

I don't think your suggestion works? E.G. if a task has no required outputs, then it will be completed before it has even been spawned.

That works fine, if (as I'm proposing) "completion" refers only to completion of outputs. If such a task gets triggered, it is not required to generate any outputs at all, it just needs to finish.

This is what I'm proposing:

Outputs are complete if the completion expression is satisfied.

Tasks are complete if they have a final task status AND their outputs are complete.

That's not quite what you are proposing. The proposal refers to incomplete tasks as well:

A task is incomplete if the completion condition is satisfied and the task has a final status.

Something is either complete or incomplete, end of story, by the universally agreed meaning of those terms. But your use of these terms does not conform to that. As proposed, a task that is "not complete" is not necessarily "incomplete". That is not consistent (aka inconsistent!) with the normal definition of those words, and it will cause confusion.

This is not inconsistent, it is the required logic. ...

The required code logic is 100% the same either way, but language used in the proposal to describe it is inconsistent.

Note these definitions control internal logic and aren't user facing, we report this to the users as "task finished with incomplete outputs".

Our proposals do necessarily also invent and use terminology to describe what's going on, and which then has to be documented in the user guide. So this is both developer- and user-facing. To fix it in this case, there are two choices:

a task can be described as "complete" or "incomplete" based on only output completion
- task pool removal logic depends on both "task completion" and "task finished"
only outputs, not tasks, can be described as "complete" or "incomplete"
- task pool removal logic depends on both "task output completion" and "task finished"
- (this avoids having to explain to users that a "complete task" is not necessarily "finished")

I quite like 1. as a convenient way of describing a task whose outputs are complete. But maybe we should go with 2. It's a little more verbose but it has exactly zero chance of confusing anyone because "completion" is not an overloaded term anymore. What do you think?

hjoliver commented 6 months ago

Feel free to suggest different names if you feel strongly.

Ok, I think 2. above is my suggestion. Reasons:

we can't document this new logic by describing complete/incomplete tasks as defined in the proposal (it's inconsistent)
but I accept your point that users might be confused that a task can be "complete" without finishing

So let's just not say that tasks can be complete or incomplete. There's no need to. Outputs can be complete or incomplete, and a task can be finished or not finished.

cylc / cylc-admin