LSSTDESC / gen3_workflow

Development code for a Gen3-based DRP pipeline implementation
BSD 3-Clause "New" or "Revised" License
5 stars 3 forks source link

plugin does not wait for all parsl tasks to complete when there is a failure #40

Closed benclifford closed 2 years ago

benclifford commented 2 years ago

The code to wait for all parsl tasks to complete is:

https://github.com/LSSTDESC/gen3_workflow/blob/940f9f993421826ae884e293a57624e10e48b23b/python/desc/gen3_workflow/parsl_service.py#L661

            _ = [future.result() for future in futures]

This results in waiting for a while, but not until all tasks are complete: if any of the futures contains an exception rather than a successful result, then .result() will raise that exception, and the entire line will stop without continuing with remaining futures.

The idom I usually use instead is:

            _ = [future.exception() for future in futures]

which has the same wait-for-completion behaviour as .result() but does not raise exceptions. (The return values are any exceptions, or None if a task was successful - this line is discarding the return values so what actually comes out is irrelevant).

@surhudm has tested this in his environment and it works better for him.

However, this changes the circumstances in which the finalization code in subsequent lines runs.#39

Issue #36 says that the finalization code should run even when there is a failure. Making the above change would do that.

However, in casual chat with @jchiang87 it sounded like he actually makes use of the don't-finalize-on-failure behaviour sometimes - so I've opened this as an issue rather than a pull request, to solicit what the true requirements are.

surhudm commented 2 years ago

@benclifford @jchiang87 there is a config option which prevents the ExecutionButler from not merging on failure.

executionButler:
    whenMerge: "SUCCESS"

So I guess there is a yaml way of dealing with what @jchiang87 needs, rather than let the workflow crash.

jchiang87 commented 2 years ago

closed by #41