Azure / hpcpack

The repo to track public issues for the Microsoft HPC Pack product.
MIT License

Allow job template to say jobs should fail if their parent(s) fail. #28

Open weshinsley opened 1 year ago

weshinsley commented 1 year ago

(Relates to HPC Pack 2019, 6.1.7531.0 and probably earlier)

Feature Request Description

Describe Preferred Solution

Add an option to select Fail Dependent Tasks (or jobs?) in HPC Cluster Manager, under Configuration -> Job Templates -> Job Template Editor -> Add (property) drop-down. We already have "Fail on Task Failure", but not "Fail if parent tasks/jobs fail".

Describe Alternatives Considered

Alternatively, I cannot really see a reason why you wouldn't want /faildependenttasks to be on all the time. Presumably it makes no difference if there are no dependent jobs, and I think it's reasonable for all child jobs to fail by default if the parent fails.

weshinsley commented 1 year ago

To follow up: I think /faildependenttasks does not do what I hoped it would, so I may be requesting some additional functionality, perhaps /faildependentjobs, in which a job fails if any of its /parentjobids fails.

YutongSun commented 1 year ago

@weshinsley, thanks for the feedback. The original design keeps child jobs in the active Queued state when any of their parent jobs is canceled or fails. Because a canceled or failed parent job can be requeued, the child jobs will run after the requeued parent job completes successfully. If the canceled or failed parent job is deleted from the database after a long period, the queued child jobs are set to the Failed state. I agree we could provide another option to cancel or fail the child jobs immediately after any of the parent jobs is canceled or fails.
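
To make the difference between the current design and the requested option concrete, here is a rough sketch of the decision logic described above. It is a toy model only, not HPC Pack code: the `Job` class, the `State` enum, the `resolve_child_state` function, and the `fail_dependent_jobs` flag are all hypothetical names, and a deleted parent is modeled simply as a job ID that is missing from the lookup table.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    QUEUED = "Queued"
    RUNNING = "Running"
    FINISHED = "Finished"
    FAILED = "Failed"
    CANCELED = "Canceled"


@dataclass
class Job:
    # Hypothetical stand-in for a scheduler job record.
    job_id: int
    state: State = State.QUEUED
    parent_ids: list = field(default_factory=list)


def resolve_child_state(child, jobs, fail_dependent_jobs=False):
    """Return the state a queued child job should move to, given its parents.

    `jobs` maps job ID to Job. A parent ID missing from `jobs` is treated
    as a parent that was deleted from the database (a simplification).
    """
    if child.state is not State.QUEUED:
        return child.state

    parents = [jobs.get(pid) for pid in child.parent_ids]

    # A deleted parent can never be requeued, so the child fails.
    if any(p is None for p in parents):
        return State.FAILED

    if any(p.state in (State.FAILED, State.CANCELED) for p in parents):
        # Current design: keep waiting, because the parent may be requeued.
        # Requested option: fail the child immediately.
        return State.FAILED if fail_dependent_jobs else State.QUEUED

    if all(p.state is State.FINISHED for p in parents):
        return State.RUNNING

    # At least one parent is still queued or running.
    return State.QUEUED
```

With a failed parent, `resolve_child_state(child, jobs)` leaves the child queued, while passing `fail_dependent_jobs=True` fails it straight away, which is the behavior requested in this issue.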