Detect runs depending on unscheduled runs.

lusewell commented 6 months ago

Feature request:

If a run is conditional on the completion of another run that doesn't exist, we'd like to be able to trigger a callback (and perhaps move the run to some error state). Timeouts currently provide some relief, but in this case you'd often want the run to fail much sooner rather than wait until the timeout is hit, as we could detect in advance that this won't be able to run.

Examples of when this can happen:

A depends on B; both are scheduled daily, but B is misconfigured with a more restricted calendar.
A run of A is conditional on completion of B, which has a later scheduled start time. At a time between these two, B gets, e.g., a name change and the dependency gets relinked, leaving the run of A conditional on a job that will never run (the former location of B).
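
To make the first example concrete, here is a minimal sketch in plain Python (not Apsis code). It assumes A runs every day and B's restricted calendar is weekdays only; the helper names `daily`, `weekdays_only`, and `orphaned_dates` are hypothetical and exist only for this illustration.

```python
from datetime import date, timedelta


def daily(start, days):
    """Every calendar day: stands in for job A's schedule."""
    return [start + timedelta(days=i) for i in range(days)]


def weekdays_only(dates):
    """Monday to Friday only: stands in for B's more restricted calendar."""
    return [d for d in dates if d.weekday() < 5]


def orphaned_dates(dependent_dates, dependency_dates):
    """Dates on which the dependent has a run but the dependency has none."""
    scheduled = set(dependency_dates)
    return [d for d in dependent_dates if d not in scheduled]


week = daily(date(2024, 1, 1), 7)   # Monday 2024-01-01 through Sunday 2024-01-07
a_runs = week                       # A: every day
b_runs = weekdays_only(week)        # B: misconfigured, weekdays only

# Runs of A on these dates wait on a run of B that was never scheduled.
stuck = orphaned_dates(a_runs, b_runs)
print(stuck)
```

Under these assumptions the weekend runs of A are the ones that would sit in the waiting state until their timeout.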

This behaviour wouldn't make sense if you were mixing condition-based dependencies with post-completion scheduling, so I guess you'd want to be able to make it opt-outable if you were co-mingling both of these features.

alexhsamuel commented 6 months ago

> we could detect in advance that this won't be able to run

Apsis doesn't know for sure that the dependency cannot be satisfied; the dependency run could be scheduled later: manually, or by a postcondition, or by someone updating the schedule and reloading jobs.

One fairly easy way to allow you to configure the behavior you want is by allowing a waiting timeout per condition.

Suppose you have:

condition:
  type: dependency
  job_id: previous job

With a per-condition timeout, you could then add a preceding dependency requiring that `previous job` exist in a promising state, with a timeout of zero:

condition:
# Error immediately if `previous job` isn't scheduled.
- type: dependency
  job_id: previous job
  states: [scheduled, waiting, starting, running, success]
  timeout: 0
# Wait until `previous job` completes successfully.
- type: dependency
  job_id: previous job

As soon as a run of this job entered the waiting state, it would fail if there were no runs of previous job in a state that might lead to success in the future.
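
The intended semantics can be modelled with a small Python sketch. This is not how Apsis implements conditions; the `Dependency` class and the `check` and `evaluate` functions are hypothetical, and the sketch covers only a single pass over the conditions at the moment the run enters the waiting state.

```python
from dataclasses import dataclass
from typing import Optional

# States from which the dependency might still reach success (taken from the
# `states` list in the config above).
PROMISING = frozenset({"scheduled", "waiting", "starting", "running", "success"})


@dataclass
class Dependency:
    job_id: str
    states: frozenset = frozenset({"success"})
    timeout: Optional[float] = None  # seconds; None means wait indefinitely


def check(dep, runs, waited):
    """Return "met", "wait", or "error" for one condition.

    `runs` maps a job ID to the states of its existing runs.
    """
    if any(state in dep.states for state in runs.get(dep.job_id, [])):
        return "met"
    if dep.timeout is not None and waited >= dep.timeout:
        return "error"
    return "wait"


def evaluate(conditions, runs, waited=0.0):
    """Check conditions in order; stop at the first one that is not met."""
    for dep in conditions:
        result = check(dep, runs, waited)
        if result != "met":
            return result
    return "met"


conditions = [
    # Error immediately if `previous job` isn't in a promising state.
    Dependency("previous job", PROMISING, timeout=0),
    # Wait until `previous job` completes successfully.
    Dependency("previous job"),
]

print(evaluate(conditions, {}))                             # error
print(evaluate(conditions, {"previous job": ["running"]}))  # wait
print(evaluate(conditions, {"previous job": ["success"]}))  # met
```

Because the first condition has a timeout of zero, an unmet check turns into an error right away instead of a wait.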

I don't think this should be the default; see the reasons above that the dependency might be scheduled later. Apsis by design does not take a global view of the dependency graph. If you want to enable it in lots of places, we could add some syntax to make this easier.

> This behaviour wouldn't make sense if you were mixing condition-based dependencies with post-completion scheduling, so I guess you'd want to be able to make it opt-outable if you were co-mingling both of these features.

Do you mean if previous job was scheduled by an action? Do you actually use this feature?

axwest commented 6 months ago

condition:
# Error immediately if `previous job` isn't scheduled.
- type: dependency
  job_id: previous job
  states: [scheduled, waiting, starting, running, success]
  timeout: 0

If, at the time the run is scheduled, it checks this dependency and determines that it's true, then would Apsis ever check this dependency again? I.e. if previous_job goes away after the first time the dependency is checked, will it fail?

alexhsamuel commented 6 months ago

> If, at the time the run is scheduled, it checks this dependency and determines that it's true, then would Apsis ever check this dependency again? I.e. if previous_job goes away after the first time the dependency is checked, will it fail?

No, unless you were to restart Apsis. See this comment.

alexhsamuel commented 6 months ago

Please see #321. I rolled this behavior into the dependency condition in a way that I think is fairly elegant (and more reliable). It's opt-in.

> I.e. if previous_job goes away after the first time the dependency is checked, will it fail?

With the new exist: true option, yes, the dependent fails if the dependency goes away or fails/errors.
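
As a rough illustration of that behaviour, the Python sketch below models a dependency that is re-checked on every poll. It is a guess at the semantics described here, not the code from #321: the function `poll_dependency` is hypothetical, and the set of "promising" states is borrowed from the `states` list earlier in this thread, which may differ from what the option actually uses.

```python
# Assumption: states from which the dependency might still succeed.
PROMISING = {"scheduled", "waiting", "starting", "running", "success"}


def poll_dependency(states, exist=False):
    """Evaluate a dependency against the states of its current runs.

    Returns "met" once a run has succeeded. With exist=True, returns "error"
    as soon as no run remains in a promising state; otherwise keeps waiting.
    """
    if "success" in states:
        return "met"
    if exist and not any(state in PROMISING for state in states):
        return "error"
    return "wait"


# The dependency exists on the first poll, then goes away before the next one.
first = poll_dependency(["scheduled"], exist=True)
second = poll_dependency([], exist=True)
print(first, second)
```

Without `exist=True`, the same second poll would return "wait", which matches the earlier behaviour of relying on the overall timeout.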

alexhsamuel / apsis

Detect runs depending on unscheduled runs. #316