apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
37.46k stars 14.36k forks source link

Move callbacks to worker #44354

Open kaxil opened 1 week ago

kaxil commented 1 week ago

We will move callbacks to workers. TBD on “how”.

Options:

  1. Moving callbacks to workers as a separate activity.
  2. Running callbacks as teardown task.
  3. Deprecate (not remove) concept of callback and pushing users to use teardown. If we do this and deprecate/not remove this option will happen with one of the above two options.
  4. Leave them in dag-processor

I am leaning towards (2) — and (3) is my next preference. This is because imo callbacks and teardowns are sort of redundant concepts, and if we keep them as is, we will have an explosion of callbacks, we already have: on_{success,failure,skipped,retry,execute}_callback .

potiuk commented 1 week ago

The (2) sounds like a really good idea (not for all callbacks though). It would be great to see callbacks as implicit "tasks" in Airflow UI. Though not all callbacks can be done this way I am afraid. "on_retry" and "on execute" are particularly not fitting the "teardown" concept. The "on_execute" might be an implicit "setup" task, but "on_retry" is a bit problematic

eladkal commented 2 days ago

we need to consider what happens when task succeeded but callback wasn't. We need retry for callbacks but if for example worker is terminated between task success to callback execution how would it retry? Also on_sla_callback() and on_execute_callback() can't be part of teardown. We need to verify with AIP-86 what is the plan for sla callback as part of the deadline interface and if it conflicts with the plan we choose to take here cc @ferruzzi @romsharon98

ferruzzi commented 10 hours ago

The plan for Deadlines is to add a simple check in the scheduler loop, essentially (obviously pseudocode): if (select min(deadline) from DeadlinesTable) < utcnow(): handle_deadline_misses() My intention was for the handler to spin up a new process to run the callbacks so it isn't holding anything else up, but let me know if that plan needs to change with this initiative.