apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

max_active_runs = 1 can still create multiple active execution runs #9975

Closed match-gabeflores closed 3 years ago

match-gabeflores commented 4 years ago

Edit: There is a separate issue affecting max_active_runs in 1.10.14. That regression is fixed in 1.10.15.

Edit2: Version v2.1.3 contains some fixes but also contains bad regressions involving max_active_runs. Use v2.1.4 for the complete fixes to this issue.

Edit3: Version 2.2.0 contains a fix for max_active_runs when using the dags trigger CLI command or TriggerDagRunOperator. https://github.com/apache/airflow/issues/18583

--

Apache Airflow version: 1.10.11, LocalExecutor

What happened:

I have max_active_runs = 1 in my dag file (which consists of multiple tasks), and I manually triggered a dag. While that first execution was still running, a second execution began at its scheduled time.

I should note that the second execution is initially queued. It only actually starts when the dag's first execution moves on to its next task.

My dag definition. The dag just contains tasks using PythonOperator.

from datetime import timedelta

from airflow import DAG

dag = DAG(
    'dag1',
    default_args=default_args,
    description='xyz',
    schedule_interval=timedelta(hours=1),
    catchup=False,
    max_active_runs=1
)

What you expected to happen:

Only one execution should run. A second execution should be queued but not begin executing.

How to reproduce it: In my scenario:

  1. Manually trigger a dag with multiple tasks, and have task1 take longer than the start of the next scheduled execution (Dag Execution1). For example, if the schedule interval is 1 hour, have task1 take longer than 1 hour so that the second execution (Execution2) is queued.
  2. When task1 of Execution1 finishes, and just before task2 starts, the already-queued second execution (Execution2) begins running.
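The race in the steps above can be sketched as a toy model in plain Python. This is not Airflow's actual scheduler code, and the "run only counts while a task is executing" rule is just the hypothesis offered in this report:

```python
# Toy model of the observed race (illustrative only, not Airflow internals).
# Hypothesis: a dag run is only counted as "running" while one of its tasks
# is executing, so a queued run slips in during the gap between task1 and task2.

MAX_ACTIVE_RUNS = 1

def may_start_queued_run(run_states):
    """Return True if the scheduler would start a queued run at this instant."""
    return sum(1 for s in run_states if s == "running") < MAX_ACTIVE_RUNS

# While Execution1's task1 is executing, Execution1 is seen as "running",
# so the queued Execution2 is held back:
print(may_start_queued_run(["running"]))  # False

# In the gap between task1 and task2, Execution1 is momentarily not counted,
# so Execution2 is allowed to start despite max_active_runs=1:
print(may_start_queued_run([]))           # True
```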


Anything else we need to know: I think the second execution begins between task1 and task2 of Execution1. I think there's a delay of a few seconds there, and maybe that's when Airflow thinks there's no dag execution running? That's just a guess.

Btw, this can have potentially disastrous effects (errors, incomplete data without errors, etc.).

fj-sanchez commented 3 years ago

I've also tried creating the run with state SCHEDULED, but it's never run by the scheduler. (perhaps I'm doing it the wrong way - I'm still investigating that)

Right now the only states for Dag Runs are None, "running", or "failed" -- that's why the scheduler is never picking up that dag run.

It looks like the column in the DB actually is just a string and the model uses the generic State model which is also used for tasks instances (https://github.com/apache/airflow/blob/main/airflow/models/dagrun.py#L82), so I guess it's more about updating the logic used to determine the dag run state (https://github.com/apache/airflow/blob/main/airflow/models/dagrun.py#L473). Obviously, there might be other parts of the code that assume only these 3 states as valid, but it doesn't seem explicit in the model.
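The fix that eventually shipped (an explicit "queued" DagRun state) can be sketched in plain Python. This is a simplified illustration of the scheduling logic being discussed, not Airflow's real code; the field names are assumptions:

```python
# Simplified sketch of run promotion with an explicit "queued" dag run state
# (illustrative only; data shapes and logic are assumptions, not Airflow's code).

MAX_ACTIVE_RUNS = 1

def promote_queued_runs(runs):
    """Promote queued runs to running without exceeding MAX_ACTIVE_RUNS.

    runs: list of dicts like {"id": 1, "state": "queued"}; mutated in place.
    """
    active = sum(1 for r in runs if r["state"] == "running")
    for run in runs:
        if active >= MAX_ACTIVE_RUNS:
            break
        if run["state"] == "queued":
            run["state"] = "running"
            active += 1
    return runs

runs = [{"id": 1, "state": "running"}, {"id": 2, "state": "queued"}]
promote_queued_runs(runs)
# Run 2 stays queued (neither lost nor started) because run 1 is still running.
```

With a persistent "queued" state, the scheduler no longer needs a run to be actively executing a task to count it toward the limit, which closes the between-tasks gap described above.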

himabindu07 commented 3 years ago

verified by QA

aran3 commented 3 years ago

I've been testing version 2.1.3 for a few days now, and while the added Queued state seems to help in most cases, I think this bug is not fully solved. When manually triggering a dag that has max_active_runs=1 many times, the dag does reach more than one run in the "running" state at the same time.
In our case the dag has two tasks, a trigger and a sensor.

I will try to gather more specific information and update.

kamushadenes commented 3 years ago

Observing the same behavior on 2.1.2 with catchup=False, this has been blowing through my quotas.

ephraimbuddy commented 3 years ago

@aran3, I think your case has been fixed in https://github.com/apache/airflow/pull/17786; previously, tasks could start running while their dagrun was still queued, which would lead to the queued dagrun entering the running state.

vumdao commented 2 years ago

Issue does not happen on 2.1.4.

argemiront commented 2 years ago

Is there a way to prevent the scheduler from queuing new runs if there's an active run? I have a DAG that overruns now and then, and because of this behaviour I'm seeing DAG runs pile up over time in the new UI.

ephraimbuddy commented 2 years ago

@argemiront, If you are on 2.1.4 you can change AIRFLOW__SCHEDULER__MAX_QUEUED_RUNS_PER_DAG=16 to a lower number.
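For example (the variable name is as given in the comment above and applies to 2.1.4; the value shown is illustrative):

```shell
# Cap queued runs per DAG at 2 instead of the default of 16
export AIRFLOW__SCHEDULER__MAX_QUEUED_RUNS_PER_DAG=2
```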

argemiront commented 2 years ago

AIRFLOW__SCHEDULER__MAX_QUEUED_RUNS_PER_DAG=16

thank you so much!

dave-martinez commented 2 years ago

Issue does not happen on 2.1.4 image

I tested this with Docker Airflow 2.1.4, Python 3.7. It only works when triggered via HTTP (UI/API).

However, when using TriggerDagRunOperator, it doesn't work. Or is this the intended behavior for TriggerDagRunOperator?

ephraimbuddy commented 2 years ago

What you see there is a queued run. The currently active run count is 1, but there's also a queued run, which doesn't count as an active run.

aran3 commented 2 years ago

It is worth noting, for anyone (such as us) that relies heavily on max_active_runs=1, that this still happens in 2.1.4 when using the cli dags trigger command or TriggerDagRunOperator; it was supposedly fixed in https://github.com/apache/airflow/issues/18583 (version 2.2.0).


ephraimbuddy commented 2 years ago

It is worth noting for anyone (such as us) that heavily relies on max_active_runs=1 - that this still happens in 2.1.4 when using cli dags trigger command or TriggerDagRunOperator and was supposedly fixed in #18583 (version 2.2.0)


This is now fixed

DanielMorales9 commented 2 years ago

How can this be solved on a fully managed setup such as MWAA, which only supports 1.10.12 and 2.0.2? I am looking for a workaround here; any help will be appreciated.

ashb commented 2 years ago

@DanielMorales9 Not easily, I'm afraid -- either by asking AWS to provide a more recent version, or by using a different method than MWAA that provides quicker update cycles.

stroykova commented 2 years ago

Same problem in 2.1.3 with manually triggered dags: all of them run simultaneously. I will try 2.2.0.

ashb commented 2 years ago

@stroykova Please let us know. (I'd try 2.2.2 rather than 2.2.0.)

stroykova commented 2 years ago

2.2.2 is fine with this :partying_face:

stas-snow commented 2 years ago

I'm seeing this issue in 2.2.3 with catchup=True and max_active_runs=1. The DAG is triggered multiple times and multiple instances run in parallel.

GabeChurch commented 2 years ago

If you are on an older version of Airflow that has this problem, you can add a concurrency setting to your dag (i.e. concurrency = some_num), or set depends_on_past = True.
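As a sketch, those workaround settings would be passed as DAG keyword arguments (illustrative values only; whether the concurrency parameter is accepted depends on your Airflow version):

```python
# Illustrative workaround settings for older Airflow versions (a sketch,
# not a guaranteed fix; values are examples).
dag_workaround_kwargs = {
    "max_active_runs": 1,  # the limit this issue is about
    "concurrency": 1,      # cap task instances running simultaneously for this DAG
    "default_args": {
        "depends_on_past": True,  # each task waits on its previous run's instance
    },
}
```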

nathadfield commented 9 months ago

@alexstrimbeanu Coming in here with an attitude like that is unacceptable and is not going to help your cause, but I'll give you the decency of a reply.

You may notice that this particular issue is closed and, afaik, there isn't currently an open issue documenting this as a problem.

Maybe you would like to open one and provide it with all the necessary information so that someone can replicate the scenario?