Closed: gabeflores closed this issue 3 years ago
I've also tried creating the run with state SCHEDULED, but it's never run by the scheduler. (perhaps I'm doing it the wrong way - I'm still investigating that)
Right now the only states for Dag Runs are None, "running", or "failed" -- that's why the scheduler is never picking up that dag run.
It looks like the column in the DB is actually just a string, and the model uses the generic State
model that is also used for task instances (https://github.com/apache/airflow/blob/main/airflow/models/dagrun.py#L82), so I guess it's more about updating the logic used to determine the dag run state (https://github.com/apache/airflow/blob/main/airflow/models/dagrun.py#L473). Obviously, there might be other parts of the code that assume only these 3 states are valid, but it doesn't seem explicit in the model.
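To illustrate the point above with a hedged, hypothetical sketch (this is not the actual Airflow source): if validity is enforced in scheduler logic rather than in the model, a run stored with an unrecognized string state is simply never picked up.

```python
# Hypothetical sketch, NOT actual Airflow code: the DB column is a plain
# string, and the set of states the scheduler acts on lives in its logic.

VALID_DAG_RUN_STATES = {None, "running", "failed"}  # the 3 states described above

def is_schedulable(state):
    """The scheduler only picks up runs whose state it recognizes."""
    return state in VALID_DAG_RUN_STATES

# A run created with state "scheduled" is stored fine (it's just a string),
# but the scheduler never touches it:
print(is_schedulable("running"))    # True
print(is_schedulable("scheduled"))  # False
```

This matches the behavior reported above: a run created with state SCHEDULED persists but is never run.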
verified by QA
I've been testing version 2.1.3 for a few days now, and while adding the Queued state seems to help in most cases, I think this bug is not fully solved.
When manually triggering a dag that has max_active_runs=1 many times, it does happen that the dag reaches more than one dag run in the "running" state at the same time.
In our case the dag has two tasks, a trigger and a sensor:
I will try to gather more specific information and update.
Observing the same behavior on 2.1.2 with catchup=False, this has been blowing through my quotas.
@aran3, I think your case has been fixed in https://github.com/apache/airflow/pull/17786 where tasks can start running while the dagruns are still queued. This would lead to the queued dagrun entering the running state
Issue does not happen on 2.1.4
Is there a way to prevent the scheduler from queuing
new runs if there's an active run? I have a DAG that now and then overruns, and because of this behaviour I'm seeing DAG runs piling up in the new UI over time.
@argemiront, If you are on 2.1.4 you can change this setting (default shown) to a lower number:
AIRFLOW__SCHEDULER__MAX_QUEUED_RUNS_PER_DAG=16
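For example, to allow only a single queued run per DAG (1 here is an illustrative choice; 16 is the default mentioned above):

```shell
AIRFLOW__SCHEDULER__MAX_QUEUED_RUNS_PER_DAG=1
```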
thank you so much!
Issue does not happen on 2.1.4
I tested this one with Docker Airflow 2.1.4, Python 3.7. It only works when triggered via HTTP (UI/API).
However, when using TriggerDagRunOperator it doesn't work. Or is this the intended behavior for TriggerDagRunOperator?
What you see there is a queued run. The currently active run count is 1, but there's also a queued run, which doesn't count as an active run.
It is worth noting for anyone (such as us) that heavily relies on max_active_runs=1 that this still happens in 2.1.4 when using the cli dags trigger command or TriggerDagRunOperator, and was supposedly fixed in https://github.com/apache/airflow/issues/18583 (version 2.2.0).
This is now fixed
How can this be solved on a fully managed setup such as MWAA? MWAA only supports 1.10.12 and 2.0.2. I am looking for a workaround here; any help will be appreciated.
@DanielMorales9 Not easily I'm afraid - by asking AWS to provide a more recent version, or by using a different service than MWAA that provides quicker update cycles.
Same problem in 2.1.3 with manually triggered dags. All of them run simultaneously. I will try 2.2.0
@stroykova Please let us know. (I'd try 2.2.2 rather than 2.2.0)
2.2.2 is fine with this :partying_face:
I'm seeing this issue in 2.2.3. catchup=True and max_active_runs=1. DAG is triggered multiple times and multiple instances are running in parallel.
If you are on an older version of Airflow that has this problem, you can add the concurrency setting to your dag (i.e. concurrency=some_num), or set depends_on_past=True.
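As a rough illustration of why depends_on_past=True serializes runs (a hypothetical model, not Airflow internals): a task instance only becomes eligible once the same task in the previous run has succeeded, so backed-up runs execute one at a time.

```python
# Hypothetical simulation of the depends_on_past=True workaround.
# Each run is modeled as a single task with state "none" / "running" / "success".

def runnable_tasks(runs, depends_on_past=True):
    """Return indices of runs whose task is eligible to start now."""
    ready = []
    for i, state in enumerate(runs):
        if state != "none":
            continue  # already running or finished
        if depends_on_past and i > 0 and runs[i - 1] != "success":
            continue  # previous run's task hasn't succeeded yet
        ready.append(i)
    return ready

# Three backed-up runs: only the first may start when depends_on_past is set.
print(runnable_tasks(["none", "none", "none"]))                          # → [0]
# Without it, all three would be eligible at once:
print(runnable_tasks(["none", "none", "none"], depends_on_past=False))   # → [0, 1, 2]
```

This is only a workaround; it also means a failed run blocks all later runs until it is fixed.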
@alexstrimbeanu Coming in here with an attitude like that is unacceptable and is not going to help your cause but I'll give you the decency of replying.
You may notice that this particular issue is closed and, afaik, there isn't currently an open issue that documents this as a problem.
Maybe you would like to open one and provide it with all the necessary information so that someone can replicate the scenario?
Edit: There is a separate issue affecting max_active_runs in 1.10.14. That regression is fixed in 1.10.15.
Edit2: Version v2.1.3 contains some fixes but also bad regressions involving max_active_runs. Use v2.1.4 for the complete fixes to this issue.
Edit3: Version 2.2.0 contains a fix for max_active_runs when using the dags trigger command or TriggerDagRunOperator. https://github.com/apache/airflow/issues/18583
Apache Airflow version: 1.10.11, LocalExecutor
What happened:
I have max_active_runs = 1 in my dag file (which consists of multiple tasks) and I manually triggered the dag. While that run was still in progress, a second execution began at its scheduled time.
I should note that the second execution is initially queued. It's only when the dag's 1st execution moves to the next task that the second execution actually starts.
My dag definition: the dag just contains tasks using PythonOperator.
What you expected to happen:
Only one execution should run. A second execution should be queued but not begin executing.
How to reproduce it: In my scenario:
Anything else we need to know: I think the second execution begins between task1 and task2 of execution 1. I think there's a few-second delay there, and maybe that's when Airflow thinks there's no dag execution? That's just a guess.
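That guess can be sketched as a toy model (hypothetical names and logic, not actual Airflow source): if the scheduler judged a DAG "idle" by looking only at running task instances, the gap between task1 finishing and task2 starting would look idle, and a second run could slip through.

```python
# Hypothetical model of the suspected race, NOT Airflow internals.
# The buggy check inspects running *task instances*; the correct check
# inspects the dag run's own state.

def dag_looks_idle_buggy(task_states):
    """Buggy: 'idle' if no task instance is currently running."""
    return all(s != "running" for s in task_states)

def dag_looks_idle_correct(dag_run_state):
    """Correct: 'idle' only if no dag run is in the 'running' state."""
    return dag_run_state != "running"

# Between task1 (done) and task2 (not yet started) of execution 1:
between_tasks = ["success", "none"]
print(dag_looks_idle_buggy(between_tasks))   # True  -> second run slips through
print(dag_looks_idle_correct("running"))     # False -> the run is still active
```

Under this model the few-second gap between tasks is exactly when a max_active_runs=1 DAG would appear free to start another run.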
Btw, this can have potentially disastrous effects (errors, incomplete data without errors, etc)