apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Scheduler is skipping a day sometimes #15036

Closed iameugenejo closed 3 years ago

iameugenejo commented 3 years ago

I'm having exactly the same issue as this user - https://www.reddit.com/r/dataengineering/comments/lri9fv/airflow_dag_is_skipping_a_day/

Apache Airflow version: 2.0.1

Environment:

[Six screenshots attached, taken 2021-03-26 between 11:01 AM and 11:04 AM; images not preserved]

The scheduler log exists for the missing date and shows no errors.

There were no manual runs at all for this DAG.

boring-cyborg[bot] commented 3 years ago

Thanks for opening your first issue here! Be sure to follow the issue template!

eladkal commented 3 years ago

@iameugenejo can you share more details about the issue? How often does it happen? Is it affecting a specific DAG or all DAGs in the system?

Without reproduction steps or more information, it might be hard to understand the root cause.

iameugenejo commented 3 years ago

@eladkal, it has happened 5 times so far since 2/20.

It's happening to 1 specific dag.

The dag itself is static but the tasks the dag executes are generated dynamically.

The other dags that are not showing this symptom have their tasks statically coded.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.

eladkal commented 3 years ago

@iameugenejo can you share the DAG code? We need more information here. If we can't reproduce it, it's almost impossible to find a fix.

iameugenejo commented 3 years ago

This hasn't happened for the past month or so.

The following is the dag with some values redacted.

from datetime import timedelta  # needed for the timedelta(...) calls below

import pendulum
from airflow.models import DAG
from airflow.operators.bash import BashOperator
from airflow.utils import timezone

DAG_ID = '{REDACTED}'

now = pendulum.now(timezone.utc)

schedule_interval = '0 16 * * *'
start_date = now - timedelta(days=1)

max_active_runs = 1
num_of_tasks = 6
email_ids = '{REDACTED}'

with DAG(
        dag_id=DAG_ID,
        start_date=start_date,
        max_active_runs=max_active_runs,
        default_args={
            'owner': 'airflow',
            'start_date': start_date,
            'max_active_runs': max_active_runs,
            'email': email_ids,
            'email_on_failure': True,
            'email_on_retry': True
        },
        schedule_interval=schedule_interval,
        dagrun_timeout=timedelta(seconds=43200),  # 12 hours
        catchup=False
) as dag:
    tasks = []
    for i in range(0, num_of_tasks):
        tasks.append(BashOperator(
            task_id='redacted_'+str(i+1),
            retries=10,
            retry_delay=timedelta(seconds=60),  # 1 minute retry delay
            retry_exponential_backoff=True,
            max_retry_delay=timedelta(seconds=900),  # 15 minutes max retry delay
            do_xcom_push=True,  # return the last line from the stdout
            bash_command="REDACTED.sh {} {} ".format(int(i), int(num_of_tasks)),
            dag=dag))
        if i != 0:
            tasks[i-1] >> tasks[i]

jhtimmins commented 3 years ago

@eladkal were you able to validate this? Just trying to get an idea of what the status is.

eladkal commented 3 years ago

I wasn't able to reproduce it, but I think it's related to the dynamic start_date used in the DAG (start_date = now - timedelta(days=1)), which is bad practice and can lead to all kinds of undesired behavior.

I'm inclined to close this issue.
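
To illustrate the concern about a dynamic start_date, here is a minimal plain-Python sketch. It does not use Airflow and does not reproduce the scheduler's logic. It only shows that an expression like now - timedelta(days=1) yields a different value each time the DAG file is parsed, while a fixed datetime does not. The parse times and the fixed date are assumptions chosen for illustration, and the thread did not confirm whether this explains the skipped run.

```python
from datetime import datetime, timedelta, timezone


def dynamic_start_date(parse_time: datetime) -> datetime:
    """Mimic `start_date = now - timedelta(days=1)` evaluated at parse time."""
    return parse_time - timedelta(days=1)


# A fixed start date does not depend on when the file is parsed.
# (The date itself is arbitrary, chosen only for illustration.)
STATIC_START_DATE = datetime(2021, 2, 1, tzinfo=timezone.utc)

# Two hypothetical parse times, 30 seconds apart.
first_parse = datetime(2021, 3, 26, 15, 59, 45, tzinfo=timezone.utc)
second_parse = first_parse + timedelta(seconds=30)

dyn_first = dynamic_start_date(first_parse)
dyn_second = dynamic_start_date(second_parse)

# The dynamic value moves with every parse; the static one never does.
start_date_moved = dyn_first != dyn_second
print("dynamic start_date moved between parses:", start_date_moved)
print("static start_date:", STATIC_START_DATE.isoformat())
```

In the DAG shared above, the equivalent change would be to replace the now - timedelta(days=1) expression with a fixed value such as pendulum.datetime(2021, 2, 1, tz="UTC"), where the date is again arbitrary.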

jhtimmins commented 3 years ago

Thanks @eladkal.

@iameugenejo are you able to replicate this bug even if you remove the dynamic start_date? If not, I agree with @eladkal that we can probably chalk it up to the dynamic start date.

@kaxil Is it possible/desirable to add a check for dynamic start dates and to throw an error or warning?
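
One way such a check could work is sketched below. This is not an existing Airflow feature or API; check_start_date_is_static and build_start_date are hypothetical names. The idea is to compute the start date for two different simulated parse times and emit a warning if the results differ.

```python
import warnings
from datetime import datetime, timedelta, timezone
from typing import Callable


def check_start_date_is_static(
    build_start_date: Callable[[datetime], datetime],
) -> bool:
    """Return True if the start date is identical for two parse times.

    `build_start_date` is a hypothetical stand-in for parsing a DAG file
    at a given moment and reading back its start_date.
    """
    t0 = datetime(2021, 1, 1, tzinfo=timezone.utc)
    t1 = t0 + timedelta(hours=1)
    if build_start_date(t0) != build_start_date(t1):
        warnings.warn(
            "start_date changes between parses; prefer a fixed datetime",
            UserWarning,
        )
        return False
    return True


# A start date derived from "now" is flagged; a fixed date passes.
dynamic_ok = check_start_date_is_static(lambda now: now - timedelta(days=1))
static_ok = check_start_date_is_static(
    lambda now: datetime(2021, 2, 1, tzinfo=timezone.utc)
)
print("dynamic passes check:", dynamic_ok)
print("static passes check:", static_ok)
```

Whether to raise an error or only warn is a design choice; this sketch warns so that existing DAGs would keep loading.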

iameugenejo commented 3 years ago

Dynamic start_date is still there and the issue hasn't happened for the past few months, so it might not be about the dynamic start_date.

But since I'm not seeing the issue anymore, I don't mind closing this issue and reopening it if it occurs again.

jhtimmins commented 3 years ago

@iameugenejo Sounds good. I'll close for now