Closed. schwartzpub closed this issue 1 year ago.
Thanks for opening your first issue here! Be sure to follow the issue template!
Looking through the documentation, I see that it is recommended to keep airflow.cfg at UTC. Having set this back to UTC and reimported the DAG, which has been made TZ-aware with start_date=pendulum.datetime(2023, 1, 19, 6, tz="America/Chicago"), the DAG is still confused about when it should run. With the UI set to CST (-06:00) it shows the Next Run as 6 hours ago, and runs are not being scheduled correctly according to the cron expression provided. Tasks are still running 6 hours later than expected; for example, the 6:15a run is happening at 12:15p CST.
Be aware that "Last Run" and "Next Run" show Airflow's logical_date/execution_date, which can be a little confusing to interpret at times: https://airflow.apache.org/docs/apache-airflow/stable/faq.html#what-does-execution-date-mean
They are not the expected wall-clock time for the DAG to kick off.
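To make that concrete, a minimal sketch using the cron schedule from this thread: the run labeled 06:00 covers the 06:00-06:15 data interval and is only queued once that interval ends, so the label shown in the UI trails the actual kickoff by one interval.

```python
import pendulum

# For schedule "*/15 6-17 * * 1-5", the run whose logical_date reads 06:00
# covers the interval 06:00-06:15 and is queued at the interval's end.
interval_start = pendulum.datetime(2023, 1, 19, 6, 0, tz="America/Chicago")
interval_end = interval_start.add(minutes=15)

print("logical_date shown in the UI:", interval_start)   # 06:00 CST
print("actual kickoff (interval end):", interval_end)    # 06:15 CST
```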
FYI, I use a non-UTC timezone for the default config, the UI, and the DAG start_date, and do not have any issues. I believe the line about using UTC in the config was written pre-Airflow 1.10, when timezone support was very poor.
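For reference, a sketch of what that non-UTC setup looks like in airflow.cfg (values here are illustrative, not the commenter's actual config; the UI default lives under [webserver]):

```
[core]
default_timezone = America/Chicago

[webserver]
default_ui_timezone = America/Chicago
```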
Other than that, I don't think there are enough details here to assist you. You should provide a simple example DAG, the airflow.cfg and UI changes you have made, and a screenshot of what you think is incorrect.
Last Run/Next Run aside -- what other details are needed that aren't provided above? For reference, the documentation for 2.5 is where the recommendation for default_timezone = UTC came from.
The DAG definition is as follows:
```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.models import Variable
from airflow.operators.bash import BashOperator
import pendulum

with DAG(
    "test_dataflow",
    default_args={
        "depends_on_past": False,
        "email": ["test@test.com"],
        "email_on_failure": False,
    },
    description="Test Dataflow",
    schedule="*/15 6-17 * * 1-5",
    start_date=pendulum.datetime(2023, 1, 19, 6, tz="America/Chicago"),
    catchup=False,
) as dag:
    ssis_p = Variable.get("ssis_password")
    bash_comm = (
        "/opt/ssis/bin/dtexec /f /home/airflow/airflow/ssis/Package.dtsx "
        "/de {0} /l 'DTS.LogProviderTextFile;ssis.txt'".format(ssis_p)
    )

    t1 = BashOperator(
        task_id="ssis_dataflow",
        bash_command=bash_comm,
    )

    t1
```
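One way to sanity-check what the scheduler will do with this definition is to ask the DAG object itself. A sketch against the Airflow 2.x API (the exact next_dagrun_info signature has shifted slightly between minor versions), appended to the bottom of the DAG file and run with python on the Airflow host:

```python
# With no previous automated run, this reports the first scheduled run.
info = dag.next_dagrun_info(None)
print("next logical_date:", info.logical_date)
print("data interval:", info.data_interval.start, "->", info.data_interval.end)
```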
The local server is set to CST:

```
user@host:~$ sudo timedatectl
               Local time: Thu 2023-01-19 06:44:37 CST
           Universal time: Thu 2023-01-19 12:44:37 UTC
                 RTC time: Thu 2023-01-19 12:44:36
                Time zone: America/Chicago (CST, -0600)
System clock synchronized: yes
              NTP service: active
          RTC in local TZ: no
```
The airflow.cfg is currently set to UTC, but changing it to America/Chicago and restarting the scheduler and webserver services doesn't change the behavior:

```
default_timezone = utc
```
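To rule out a stale or shadowed config file, here is a minimal sketch (assuming it is run in the same environment and as the same user as the scheduler) that prints the timezone the running installation actually resolves:

```python
from airflow.configuration import conf

# Expect "America/Chicago" after the edit and restart, or "utc" if the
# old value is still being picked up.
print(conf.get("core", "default_timezone"))
```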
I cannot provide a screenshot since the run times in the UI are not a good indicator (if there is a screenshot that would show this, I can certainly provide one), but given the above configuration I would expect the first run each weekday to happen at 6:15a CST. Instead, the first run of the day happens at 12:15p CST; when I check the DAG through the morning and into the afternoon, there are no new runs until 12:15p CST.
If there is any other information that might be missing here, please let me know so I can provide it.
This is a screenshot of all the DAG runs from today so far.
I still don't fully understand the Logical Date, and I still don't understand what would cause the daily DAG runs to start at 12:15p CST instead of 06:00a CST.
Something else interesting is the queued_at, start_date, and end_date for the DAG runs in the database: they show 6p UTC and later, which again doesn't make sense if the expected intervals are UTC-6.
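For anyone who wants to reproduce that database check, a minimal sketch querying the metadata DB through the Airflow ORM (execution_date is the stored name for the logical date; all timestamps are stored in UTC):

```python
from airflow.models import DagRun
from airflow.utils.session import create_session

# Print the five most recent runs of the DAG with their timestamps.
with create_session() as session:
    runs = (
        session.query(DagRun)
        .filter(DagRun.dag_id == "test_dataflow")
        .order_by(DagRun.execution_date.desc())
        .limit(5)
    )
    for run in runs:
        print(run.execution_date, run.queued_at, run.start_date, run.end_date)
```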
Looks like this is more of a discussion than an issue. Converted.
Apache Airflow version
2.5.0
What happened
When using cron syntax for the DAG schedule, the scheduler is not running DAGs at the correct time for my timezone. For instance, a DAG that should run at 6:00am is running at 12:00am, as though the scheduler believes the system time is set to UTC. default_timezone in airflow.cfg doesn't seem to affect runtime. Setting start_date=pendulum.datetime(2023,1,17,tz='Asia/Bishkek') and then setting the UI to UTC will show the correct time and run the DAG at the correct time for America/Chicago.
airflow.cfg:

```
default_timezone = America/Chicago
```
When checking the DAG details in the UI, I see this, which leads me to believe something is converting my schedule to UTC when the DAG is imported:

```
next_dagrun_data_interval = DataInterval(start=DateTime(2023, 1, 19, 6, 45, 0, tzinfo=Timezone('UTC')), end=DateTime(2023, 1, 19, 7, 0, 0, tzinfo=Timezone('UTC')))
```
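A quick conversion makes that suspicion concrete (a sketch; the timestamps come from the interval above):

```python
import pendulum

# The UI reports the next data interval in UTC; convert it back to the
# DAG's timezone to see which wall-clock hour the scheduler targeted.
start_utc = pendulum.datetime(2023, 1, 19, 6, 45, tz="UTC")
print(start_utc.in_timezone("America/Chicago"))  # 2023-01-19 00:45 CST
# If the cron's "hour 6" were honored in America/Chicago, this would
# convert to 06:45 local time, not 00:45.
```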
What you think should happen instead
The DAG should run every 15 minutes between 6a and 6p CST, per the system, Airflow, and DAG timezone configuration.
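For comparison, a minimal sketch of the expected firing times using croniter (already a core Airflow dependency) with a timezone-aware base datetime:

```python
from datetime import datetime

import pendulum
from croniter import croniter

# Enumerate the first expected runs in America/Chicago; with a tz-aware
# base, croniter yields tz-aware results in the same zone.
base = pendulum.datetime(2023, 1, 19, 0, 0, tz="America/Chicago")
it = croniter("*/15 6-17 * * 1-5", base)
for _ in range(3):
    print(it.get_next(datetime))  # expect 06:00, 06:15, 06:30 CST
```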
How to reproduce
No response
Operating System
Ubuntu 20.04 LTS
Versions of Apache Airflow Providers
```
apache-airflow-providers-celery==3.1.0
apache-airflow-providers-common-sql==1.3.2
apache-airflow-providers-ftp==3.3.0
apache-airflow-providers-http==4.1.0
apache-airflow-providers-imap==3.1.1
apache-airflow-providers-microsoft-mssql==3.3.2
apache-airflow-providers-sqlite==3.3.1
```
Deployment
Other
Deployment details
Manual install of apache-airflow using pip.
Anything else
No response
Are you willing to submit PR?
Code of Conduct