Closed zhangw closed 11 months ago
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
Don't use days_ago
. It's a bad practice from Airflow 1.10 which we have since fixed in (I believe) all our examples and documentation. days_ago
effectively calculates a new start_date for the DAG every time the DAG is parsed, which means that yes - the DAG is different every time.
Generally, when you create a DAG you should decide WHEN its life should start (a fixed date) rather than keep moving the start date over and over again - which is effectively what days_ago does.
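The moving-start-date behavior described above can be sketched in plain Python (a simplified stand-in for Airflow's old `days_ago` helper, which additionally truncated to midnight; the real fix is simply a fixed datetime literal):

```python
from datetime import datetime, timedelta, timezone

def days_ago(n):
    # Simplified sketch of the deprecated helper: the value is
    # recomputed relative to "now" on every DAG parse, so the
    # DAG's start_date keeps drifting forward.
    return datetime.now(timezone.utc) - timedelta(days=n)

# Recomputed on each parse - changes over time:
moving_start = days_ago(2)

# Fixed date - identical on every parse, which is what a DAG
# definition should use:
fixed_start = datetime(2023, 1, 1, tzinfo=timezone.utc)
```

With a fixed `start_date`, repeated parses of the same DAG file produce the same value, so it cannot be a source of serialization churn.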
In my case, the value of start_date calculated by days_ago
did not change; what changed is the value of op_args. Please advise further.
What you see is likely an effect of how serialization was implemented 2.5 years ago - with thousands of bug fixes released since, including tens of them released in the 2.1 line you are using (2.1.3 and 2.1.4).
Hash calculation in Airflow only uses these fields:
_comps = {
    "task_id",
    "dag_id",
    "owner",
    "email",
    "email_on_retry",
    "retry_delay",
    "retry_exponential_backoff",
    "max_retry_delay",
    "start_date",
    "end_date",
    "depends_on_past",
    "wait_for_downstream",
    "priority_weight",
    "sla",
    "execution_timeout",
    "on_execute_callback",
    "on_failure_callback",
    "on_success_callback",
    "on_retry_callback",
    "do_xcom_push",
}
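To illustrate what a hash restricted to those comparison fields implies (a hypothetical helper for illustration, not Airflow's actual implementation), note that op_args is absent from the list, so two tasks differing only in op_args hash identically under such a scheme - which suggests the changing dag_hash the reporter observed comes from the full serialized representation rather than this field list:

```python
import hashlib
import json

# Trimmed subset of the _comps field list, for brevity.
_comps = {"task_id", "dag_id", "owner", "start_date", "retry_delay"}

def fields_hash(task: dict) -> str:
    # Hash only the whitelisted comparison fields, in a stable order,
    # so anything outside _comps cannot affect the result.
    subset = {k: str(task.get(k)) for k in sorted(_comps)}
    payload = json.dumps(subset, sort_keys=True).encode()
    return hashlib.md5(payload).hexdigest()

t1 = {"task_id": "t", "dag_id": "d", "op_args": ["2021-09-01"]}
t2 = {"task_id": "t", "dag_id": "d", "op_args": ["2021-09-02"]}
# op_args is not part of _comps, so both tasks produce the same hash.
```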
But since 2.1.2 there have probably been tens of changes in this area.
Please advise further.
Upgrade to the latest version of Airflow. This is the fastest way. If you want to see whether there are any bug fixes related to serialization, to be sure it is worth it, you can go through the release notes https://airflow.apache.org/docs/apache-airflow/stable/release_notes.html and check the few thousand fixes since - but I strongly advise you to just upgrade. Even if there was a bug in the hash implementation back then, the only way to fix it is to upgrade anyway, so you can save a lot of time by just upgrading.
If you see similar problems after upgrading, please report them here.
Thanks for your suggestions!
Apache Airflow version
Other Airflow 2 version (please specify below)
What happened
Airflow version 2.1.2
The DAG file is simple but uses the XComArgs feature, and I noticed that the dag_hash changes every time the DAG is parsed and serialized. I thought the hashing should be stable in this case.
The DAG file for testing:
One of the serialized payloads:
Another serialized payload:
The only difference between them is the value of op_args.
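Since the serialized payloads were elided above, here is a generic sketch (with hypothetical data, not the actual serialized DAG rows) of how one can locate which keys differ between two serialized documents:

```python
import json

def diff_keys(a, b, path=""):
    """Recursively report the paths at which two JSON-like values differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        diffs = []
        for k in sorted(set(a) | set(b)):
            diffs += diff_keys(a.get(k), b.get(k), f"{path}/{k}")
        return diffs
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        diffs = []
        for i, (x, y) in enumerate(zip(a, b)):
            diffs += diff_keys(x, y, f"{path}[{i}]")
        return diffs
    return [] if a == b else [path]

# Hypothetical fragments of two serialized task dicts:
t1 = {"task_id": "t", "op_args": ["2021-09-01T00:00:00"]}
t2 = {"task_id": "t", "op_args": ["2021-09-02T00:00:00"]}
print(diff_keys(t1, t2))  # -> ['/op_args[0]']
```

Applied to two rows from the serialized_dag table, this would confirm whether op_args is the only field that varies between parses.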
What you think should happen instead
No response
How to reproduce
Parse and serialize the DAG file several times, and compare the row results for these executions.
Operating System
MacBook Pro, macOS 14.1.1 (23B81), Apple M1 chipset
Versions of Apache Airflow Providers
No response
Deployment
Virtualenv installation
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
Code of Conduct