apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Airflow DAG is unable to create Task and skipping all remaining tasks with '0' downstream #32552

Closed da-dhavalkalola closed 1 year ago

da-dhavalkalola commented 1 year ago

Apache Airflow version

Other Airflow 2 version (please specify below)

What happened

We are using Airflow 2.4.3. For a scheduled DAG, we found that it is unable to run a task and skips all remaining tasks, ending with a failed status.

==> Below is part of the DAG code. It fails after the 'build_activate' task, showing '0' downstream.

```python
(
    start_operator >> resume_cluster >> fetch_buildinfo >> parallel_task1
    >> delete_usa_records >> data_load5 >> data_load6 >> data_load7
    >> parallel_task2 >> data_load14 >> data_load15 >> data_load16
    >> data_load16_1 >> data_load17 >> data_load17_1 >> get_count
    >> build_activate >> trigger_Fill_rate_QC_report >> send_notification
    >> generate_op
)
generate_op >> trigger_business_US_dag
generate_op >> end_operator
```
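In the chain as written, each task has at least one task directly downstream of it, except the branch targets at the end. So build_activate would be expected to have exactly one downstream task (trigger_Fill_rate_QC_report). The minimal plain-Python model below does not import Airflow and makes this easy to check. It assumes each name in the excerpt, including parallel_task1 and parallel_task2, stands for a single task. The "split" chain in the usage part is a hypothetical example of how a broken `>>` chain would leave build_activate with 0 downstream tasks; it is not taken from the reporter's code.

```python
from collections import defaultdict

# Task IDs in the order they appear in the excerpt above.
# ASSUMPTION: each name (including parallel_task1/parallel_task2) is one task.
CHAIN = [
    "start_operator", "resume_cluster", "fetch_buildinfo", "parallel_task1",
    "delete_usa_records", "data_load5", "data_load6", "data_load7",
    "parallel_task2", "data_load14", "data_load15", "data_load16",
    "data_load16_1", "data_load17", "data_load17_1", "get_count",
    "build_activate", "trigger_Fill_rate_QC_report", "send_notification",
    "generate_op",
]

# Outside the parenthesised chain, generate_op fans out to both branch targets.
BRANCH_EDGES = [
    ("generate_op", "trigger_business_US_dag"),
    ("generate_op", "end_operator"),
]


def downstream_map(chains, extra_edges=()):
    """Model each `a >> b >> c` chain as edges a->b and b->c, plus extra edges."""
    downstream = defaultdict(set)
    for chain in chains:
        for upstream, task in zip(chain, chain[1:]):
            downstream[upstream].add(task)
    for upstream, task in extra_edges:
        downstream[upstream].add(task)
    return downstream


def downstream_count(downstream, task_id):
    """Number of tasks directly downstream of task_id (0 for leaf tasks)."""
    return len(downstream.get(task_id, ()))


if __name__ == "__main__":
    full = downstream_map([CHAIN], BRANCH_EDGES)
    print(downstream_count(full, "build_activate"))  # 1

    # HYPOTHETICAL: the chain accidentally split right after build_activate.
    cut = CHAIN.index("build_activate") + 1
    broken = downstream_map([CHAIN[:cut], CHAIN[cut:]], BRANCH_EDGES)
    print(downstream_count(broken, "build_activate"))  # 0
```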

=> trigger_Fill_rate_QC_report and trigger_business_US_dag: these two tasks trigger runs of other DAGs.
=> generate_op is a BranchPythonOperator.

==> [Image: DAG task status]

=> [Image: DAG workflow]

What you think should happen instead

It should run all of the tasks listed above and complete successfully.

How to reproduce

The DAG stops at the same task on every run, skips all remaining tasks after it, and ends with a failed status.
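With Airflow's default trigger rule (all_success), the states of the tasks after the one that stops can help show how it stopped. If that task failed, the tasks downstream of it are set to upstream_failed. If it was skipped, they are skipped as well. The plain-Python sketch below models that rule along a linear chain, using task names from the excerpt. It is a simplification: it ignores retries, other trigger rules, and upstream tasks that are still running, and it assumes every task that is allowed to run succeeds.

```python
from enum import Enum
from typing import Dict, Iterable, List, Optional


class State(str, Enum):
    SUCCESS = "success"
    FAILED = "failed"
    UPSTREAM_FAILED = "upstream_failed"
    SKIPPED = "skipped"


def all_success_outcome(upstream_states: Iterable[State]) -> Optional[State]:
    """Simplified all_success rule: return the state the task is set to
    without running, or None if every upstream task succeeded."""
    states = list(upstream_states)
    if any(s in (State.FAILED, State.UPSTREAM_FAILED) for s in states):
        return State.UPSTREAM_FAILED
    if any(s is State.SKIPPED for s in states):
        return State.SKIPPED
    return None


def propagate_linear(tasks: List[str], first_state: State) -> Dict[str, State]:
    """Give tasks[0] first_state, then apply all_success down a linear chain.
    ASSUMPTION: every task that is allowed to run succeeds."""
    result = {tasks[0]: first_state}
    previous = first_state
    for task in tasks[1:]:
        outcome = all_success_outcome([previous])
        previous = State.SUCCESS if outcome is None else outcome
        result[task] = previous
    return result


if __name__ == "__main__":
    tail = ["build_activate", "trigger_Fill_rate_QC_report",
            "send_notification", "generate_op"]
    print([s.value for s in propagate_linear(tail, State.FAILED).values()])
    # ['failed', 'upstream_failed', 'upstream_failed', 'upstream_failed']
```

Comparing the real task states with these two patterns (upstream_failed versus skipped) narrows down whether the stopping task failed or was skipped.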

Operating System

Linux

Versions of Apache Airflow Providers

Airflow 2.4.3

==> Below is the pip list with all other package versions:

Package Version
acme 1.1.0
aenum 3.1.11
aiohttp 3.8.3
aiosignal 1.3.1
alembic 1.9.2
anyio 3.6.2
apache-airflow 2.5.1
apache-airflow-providers-amazon 6.0.0
apache-airflow-providers-cncf-kubernetes 4.4.0
apache-airflow-providers-common-sql 1.3.3
apache-airflow-providers-databricks 3.3.0
apache-airflow-providers-ftp 3.3.0
apache-airflow-providers-http 4.1.1
apache-airflow-providers-imap 3.1.1
apache-airflow-providers-jdbc 3.2.1
apache-airflow-providers-microsoft-mssql 3.2.1
apache-airflow-providers-postgres 5.2.2
apache-airflow-providers-sftp 4.1.0
apache-airflow-providers-sqlite 3.3.1
apache-airflow-providers-ssh 3.4.0
apispec 3.3.2
argcomplete 2.0.0
asn1crypto 1.5.1
async-timeout 4.0.2
attrs 22.2.0
Automat 0.8.0
awswrangler 2.17.0
Babel 2.11.0
backoff 2.2.1
bcrypt 4.0.1
beautifulsoup4 4.11.1
billiard 3.6.4.0
blinker 1.4
boto3 1.26.54
botocore 1.29.54
cachelib 0.10.1
cachetools 5.3.0
cattrs 22.2.0
certbot 0.40.0
certifi 2019.11.28
chardet 3.0.4
charset-normalizer 3.0.1
click 8.1.3
clickclick 20.10.2
cloud-init 23.1.2
colorama 0.4.3
colorlog 4.8.0
command-not-found 0.3
ConfigArgParse 0.13.0
configobj 5.0.6
ConfigUpdater 3.1.1
connexion 2.14.1
constantly 15.1.0
cron-descriptor 1.2.32
croniter 1.3.8
cryptography 2.8
databricks-sql-connector 2.3.0
dbus-python 1.2.16
decorator 5.1.1
Deprecated 1.2.13
dill 0.3.6
distro 1.4.0
distro-info 0.23ubuntu1
dnspython 2.3.0
docutils 0.19
ec2-hibinit-agent 1.0.0
email-validator 1.3.1
entrypoints 0.3
et-xmlfile 1.1.0
exceptiongroup 1.1.0
Flask 2.2.2
Flask-AppBuilder 4.1.4
Flask-Babel 2.0.0
Flask-Caching 2.0.2
Flask-JWT-Extended 4.4.4
Flask-Login 0.6.2
Flask-Session 0.4.0
Flask-SQLAlchemy 2.5.1
Flask-WTF 1.1.1
frozenlist 1.3.3
future 0.18.2
google-auth 2.16.0
gpg 1.13.1-unknown
graphviz 0.20.1
greenlet 2.0.1
gremlinpython 3.6.1
gunicorn 20.1.0
h11 0.14.0
httpcore 0.16.3
httplib2 0.14.0
httpx 0.23.3
hyperlink 19.0.0
idna 2.8
importlib-metadata 6.0.0
importlib-resources 5.10.2
incremental 16.10.1
inflection 0.5.1
isodate 0.6.1
itsdangerous 2.1.2
JayDeBeApi 1.2.3
Jinja2 3.1.2
jmespath 1.0.1
josepy 1.2.0
JPype1 1.4.1
jsonpatch 1.22
jsonpath-ng 1.5.3
jsonpointer 2.0
jsonschema 3.2.0
keyring 18.0.1
kubernetes 23.6.0
language-selector 0.1
launchpadlib 1.10.13
lazr.restfulclient 0.14.2
lazr.uri 1.0.3
lazy-object-proxy 1.9.0
linkify-it-py 2.0.0
lockfile 0.12.2
lxml 4.9.2
lz4 4.3.2
Mako 1.2.4
Markdown 3.4.1
markdown-it-py 2.1.0
MarkupSafe 2.1.2
marshmallow 3.19.0
marshmallow-enum 1.5.1
marshmallow-oneofschema 3.0.1
marshmallow-sqlalchemy 0.26.1
mdit-py-plugins 0.3.3
mdurl 0.1.2
mock 3.0.5
more-itertools 4.2.0
multidict 6.0.4
mypy-boto3-appflow 1.26.53
mypy-boto3-rds 1.26.47
mypy-boto3-redshift-data 1.26.30
nest-asyncio 1.5.6
netifaces 0.10.4
numpy 1.24.1
oauthlib 3.1.0
openpyxl 3.0.10
opensearch-py 2.0.1
packaging 23.0
pandas 1.5.3
paramiko 2.7.2
parsedatetime 2.4
pathspec 0.9.0
pbr 5.4.5
pendulum 2.1.2
pexpect 4.6.0
pg8000 1.29.4
pip 20.0.2
pluggy 1.0.0
ply 3.11
prison 0.2.1
progressbar2 4.2.0
psutil 5.9.4
psycopg2 2.9.5
pyarrow 8.0.0
pyasn1 0.4.2
pyasn1-modules 0.2.1
pycrypto 2.6.1
Pygments 2.14.0
PyGObject 3.36.0
PyHamcrest 1.9.0
PyICU 2.4.2
PyJWT 2.6.0
pymacaroons 0.13.0
pymssql 2.2.7
PyMySQL 1.0.2
PyNaCl 1.3.0
pyOpenSSL 19.0.0
pyRFC3339 1.1
pyrsistent 0.15.5
pyserial 3.4
python-apt 2.0.1+ubuntu0.20.4.1
python-daemon 2.3.2
python-dateutil 2.8.2
python-debian 0.1.36ubuntu1
python-nvd3 0.15.0
python-slugify 7.0.0
python-utils 3.4.5
pytz 2022.7.1
pytzdata 2020.1
PyYAML 6.0
redshift-connector 2.0.909
requests 2.28.2
requests-aws4auth 1.2.0
requests-oauthlib 1.3.1
requests-toolbelt 0.10.1
requests-unixsocket 0.2.0
rfc3986 1.5.0
rich 13.2.0
rsa 4.9
s3transfer 0.6.0
scramp 1.4.4
SecretStorage 2.3.1
service-identity 18.1.0
setproctitle 1.3.2
setuptools 45.2.0
simplejson 3.16.0
six 1.14.0
smart-open 6.2.0
sniffio 1.3.0
sos 4.4
soupsieve 2.3.2.post1
SQLAlchemy 1.4.46
SQLAlchemy-JSONField 1.0.1.post0
sqlalchemy-redshift 0.8.12
SQLAlchemy-Utils 0.39.0
sqlparse 0.4.3
ssh-import-id 5.10
sshtunnel 0.4.0
systemd-python 234
tabulate 0.9.0
tenacity 8.1.0
termcolor 2.2.0
text-unidecode 1.3
thrift 0.16.0
Twisted 18.9.0
typing-extensions 4.4.0
ubuntu-advantage-tools 27.12
uc-micro-py 1.0.1
ufw 0.36
unattended-upgrades 0.1
unicodecsv 0.14.1
urllib3 1.25.8
wadllib 1.3.3
watchtower 2.0.1
websocket-client 1.4.2
Werkzeug 2.2.2
wheel 0.34.2
wrapt 1.14.1
WTForms 3.0.1
XlsxWriter 3.0.3
yarl 1.8.2
zipp 1.0.0
zope.component 4.3.0
zope.event 4.4
zope.hookable 5.0.0
zope.interface 4.7.1
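The package list includes apache-airflow 2.5.1. To confirm which versions are actually installed in the virtualenv that runs the scheduler, you can query the standard-library importlib.metadata module from that environment's Python. The `airflow version` CLI command also reports the Airflow version. The helper name below is illustrative only. A minimal sketch:

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run with the virtualenv's interpreter so it inspects that environment.
    for name in ("apache-airflow", "apache-airflow-providers-amazon"):
        print(name, installed_version(name))
```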

Deployment

Virtualenv installation

Deployment details

No response

Anything else

No response

Are you willing to submit a PR?

Code of Conduct

boring-cyborg[bot] commented 1 year ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.

hussein-awala commented 1 year ago

How do you create these tasks? Which operators are you using? Is trigger_Fill_rate_QC_report a new task?

You need to provide an example that reproduces the issue, or the results of an investigation you did; otherwise we cannot understand what the issue is or fix it.

da-ekta-sharma commented 1 year ago

We created the trigger_Fill_rate_QC_report task using TriggerDagRunOperator. Its upstream task (build_activate) and downstream task (send_notification) are PythonOperators. generate_op is a BranchPythonOperator that chooses between the end_operator and trigger_business_US_dag tasks.
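A BranchPythonOperator runs a Python callable that returns the task_id (or list of task_ids) to follow. The other tasks directly downstream of the branch task are marked as skipped. The plain-Python model below illustrates this for generate_op and does not import Airflow. choose_branch is a hypothetical stand-in for the real callable, which the issue does not show. Rejecting ids that are not direct downstream tasks is a choice of this sketch, meant to make a mistyped task_id visible; it is not a claim about Airflow's own validation.

```python
from typing import Dict, Iterable, List, Union


def choose_branch(run_us_dag: bool) -> str:
    """HYPOTHETICAL branch callable; generate_op's real condition is unknown."""
    return "trigger_business_US_dag" if run_us_dag else "end_operator"


def branch_outcome(direct_downstream: Iterable[str],
                   chosen: Union[str, List[str]]) -> Dict[str, str]:
    """Simplified branching model: the chosen task_id(s) proceed and every
    other *direct* downstream task is marked skipped."""
    downstream = list(direct_downstream)
    chosen_ids = [chosen] if isinstance(chosen, str) else list(chosen)
    # Modelling choice: reject ids that are not direct downstream tasks.
    unknown = sorted(set(chosen_ids) - set(downstream))
    if unknown:
        raise ValueError(f"not a direct downstream task: {unknown}")
    return {t: ("selected" if t in chosen_ids else "skipped") for t in downstream}


if __name__ == "__main__":
    targets = ["trigger_business_US_dag", "end_operator"]
    print(branch_outcome(targets, choose_branch(run_us_dag=False)))
    # {'trigger_business_US_dag': 'skipped', 'end_operator': 'selected'}
```

Because end_operator is itself one of the branch targets here, this model marks it skipped whenever trigger_business_US_dag is chosen.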

The DAG's graph view renders correctly, and we have refreshed the DAG in the environment, but the issue still persists.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in the next 7 days if no further activity occurs from the issue author.

potiuk commented 1 year ago


Can you please share the whole code of your DAG creation (not just an excerpt, but the full structure, including the types of dependencies)? Please also explain what you mean by "'0' downstream". We also need the scheduler logs from when it processes the task dependencies, around the time the last task finishes and you expect it to trigger the task that is not triggered. That would help us help you with the problem. It is almost certainly some kind of typo somewhere in your code, but it is hard to guess without seeing all those details.

I will also convert this to a discussion. It looks like a troubleshooting problem, not an Airflow issue. Unless we get the additional details I asked for and can diagnose it better, we will keep it as a discussion.