Closed: chengzi0103 closed this issue 2 years ago
Looks similar to #12136 and #15990.
@uranusjr I noticed that @raphaelauv does not have this problem when using an apache-airflow-providers-cncf-kubernetes version greater than 3.0.0, but my version is 3.0.2, so I don't know where the problem is.
download = KubernetesPodOperator(
    task_id='X12',
    is_delete_operator_pod=True,
    get_logs=True,
    image=images_name,
    namespace=name_space,
    name='download_api',
    cmds=['python cmd'],
    arguments=[f"--account={config.dags['daily_01_download_process']['account']}"],
    in_cluster=False,
    volumes=[get_k8s_pod_mount_volume_of_host(mount_local_path)],
    volume_mounts=[get_k8s_pod_mount_volume_of_worker(remote_path)],
)
Please downgrade your kubernetes library version to 11.0.0 (you have kubernetes 22.6.0), @chengzi0103. You apparently did not use constraints when you installed Airflow and the providers. We are adding a protection to make it harder to upgrade to an incompatible version of the kubernetes library, and we just yanked the 3.0.2 cncf.kubernetes provider for people who did so. Still, using constraints as described in https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html is the best way to get a stable Airflow installation (it is the only officially supported way of installing Airflow).
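For illustration, here is a minimal sketch of what a constraints-based install looks like for the versions in this report. The URL pattern is the one described on the installation page linked above; the Python version `3.8`, the chosen extras, and the helper names `constraints_url` and `pip_install_command` are assumptions made for this example, not part of Airflow.

```python
import shlex
from typing import List, Sequence


def constraints_url(airflow_version: str, python_version: str) -> str:
    """Build the constraints file URL for an Airflow / Python version pair."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def pip_install_command(
    airflow_version: str,
    python_version: str,
    extras: Sequence[str] = ("celery", "cncf.kubernetes"),  # assumed extras
) -> List[str]:
    """Return the pip argument list for a constrained Airflow install."""
    extras_part = f"[{','.join(extras)}]" if extras else ""
    requirement = f"apache-airflow{extras_part}=={airflow_version}"
    return [
        "pip", "install", requirement,
        "--constraint", constraints_url(airflow_version, python_version),
    ]


if __name__ == "__main__":
    # Python 3.8 is an assumption; use the interpreter version you actually run.
    print(shlex.join(pip_install_command("2.2.3", "3.8")))
```

Installing this way lets the constraints file pin the kubernetes client to a version tested with that Airflow release, rather than whatever pip resolves as the newest.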
Hi @chengzi0103, it appears that your task likely failed for reasons unrelated to the traceback shown.
Sometimes the log read is interrupted by a connection issue. In that case we catch the error and resume logging, and that is what the traceback is about. Note that it is only a warning, that the logs resume afterwards, and that the task doesn't fail for another 8 minutes.
Your issue report inspired us to move that traceback to the DEBUG level in https://github.com/apache/airflow/pull/22595, so that it does not cause false alarm or confusion.
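The catch-and-resume behaviour described above can be sketched roughly as follows. This is a generic illustration, not the provider's actual code: `follow_logs`, `open_stream`, and `max_attempts` are hypothetical names, and `open_stream(offset)` is assumed to return an iterator of log lines starting at the given line offset.

```python
import logging
from typing import Callable, Iterable, List

log = logging.getLogger(__name__)


def follow_logs(
    open_stream: Callable[[int], Iterable[str]],
    max_attempts: int = 5,
) -> List[str]:
    """Collect log lines, reopening the stream after a dropped connection."""
    lines: List[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            # Resume from the number of lines already collected.
            for line in open_stream(len(lines)):
                lines.append(line)
            return lines  # the stream ended normally
        except ConnectionError:
            # The interruption is not a task failure, so record it quietly
            # (with the traceback) at DEBUG level and try again.
            log.debug(
                "Log read interrupted (attempt %d), resuming",
                attempt,
                exc_info=True,
            )
    log.warning("Giving up on log streaming after %d attempts", max_attempts)
    return lines
```

The point of the sketch is that an interrupted log read and a failed task are separate events: the reader recovers on its own, so a traceback from it says nothing about why the pod itself failed later.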
@chengzi0103
Yes my setting is
kubernetes==11.0.0
apache-airflow-providers-cncf-kubernetes==3.0.2
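To confirm which versions are actually installed in a given environment (for example on each Celery worker), a small check along these lines may help. It uses only the standard library `importlib.metadata`; the helper names `installed_versions` and `mismatches` are made up for this sketch, and the expected pins are the ones quoted above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional, Tuple


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(
    expected: Dict[str, str],
    installed: Dict[str, Optional[str]],
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed)} for every pin that differs."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }


if __name__ == "__main__":
    expected = {
        "kubernetes": "11.0.0",
        "apache-airflow-providers-cncf-kubernetes": "3.0.2",
    }
    for name, (want, have) in mismatches(expected, installed_versions(expected)).items():
        print(f"{name}: expected {want}, found {have}")
```

Running it on every machine that executes tasks is worthwhile, because the webserver host and the workers can end up with different package sets.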
Thank you for your answers. I will try a larger cluster with more resources later, so that this problem occurs less often, and I will follow your suggestions. Thanks again.
Apache Airflow version
2.2.3 (latest released)
What happened
Airflow logs an error when running multiple Kubernetes pods.
What you expected to happen
I would like to know how to fix it.
How to reproduce
I have three machines. Machine A runs the Airflow webserver and scheduler; machines B and C run Celery workers. All operators run on the k8s cluster through k8s_config.
As soon as I run multiple tasks, the error appears in the log automatically. I don't know how to solve it.
Operating System
Debian GNU/Linux 11
Versions of Apache Airflow Providers
alembic 1.7.5 amqp 5.0.9 anyio 3.5.0 apache-airflow 2.2.3 apache-airflow-providers-celery 2.1.0 apache-airflow-providers-cncf-kubernetes 3.0.2 apache-airflow-providers-docker 2.4.1 apache-airflow-providers-ftp 2.0.1 apache-airflow-providers-http 2.0.3 apache-airflow-providers-imap 2.2.0 apache-airflow-providers-sqlite 2.1.0 apispec 3.3.2 argcomplete 1.12.3 attrs 20.3.0 Babel 2.9.1 billiard 3.6.4.0 bleach 4.1.0 blinker 1.4 cachetools 5.0.0 cattrs 1.6.0 celery 5.2.2 certifi 2021.10.8 cffi 1.15.0 charset-normalizer 2.0.12 click 8.0.4 click-didyoumean 0.3.0 click-plugins 1.1.1 click-repl 0.2.0 clickclick 20.10.2 colorama 0.4.4 colorlog 5.0.1 commonmark 0.9.1 coverage 6.3.2 croniter 1.0.15 cryptography 36.0.1 datacompy 0.7.3 defusedxml 0.7.1 dill 0.3.4 dnspython 2.2.0 docker 5.0.3 docutils 0.16 email-validator 1.1.3 Flask 1.1.2 Flask-AppBuilder 3.4.4 Flask-Babel 2.0.0 Flask-Caching 1.10.1 Flask-JWT-Extended 3.25.1 Flask-Login 0.4.1 Flask-OpenID 1.3.0 Flask-SQLAlchemy 2.5.1 Flask-WTF 0.14.3 flower 1.0.0 gevent 21.12.0 google-auth 2.6.0 graphviz 0.19.1 greenlet 1.1.2 gunicorn 20.1.0 h11 0.12.0 httpcore 0.13.7 httpx 0.19.0 humanize 4.0.0 idna 3.3 importlib-metadata 4.11.1 importlib-resources 5.4.0 inflection 0.5.1 iniconfig 1.1.1 iso8601 1.0.2 itsdangerous 1.1.0 jeepney 0.7.1 Jinja2 3.0.3 jsonschema 3.2.0 keyring 23.5.0 kombu 5.2.3 kubernetes 22.6.0 lazy-object-proxy 1.7.1 lockfile 0.12.2 Mako 1.1.6 Markdown 3.3.6 MarkupSafe 2.1.0 marshmallow 3.14.1 marshmallow-enum 1.5.1 marshmallow-oneofschema 3.0.1 marshmallow-sqlalchemy 0.26.1 numexpr 2.8.1 numpy 1.22.2 oauthlib 3.2.0 openapi-schema-validator 0.2.3 openapi-spec-validator 0.4.0 ordered-set 4.1.0 packaging 21.3 pandas 1.3.5 pendulum 2.1.2 pip 22.0.3 pkginfo 1.8.2 pluggy 1.0.0 prettytable 3.1.1 prison 0.2.1 prometheus-client 0.13.1 prompt-toolkit 3.0.28 psutil 5.9.0 psycopg2-binary 2.9.3 py 1.11.0 pyasn1 0.4.8 pyasn1-modules 0.2.8 pycparser 2.21 Pygments 2.11.2 PyJWT 1.7.1 pyparsing 3.0.7 pyrsistent 0.18.1 pytest 7.0.1 
pytest-cov 3.0.0 python-daemon 2.3.0 python-dateutil 2.8.2 python-nvd3 0.15.0 python-slugify 4.0.1 python3-openid 3.2.0 pytz 2021.3 pytzdata 2020.1 PyYAML 6.0 readme-renderer 32.0 redis 3.5.3 requests 2.27.1 requests-oauthlib 1.3.1 requests-toolbelt 0.9.1 rfc3986 1.5.0 rich 11.2.0 rsa 4.8 SecretStorage 3.3.1 semantic-version 2.9.0 setproctitle 1.2.2 setuptools 59.0.1 setuptools-rust 1.1.2 six 1.16.0 sniffio 1.2.0 SQLAlchemy 1.4.31 SQLAlchemy-JSONField 1.0.0 SQLAlchemy-Utils 0.38.2 swagger-ui-bundle 0.0.9 tables 3.6.1 tabulate 0.8.9 tenacity 8.0.1 termcolor 1.1.0 text-unidecode 1.3 tomli 2.0.1 tornado 6.1 tqdm 4.62.3 twine 3.8.0 typing_extensions 4.1.1 unicodecsv 0.14.1 urllib3 1.26.8 vine 5.0.0 wcwidth 0.2.5 webencodings 0.5.1 websocket 0.2.1 websocket-client 1.2.3 Werkzeug 1.0.1 wheel 0.37.0 WTForms 2.3.3 zipp 3.7.0 zope.event 4.5.0 zope.interface 5.4.0
Deployment
Docker-Compose
Deployment details
I have three machines. Machine A runs the Airflow webserver and scheduler; machines B and C run Celery workers. All operators run on the k8s cluster through k8s_config.
Anything else
No response