airflow-helm / charts

The User-Community Airflow Helm Chart is the standard way to deploy Apache Airflow on Kubernetes with Helm. Originally created in 2017, it has since helped thousands of companies create production-ready deployments of Airflow on Kubernetes.
https://github.com/airflow-helm/charts/tree/main/charts/airflow
Apache License 2.0

install-pip-packages container fails to start (airflow 2.0.1) #178

Closed klalafaryan closed 3 years ago

klalafaryan commented 3 years ago

What is the bug?

web:
  extraPipPackages:
    - "apache-airflow[google_auth,jdbc,kubernetes,postgres,s3,ssh]==2.0.1"

With the above extraPipPackages, the install-pip-packages container fails with the following exception:

.
.
.

Successfully built flask-login python-slugify termcolor unicodecsv python-nvd3 Flask-OAuthlib pysftp Flask-OpenID Flask-JWT-Extended pyrsistent
Installing collected packages: zipp, importlib-metadata, werkzeug, markupsafe, jinja2, click, itsdangerous, flask, flask-caching, pygments, six, python-dateutil, natsort, croniter, pyjwt, iso8601, lockfile, defusedxml, python3-openid, cached-property, importlib-resources, attrs, typing-extensions, colorama, commonmark, rich, greenlet, sqlalchemy, Mako, python-editor, alembic, docutils, setuptools, python-daemon, WTForms, flask-wtf, apache-airflow-providers-http, marshmallow, marshmallow-sqlalchemy, marshmallow-enum, PyYAML, apispec, pyrsistent, jsonschema, Flask-OpenID, idna, dnspython, email-validator, flask-login, Flask-JWT-Extended, sqlalchemy-utils, pytz, Babel, Flask-Babel, prison, Flask-SQLAlchemy, flask-appbuilder, sqlalchemy-jsonfield, apache-airflow-providers-ftp, colorlog, lazy-object-proxy, apache-airflow-providers-imap, pytzdata, pendulum, setproctitle, tenacity, cattrs, gunicorn, text-unidecode, python-slugify, psutil, marshmallow-oneofschema, certifi, urllib3, chardet, requests, tabulate, argcomplete, termcolor, numpy, pandas, graphviz, markdown, dill, unicodecsv, pycparser, cffi, cryptography, apache-airflow-providers-sqlite, inflection, clickclick, isodate, openapi-schema-validator, openapi-spec-validator, swagger-ui-bundle, connexion, python-nvd3, oauthlib, requests-oauthlib, Flask-OAuthlib, JPype1, jaydebeapi, apache-airflow-providers-jdbc, pyasn1, pyasn1-modules, rsa, cachetools, google-auth, websocket-client, kubernetes, apache-airflow-providers-cncf-kubernetes, psycopg2-binary, apache-airflow-providers-postgres, jmespath, botocore, s3transfer, boto3, watchtower, apache-airflow-providers-amazon, bcrypt, pynacl, paramiko, sshtunnel, pysftp, apache-airflow-providers-ssh, apache-airflow-providers-sftp, apache-airflow
Successfully installed Babel-2.9.0 Flask-Babel-1.0.0 Flask-JWT-Extended-3.25.1 Flask-OAuthlib-0.9.5 Flask-OpenID-1.2.5 Flask-SQLAlchemy-2.5.1 JPype1-1.2.1 Mako-1.1.4 PyYAML-5.4.1 WTForms-2.3.3 alembic-1.5.8 apache-airflow-2.0.1 apache-airflow-providers-amazon-1.3.0 apache-airflow-providers-cncf-kubernetes-1.1.0 apache-airflow-providers-ftp-1.0.1 apache-airflow-providers-http-1.1.1 apache-airflow-providers-imap-1.0.1 apache-airflow-providers-jdbc-1.0.1 apache-airflow-providers-postgres-1.0.1 apache-airflow-providers-sftp-1.1.1 apache-airflow-providers-sqlite-1.0.2 apache-airflow-providers-ssh-1.3.0 apispec-3.3.2 argcomplete-1.12.3 attrs-20.3.0 bcrypt-3.2.0 boto3-1.15.18 botocore-1.18.18 cached-property-1.5.2 cachetools-4.2.1 cattrs-1.5.0 certifi-2020.12.5 cffi-1.14.5 chardet-4.0.0 click-7.1.2 clickclick-20.10.2 colorama-0.4.4 colorlog-5.0.1 commonmark-0.9.1 connexion-2.7.0 croniter-0.3.37 cryptography-3.4.7 defusedxml-0.7.1 dill-0.3.3 dnspython-2.1.0 docutils-0.17.1 email-validator-1.1.2 flask-1.1.2 flask-appbuilder-3.1.1 flask-caching-1.10.1 flask-login-0.4.1 flask-wtf-0.14.3 google-auth-1.29.0 graphviz-0.16 greenlet-1.0.0 gunicorn-19.10.0 idna-2.10 importlib-metadata-1.7.0 importlib-resources-1.5.0 inflection-0.5.1 iso8601-0.1.14 isodate-0.6.0 itsdangerous-1.1.0 jaydebeapi-1.2.3 jinja2-2.11.3 jmespath-0.10.0 jsonschema-3.2.0 kubernetes-11.0.0 lazy-object-proxy-1.6.0 lockfile-0.12.2 markdown-3.3.4 markupsafe-1.1.1 marshmallow-3.11.1 marshmallow-enum-1.5.1 marshmallow-oneofschema-2.1.0 marshmallow-sqlalchemy-0.23.1 natsort-7.1.1 numpy-1.20.2 oauthlib-2.1.0 openapi-schema-validator-0.1.5 openapi-spec-validator-0.3.0 pandas-1.2.4 paramiko-2.7.2 pendulum-2.1.2 prison-0.1.3 psutil-5.8.0 psycopg2-binary-2.8.6 pyasn1-0.4.8 pyasn1-modules-0.2.8 pycparser-2.20 pygments-2.8.1 pyjwt-1.7.1 pynacl-1.4.0 pyrsistent-0.17.3 pysftp-0.2.9 python-daemon-2.3.0 python-dateutil-2.8.1 python-editor-1.0.4 python-nvd3-0.15.0 python-slugify-4.0.1 python3-openid-3.2.0 pytz-2021.1 
pytzdata-2020.1 requests-2.25.1 requests-oauthlib-1.1.0 rich-9.2.0 rsa-4.7.2 s3transfer-0.3.7 setproctitle-1.2.2 setuptools-56.0.0 six-1.15.0 sqlalchemy-1.4.11 sqlalchemy-jsonfield-1.0.0 sqlalchemy-utils-0.37.0 sshtunnel-0.1.5 swagger-ui-bundle-0.0.8 tabulate-0.8.9 tenacity-6.2.0 termcolor-1.1.0 text-unidecode-1.3 typing-extensions-3.7.4.3 unicodecsv-0.14.1 urllib3-1.26.4 watchtower-0.7.3 websocket-client-0.58.0 werkzeug-1.0.1 zipp-3.4.1
ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.

We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.

eventlet 0.30.1 requires dnspython<2.0.0,>=1.15.0, but you'll have dnspython 2.1.0 which is incompatible.
botocore 1.18.18 requires urllib3<1.26,>=1.20; python_version != "3.4", but you'll have urllib3 1.26.4 which is incompatible.
aiohttp 3.7.3 requires chardet<4.0,>=2.0, but you'll have chardet 4.0.0 which is incompatible.
WARNING: You are using pip version 20.2.4; however, version 21.0.1 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.

What is your Kubernetes Version?:

$ kubectl version
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.6", GitCommit:"fbf646b339dc52336b55d8ec85c181981b86331a", GitTreeState:"clean", BuildDate:"2020-12-18T12:01:36Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}

What is your Helm version?:

$ helm version
version.BuildInfo{Version:"v3.4.2", GitCommit:"23dd3af5e19a02d4f4baa5b2f242645a1a3af629", GitTreeState:"dirty", GoVersion:"go1.15.5"}
thesuperzapper commented 3 years ago

This is very likely a duplicate of https://github.com/airflow-helm/charts/issues/169

klalafaryan commented 3 years ago

@thesuperzapper To be honest, I am not sure if it is related to #169.

The problem here is with the 2.0.1 image: its pip install requires --use-feature=2020-resolver, which is currently missing.

Did I miss anything?

thesuperzapper commented 3 years ago

I don't think that's an error (it's just a notice from pip that things might fail in future versions). Please copy the logs corresponding to the actual failure.
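To separate these resolver notices from a real failure, one option is to pull the "requires ..., but you'll have ... which is incompatible" lines out of the install log and look at them on their own. The snippet below is only a sketch: parse_conflicts is a hypothetical helper, and the pattern assumes the message wording shown in the pip 20.2.4 output above, which other pip versions may not share.

```python
import re

# Matches the conflict notice printed by pip 20.2.4 in the log above.
# The wording is an assumption taken from that log, not a stable pip interface.
CONFLICT = re.compile(
    r"^(?P<package>\S+) (?P<version>\S+) requires (?P<requirement>.+?), "
    r"but you'll have (?P<found>\S+) (?P<found_version>\S+) which is incompatible\.$"
)


def parse_conflicts(log_text):
    """Return one dict per dependency-conflict notice found in a pip log."""
    conflicts = []
    for line in log_text.splitlines():
        match = CONFLICT.match(line.strip())
        if match:
            conflicts.append(match.groupdict())
    return conflicts
```

Feeding it the output of kubectl logs for the install-pip-packages container would list packages such as eventlet, botocore and aiohttp from the log above. None of those notices mention authlib, which is the module the later traceback reports as missing.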

klalafaryan commented 3 years ago

Here are the logs; the airflow-web pod is failing:

kubectl logs -f pods/airflow-web-7d675f8c6f-lwhjx -n airflow airflow-web
/opt/python/site-packages/airflow/configuration.py:332 DeprecationWarning: The logging_level option in [core] has been moved to the logging_level option in [logging] - the old setting has been used, but please update your config.
/opt/python/site-packages/airflow/configuration.py:332 DeprecationWarning: The remote_logging option in [core] has been moved to the remote_logging option in [logging] - the old setting has been used, but please update your config.
/opt/python/site-packages/airflow/configuration.py:332 DeprecationWarning: The remote_base_log_folder option in [core] has been moved to the remote_base_log_folder option in [logging] - the old setting has been used, but please update your config.
/opt/python/site-packages/sqlalchemy/orm/relationships.py:3441 SAWarning: relationship 'DagRun.serialized_dag' will copy column serialized_dag.dag_id to column dag_run.dag_id, which conflicts with relationship(s): 'TaskInstance.dag_run' (copies task_instance.dag_id to dag_run.dag_id), 'DagRun.task_instances' (copies task_instance.dag_id to dag_run.dag_id). If this is not the intention, consider if these relationships should be linked with back_populates, or if viewonly=True should be applied to one or more if they are read-only. For the less common case that foreign key constraints are partially overlapping, the orm.foreign() annotation can be used to isolate the columns that should be written towards.   The 'overlaps' parameter may be used to remove this warning. (Background on this error at: http://sqlalche.me/e/14/qzyx)
/opt/python/site-packages/sqlalchemy/orm/relationships.py:3441 SAWarning: relationship 'SerializedDagModel.dag_runs' will copy column serialized_dag.dag_id to column dag_run.dag_id, which conflicts with relationship(s): 'TaskInstance.dag_run' (copies task_instance.dag_id to dag_run.dag_id), 'DagRun.task_instances' (copies task_instance.dag_id to dag_run.dag_id). If this is not the intention, consider if these relationships should be linked with back_populates, or if viewonly=True should be applied to one or more if they are read-only. For the less common case that foreign key constraints are partially overlapping, the orm.foreign() annotation can be used to isolate the columns that should be written towards.   The 'overlaps' parameter may be used to remove this warning. (Background on this error at: http://sqlalche.me/e/14/qzyx)
/home/airflow/.local/lib/python3.8/site-packages/azure/cosmos/session.py:186 SyntaxWarning: "is not" with a literal. Did you mean "!="?
[2021-04-22 12:46:09,330] {providers_manager.py:295} WARNING - Exception when importing 'airflow.providers.microsoft.azure.hooks.wasb.WasbHook' from 'apache-airflow-providers-microsoft-azure' package: No module named 'azure.storage.blob'
[2021-04-22 12:46:09,982] {providers_manager.py:295} WARNING - Exception when importing 'airflow.providers.microsoft.azure.hooks.wasb.WasbHook' from 'apache-airflow-providers-microsoft-azure' package: No module named 'azure.storage.blob'
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
[2021-04-22 12:46:10,230] {dagbag.py:448} INFO - Filling up the DagBag from /dev/null
Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/opt/python/site-packages/airflow/__main__.py", line 40, in main
    args.func(args)
  File "/opt/python/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/opt/python/site-packages/airflow/utils/cli.py", line 89, in wrapper
    return f(*args, **kwargs)
  File "/opt/python/site-packages/airflow/cli/commands/webserver_command.py", line 360, in webserver
    app = cached_app(None)
  File "/opt/python/site-packages/airflow/www/app.py", line 135, in cached_app
    app = create_app(config=config, testing=testing)
  File "/opt/python/site-packages/airflow/www/app.py", line 113, in create_app
    init_appbuilder(flask_app)
  File "/opt/python/site-packages/airflow/www/extensions/init_appbuilder.py", line 46, in init_appbuilder
    AirflowAppBuilder(
  File "/opt/python/site-packages/flask_appbuilder/base.py", line 148, in __init__
    self.init_app(app, session)
  File "/opt/python/site-packages/flask_appbuilder/base.py", line 202, in init_app
    self.sm = self.security_manager_class(self)
  File "/opt/python/site-packages/airflow/www/security.py", line 160, in __init__
    super().__init__(appbuilder)
  File "/opt/python/site-packages/flask_appbuilder/security/sqla/manager.py", line 51, in __init__
    super(SecurityManager, self).__init__(appbuilder)
  File "/opt/python/site-packages/flask_appbuilder/security/manager.py", line 250, in __init__
    from authlib.integrations.flask_client import OAuth
ModuleNotFoundError: No module named 'authlib'
klalafaryan commented 3 years ago

@thesuperzapper Any ideas ?

klalafaryan commented 3 years ago

@thesuperzapper I have fixed the problem with the following:

  extraPipPackages:
    ## the following configs require Flask-AppBuilder 3.2.0 (or later)
    - "Flask-AppBuilder~=3.2.0"
    ## the following configs require Authlib
    - "Authlib~=0.15.3"
    - "apache-airflow[google_auth,jdbc,kubernetes,postgres,s3,ssh,databricks]==2.0.1"

I think we can close the issue since this is not related to the helm chart at all, but I think the issue should be fixed in apache-airflow[google_auth].

thesuperzapper commented 3 years ago

@klalafaryan do you want to raise it on the apache/airflow repo, if it's not already there?

klalafaryan commented 3 years ago

@thesuperzapper created an issue in the apache/airflow project.

potiuk commented 3 years ago

> @thesuperzapper I have fixed the problem with the following:
>
>   extraPipPackages:
>     ## the following configs require Flask-AppBuilder 3.2.0 (or later)
>     - "Flask-AppBuilder~=3.2.0"
>     ## the following configs require Authlib
>     - "Authlib~=0.15.3"
>     - "apache-airflow[google_auth,jdbc,kubernetes,postgres,s3,ssh,databricks]==2.0.1"
>
> I think we can close the issue since this is not related to the helm chart at all, but I think the issue should be fixed in apache-airflow[google_auth].

It is related to the helm chart and the way extraPipPackages are added. When Airflow 2.0.2 was released, the way extraPipPackages are used caused Airflow to be reinstalled in a different site-packages directory, which "overrides" the original location of Airflow. So what you end up with in the chart is an image with two versions of Airflow installed in two different locations. This is a chart problem, not an Airflow one. I even commented on how this can be fixed, following the exact way the Airflow image is built: https://github.com/airflow-helm/charts/issues/169#issuecomment-824955814
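A rough way to check for the situation described above (two copies of Airflow in different site-packages directories) is to list every installed copy of a distribution that Python can see. This is only a sketch using the standard-library importlib.metadata module (Python 3.8+); find_installs is a hypothetical helper name, not part of the chart or of Airflow.

```python
import importlib.metadata
import re


def _normalize(name):
    # Compare names the way PEP 503 does: case-insensitive, with "-", "_" and "." equivalent.
    return re.sub(r"[-_.]+", "-", name).lower()


def find_installs(package, search_path=None):
    """Return sorted (version, location) pairs for each installed copy of `package`.

    With search_path=None the interpreter's sys.path is scanned; a list of
    directories can be passed instead to inspect specific locations.
    """
    kwargs = {} if search_path is None else {"path": list(search_path)}
    wanted = _normalize(package)
    found = set()
    for dist in importlib.metadata.distributions(**kwargs):
        name = dist.metadata.get("Name", "")
        if name and _normalize(name) == wanted:
            # locate_file("") resolves to the directory that holds the package metadata.
            found.add((dist.version, str(dist.locate_file(""))))
    return sorted(found)
```

Running find_installs("apache-airflow") in a Python shell inside the webserver container and getting more than one row back (for example one under /opt/python/site-packages and one under /home/airflow/.local/lib/python3.8/site-packages, the two locations that appear in the tracebacks in this thread) would be consistent with this explanation.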

klalafaryan commented 3 years ago

@thesuperzapper Any ideas ?

thesuperzapper commented 3 years ago

@klalafaryan can you confirm if your issue is fixed after version 8.0.9 of the chart?

klalafaryan commented 3 years ago

@thesuperzapper I have tried version 8.1.0, and it is not fixed. I still get the following exception:

[2021-05-11 15:31:53,625] {providers_manager.py:295} WARNING - Exception when importing 'airflow.providers.microsoft.azure.hooks.wasb.WasbHook' from 'apache-airflow-providers-microsoft-azure' package: No module named 'azure.storage.blob'
[2021-05-11 15:31:54,014] {providers_manager.py:295} WARNING - Exception when importing 'airflow.providers.microsoft.azure.hooks.wasb.WasbHook' from 'apache-airflow-providers-microsoft-azure' package: No module named 'azure.storage.blob'
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
[2021-05-11 15:31:54,377] {dagbag.py:448} INFO - Filling up the DagBag from /dev/null
Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/__main__.py", line 40, in main
    args.func(args)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/cli.py", line 89, in wrapper
    return f(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/commands/webserver_command.py", line 360, in webserver
    app = cached_app(None)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www/app.py", line 135, in cached_app
    app = create_app(config=config, testing=testing)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www/app.py", line 113, in create_app
    init_appbuilder(flask_app)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www/extensions/init_appbuilder.py", line 46, in init_appbuilder
    AirflowAppBuilder(
  File "/home/airflow/.local/lib/python3.8/site-packages/flask_appbuilder/base.py", line 148, in __init__
    self.init_app(app, session)
  File "/home/airflow/.local/lib/python3.8/site-packages/flask_appbuilder/base.py", line 202, in init_app
    self.sm = self.security_manager_class(self)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www/security.py", line 160, in __init__
    super().__init__(appbuilder)
  File "/home/airflow/.local/lib/python3.8/site-packages/flask_appbuilder/security/sqla/manager.py", line 51, in __init__
    super(SecurityManager, self).__init__(appbuilder)
  File "/home/airflow/.local/lib/python3.8/site-packages/flask_appbuilder/security/manager.py", line 250, in __init__
    from authlib.integrations.flask_client import OAuth
ModuleNotFoundError: No module named 'authlib'
thesuperzapper commented 3 years ago

@klalafaryan the issue occurs because you are not installing the Authlib Python package. Please review the docs for OAuth: https://github.com/airflow-helm/charts/tree/main/charts/airflow#how-to-authenticate-airflow-users-with-ldapoauth

thesuperzapper commented 3 years ago

@klalafaryan did you resolve your issue?

rashi-psg commented 3 years ago

I am facing the same issue with chart 8.4.0. I am using a custom image based on apache/airflow:2.0.2-python3.8 and have installed the required packages inside it. The image also has authlib. But the webserver pod is failing with this error:

[2021-07-12 07:31:13,978] {dagbag.py:451} INFO - Filling up the DagBag from /dev/null
Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/__main__.py", line 40, in main
    args.func(args)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/cli.py", line 89, in wrapper
    return f(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/commands/webserver_command.py", line 360, in webserver
    app = cached_app(None)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www/app.py", line 145, in cached_app
    app = create_app(config=config, testing=testing)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www/app.py", line 123, in create_app
    init_appbuilder(flask_app)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www/extensions/init_appbuilder.py", line 46, in init_appbuilder
    AirflowAppBuilder(
  File "/home/airflow/.local/lib/python3.8/site-packages/flask_appbuilder/base.py", line 148, in __init__
    self.init_app(app, session)
  File "/home/airflow/.local/lib/python3.8/site-packages/flask_appbuilder/base.py", line 202, in init_app
    self.sm = self.security_manager_class(self)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/www/security.py", line 161, in __init__
    super().__init__(appbuilder)
  File "/home/airflow/.local/lib/python3.8/site-packages/flask_appbuilder/security/sqla/manager.py", line 52, in __init__
    super(SecurityManager, self).__init__(appbuilder)
  File "/home/airflow/.local/lib/python3.8/site-packages/flask_appbuilder/security/manager.py", line 256, in __init__
    from authlib.integrations.flask_client import OAuth
ModuleNotFoundError: No module named 'authlib'

Adding Authlib to extraPipPackages gives the same error.

thesuperzapper commented 3 years ago

@rashi-psg, what container is that (webserver, git-sync, etc)?

Also, what packages are you installing?

rashi-psg commented 3 years ago

The error is in the webserver container. I am installing the following because it is somehow not picking up the packages present in the image: extraPipPackages:

thesuperzapper commented 3 years ago

@rashi-psg you mention that you are already installing Authlib in your container image, so I have a few questions:

  1. Are you sure that this container image has authlib?
    • (try running it locally with docker run -it ..., and importing authlib in a Python shell)
  2. Are you sure that the image is correctly pulled into your K8S cluster?
    • (for example, if you are using pullPolicy: IfNotPresent, make sure you have renamed the image tag, so it knows to pull it again)
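
For the first check, something like the following could be pasted into a Python shell inside the container. It is a minimal sketch using only the standard library; locate_modules is a hypothetical helper name, and it reports where each module would be imported from without importing the module itself.

```python
import importlib.util


def locate_modules(names):
    """Map each module name to the file it would be loaded from, or None if it cannot be found."""
    locations = {}
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # A dotted name whose parent package is missing raises instead of returning None.
            spec = None
        locations[name] = spec.origin if spec is not None else None
    return locations
```

Calling locate_modules(["authlib", "flask_appbuilder", "airflow"]) shows whether authlib is visible at all and which site-packages directory each package resolves to. A value of None for authlib would match the ModuleNotFoundError above, though None can also appear for namespace packages, which have no single origin file.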
stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.