apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Can't disable sqlalchemy pool #24176

Closed dreamca4er closed 2 years ago

dreamca4er commented 2 years ago

Apache Airflow version

2.3.1 (latest released)

What happened

We are using Airflow with a MySQL database and a RabbitMQ Celery broker. The SQLAlchemy connection pool is disabled. After upgrading from Airflow 2.2.3 to 2.3.1, we get the following error when using the Airflow web interface:

Python version: 3.7.13
Airflow version: 2.3.1
Node: %HOST%
-------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/%WORK_DIR%/venv/lib/python3.7/site-packages/sqlalchemy/util/_collections.py", line 1008, in __call__
    return self.registry[key]
KeyError: <greenlet.greenlet object at 0x7f18c5080a10 (otid=0x7f18c62669b0) current active started main>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/%WORK_DIR%/venv/lib/python3.7/site-packages/flask/app.py", line 2448, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/%WORK_DIR%/venv/lib/python3.7/site-packages/flask/app.py", line 1953, in full_dispatch_request
    return self.finalize_request(rv)
  File "/home/%WORK_DIR%/venv/lib/python3.7/site-packages/flask/app.py", line 1970, in finalize_request
    response = self.process_response(response)
  File "/home/%WORK_DIR%/venv/lib/python3.7/site-packages/flask/app.py", line 2269, in process_response
    self.session_interface.save_session(self, ctx.session, response)
  File "/home/%WORK_DIR%/venv/lib/python3.7/site-packages/airflow/www/session.py", line 33, in save_session
    return super().save_session(*args, **kwargs)
  File "/home/%WORK_DIR%/venv/lib/python3.7/site-packages/flask_session/sessions.py", line 554, in save_session
    saved_session = self.sql_session_model.query.filter_by(
  File "/home/%WORK_DIR%/venv/lib/python3.7/site-packages/flask_sqlalchemy/__init__.py", line 552, in __get__
    return type.query_class(mapper, session=self.sa.session())
  File "/home/%WORK_DIR%/venv/lib/python3.7/site-packages/sqlalchemy/orm/scoping.py", line 129, in __call__
    return self.registry()
  File "/home/%WORK_DIR%/venv/lib/python3.7/site-packages/sqlalchemy/util/_collections.py", line 1010, in __call__
    return self.registry.setdefault(key, self.createfunc())
  File "/home/%WORK_DIR%/venv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 4058, in __call__
    return self.class_(**local_kw)
  File "/home/%WORK_DIR%/venv/lib/python3.7/site-packages/flask_sqlalchemy/__init__.py", line 176, in __init__
    bind = options.pop('bind', None) or db.engine
  File "/home/%WORK_DIR%/venv/lib/python3.7/site-packages/flask_sqlalchemy/__init__.py", line 1000, in engine
    return self.get_engine()
  File "/home/%WORK_DIR%/venv/lib/python3.7/site-packages/flask_sqlalchemy/__init__.py", line 1019, in get_engine
    return connector.get_engine()
  File "/home/%WORK_DIR%/venv/lib/python3.7/site-packages/flask_sqlalchemy/__init__.py", line 596, in get_engine
    self._engine = rv = self._sa.create_engine(sa_url, options)
  File "/home/%WORK_DIR%/venv/lib/python3.7/site-packages/flask_sqlalchemy/__init__.py", line 1029, in create_engine
    return sqlalchemy.create_engine(sa_url, **engine_opts)
  File "<string>", line 2, in create_engine
  File "/home/%WORK_DIR%/venv/lib/python3.7/site-packages/sqlalchemy/util/deprecations.py", line 298, in warned
    return fn(*args, **kwargs)
  File "/home/%WORK_DIR%/venv/lib/python3.7/site-packages/sqlalchemy/engine/create.py", line 646, in create_engine
    engineclass.__name__,
TypeError: Invalid argument(s) 'pool_size' sent to create_engine(), using configuration MySQLDialect_mysqldb/NullPool/Engine.  Please check that the keyword arguments are appropriate for this combination of components.

After changing the environment variable AIRFLOW__DATABASE__SQL_ALCHEMY_POOL_ENABLED (or the airflow.cfg parameter sql_alchemy_pool_enabled, if the configuration is not set via environment variables) from False to True, the error is resolved.
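For reference, Airflow reads configuration overrides from environment variables named AIRFLOW__{SECTION}__{KEY}, with double underscores as separators. The sketch below only builds such a name; the helper `airflow_env_name` is hypothetical and is not part of Airflow.

```python
def airflow_env_name(section: str, key: str) -> str:
    """Build the environment variable name Airflow checks for a config option.

    Hypothetical helper for illustration; Airflow itself does not expose it.
    """
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


pool_flag = airflow_env_name("database", "sql_alchemy_pool_enabled")
print(pool_flag)
```

Exporting that variable as True before starting the webserver is the workaround described above.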

What you think should happen instead

I think we should still be able to disable SQLAlchemy pooling, since the option exists. It also worked in Airflow 2.2.3. Somehow the pool_size option gets passed to SQLAlchemy's create_engine function even when pooling is disabled via the environment variable.

How to reproduce

Here is our airflow.cfg (without sensitive info): airflow.cfg.zip

Here is our pip freeze output:

aiohttp==3.8.1
aiosignal==1.2.0
alembic==1.8.0
amqp==5.1.1
anyio==3.6.1
apache-airflow==2.3.1
apache-airflow-providers-cncf-kubernetes==4.0.2
apache-airflow-providers-ftp==2.1.2
apache-airflow-providers-http==2.1.2
apache-airflow-providers-imap==2.2.3
apache-airflow-providers-sqlite==2.1.3
apispec==3.3.2
argcomplete==1.12.3
async-timeout==4.0.2
asynctest==0.13.0
attrs==20.3.0
Babel==2.10.1
backports.zoneinfo==0.2.1
bcrypt==3.2.2
billiard==3.6.4.0
bingads==13.0.11.1
bleach==5.0.0
blinker==1.4
boto==2.49.0
cached-property==1.5.2
cachelib==0.7.0
cachetools==4.2.4
cattrs==1.5.0
celery==5.2.1
certifi==2022.5.18.1
cffi==1.15.0
charset-normalizer==2.0.12
click==8.1.3
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.2.0
clickclick==20.10.2
clickhouse-driver==0.2.3
colorama==0.4.4
colorlog==4.8.0
commonmark==0.9.1
connexion==2.13.1
console-menu==0.6.0
cron-descriptor==1.2.24
croniter==1.0.15
cryptography==37.0.2
curlify==2.2.1
defusedxml==0.7.1
Deprecated==1.2.13
dill==0.3.5.1
dnspython==2.2.1
docutils==0.16
email-validator==1.2.1
facebook-business==13.0.0
Flask==1.1.2
Flask-AppBuilder==3.4.5
Flask-Babel==2.0.0
Flask-Bcrypt==1.0.1
Flask-Caching==1.11.1
Flask-JWT-Extended==3.25.1
Flask-Login==0.4.1
Flask-OpenID==1.3.0
Flask-Session==0.4.0
Flask-SQLAlchemy==2.5.1
Flask-WTF==0.14.3
flower==1.0.0
frozenlist==1.3.0
future==0.18.2
gcs-oauth2-boto-plugin==2.5
google-ads==16.0.0
google-api-core==2.7.1
google-api-python-client==2.40.0
google-auth==1.35.0
google-auth-httplib2==0.1.0
google-auth-oauthlib==0.5.1
google-cloud-bigquery==2.31.0
google-cloud-core==2.2.3
google-cloud-storage==1.43.0
google-crc32c==1.3.0
google-reauth==0.1.1
google-resumable-media==2.3.3
googleads==22.0.0
googleapis-common-protos==1.56.2
graphviz==0.20
greenlet==1.1.2
grpcio==1.46.3
grpcio-status==1.46.3
gunicorn==20.1.0
guppy3==3.1.2
h11==0.12.0
httpcore==0.13.7
httplib2==0.20.4
httpx==0.19.0
humanize==3.13.1
idna==3.3
importlib-metadata==4.11.4
importlib-resources==5.7.1
inflection==0.5.1
iso8601==1.0.2
isodate==0.6.1
itsdangerous==1.1.0
Jinja2==3.0.3
joblib==1.1.0
jsonschema==3.2.0
kombu==5.2.4
kubernetes==23.6.0
lazy-object-proxy==1.7.1
lockfile==0.12.2
lxml==4.9.0
Mako==1.2.0
Markdown==3.3.4
MarkupSafe==2.0.1
marshmallow==3.16.0
marshmallow-enum==1.5.1
marshmallow-oneofschema==3.0.1
marshmallow-sqlalchemy==0.26.1
mixpanel==4.9.0
multidict==6.0.2
mysql-connector==2.2.9
mysqlclient==2.1.0
numpy==1.21.6
oauth2client==4.1.3
oauthlib==3.2.0
openapi-schema-validator==0.2.3
openapi-spec-validator==0.4.0
oyaml==1.0
packaging==21.3
pandas==1.3.5
pathspec==0.9.0
pendulum==2.1.2
pika==1.1.0
platformdirs==2.5.2
pluggy==1.0.0
prison==0.2.1
prometheus-client==0.14.1
prompt-toolkit==3.0.8
proto-plus==1.19.6
protobuf==3.20.0
psutil==5.9.1
psycopg2-binary==2.9.3
pyarrow==6.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycountry==22.3.5
pycparser==2.21
pycryptodome==3.12.0
pyflakes==2.4.0
Pygments==2.12.0
PyJWT==1.7.1
pyOpenSSL==22.0.0
pyparsing==3.0.9
pyperclip==1.8.2
pyrsistent==0.18.1
python-daemon==2.3.0
python-dateutil==2.8.2
python-nvd3==0.15.0
python-slugify==6.1.2
python3-openid==3.2.0
pytz==2022.1
pytz-deprecation-shim==0.1.0.post0
pytzdata==2020.1
pyu2f==0.1.5
PyYAML==5.4.1
requests==2.27.1
requests-file==1.5.1
requests-oauthlib==1.3.1
requests-toolbelt==0.9.1
retry-decorator==1.1.1
rfc3986==1.5.0
rich==12.4.4
rsa==4.8
scikit-learn==0.24.1
scipy==1.7.3
selenium==3.141.0
setproctitle==1.2.3
six==1.16.0
slackclient==2.5.0
sniffio==1.2.0
SocksiPy-branch==1.1
SQLAlchemy==1.4.9
SQLAlchemy-JSONField==1.0.0
SQLAlchemy-Utils==0.38.2
suds-community==1.1.1
swagger-ui-bundle==0.0.9
tableau-api-lib==0.1.14
tableauhyperapi==0.0.13129
tabulate==0.8.9
tenacity==8.0.1
termcolor==1.1.0
text-unidecode==1.3
threadpoolctl==3.1.0
tornado==6.1
typeguard==2.13.3
typing_extensions==4.2.0
tzdata==2022.1
tzlocal==4.2
ua-parser==0.10.0
unicodecsv==0.14.1
Unidecode==1.3.2
uritemplate==4.1.1
urllib3==1.26.9
user-agents==2.0
vine==5.0.0
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.3.2
Werkzeug==1.0.1
wrapt==1.14.1
WTForms==2.3.3
xgboost==1.4.0
xmltodict==0.13.0
yarl==1.7.2
zeep==4.1.0
zenpy==2.0.22
zipp==3.8.0

To reproduce:

  1. Create a MySQL database for Airflow.
  2. Change the %PLACEHOLDERS% in the attached airflow.cfg to valid database connection details and paths. Place the file in ~/airflow/.
  3. Create a Python 3.7 venv (or equivalent) with the attached pip requirements.
  4. Run airflow webserver.
  5. Open the Airflow web interface; the error appears before the credentials prompt.

To fix:

  1. Change the sql_alchemy_pool_enabled parameter in airflow.cfg to True.

Operating System

Ubuntu 20.04.4 LTS

Versions of Apache Airflow Providers

apache-airflow-providers-cncf-kubernetes==4.0.2
apache-airflow-providers-ftp==2.1.2
apache-airflow-providers-http==2.1.2
apache-airflow-providers-imap==2.2.3
apache-airflow-providers-sqlite==2.1.3

Deployment

Other Docker-based deployment

Deployment details

MySQL database, RabbitMQ Celery broker. The deployment type doesn't matter for this error: it reproduces both on a fully deployed stage (Airflow components running in containers built from our own Dockerfile) and on a developer's laptop, using a venv and the airflow webserver command.

Anything else

After further investigation, we found that create_engine receives the following options:

{
  'pool_size': 10, 
  'pool_recycle': 7200, 
  'poolclass': <class 'sqlalchemy.pool.impl.NullPool'>, 
  'isolation_level': 'READ COMMITTED', 
  'encoding': 'utf-8'
}

We did not set pool_size or pool_recycle ourselves, so they must come from default values. It seems that the error occurs in the create_app function (airflow/www/app.py:71), and that the pool_size parameter comes from the apply_driver_hacks method of the SQLAlchemy class (flask_sqlalchemy/__init__.py:937).
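The conflict can be illustrated with a minimal sketch: NullPool does not take sizing arguments, so an options dict that carries both poolclass=NullPool and pool_size is rejected by create_engine. The helper below is hypothetical (it is not code from Airflow or Flask-SQLAlchemy), the set of "queue-only" option names is an assumption, and a stand-in NullPool class is used so the snippet runs without SQLAlchemy installed.

```python
# Hypothetical helper for illustration; not part of Airflow or Flask-SQLAlchemy.

# Options assumed here to be meaningful only for a queue-style pool.
QUEUE_ONLY_OPTIONS = frozenset(
    {"pool_size", "max_overflow", "pool_timeout", "pool_use_lifo"}
)


class NullPool:
    """Stand-in for sqlalchemy.pool.NullPool so the sketch runs on its own."""


def drop_queue_only_options(engine_options):
    """Return a copy of engine_options without sizing keys when NullPool is used."""
    poolclass = engine_options.get("poolclass")
    if getattr(poolclass, "__name__", None) != "NullPool":
        # Pooling is enabled (or unspecified): leave the options untouched.
        return dict(engine_options)
    return {k: v for k, v in engine_options.items() if k not in QUEUE_ONLY_OPTIONS}


# The options reported above, with the stand-in class in place of the real one.
reported = {
    "pool_size": 10,
    "pool_recycle": 7200,
    "poolclass": NullPool,
    "isolation_level": "READ COMMITTED",
    "encoding": "utf-8",
}
cleaned = drop_queue_only_options(reported)
print(sorted(cleaned))
```

In this sketch pool_recycle is kept, which matches the traceback: only pool_size was reported as an invalid argument.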

potiuk commented 2 years ago

Please raise this with flask_sqlalchemy; I am afraid we cannot do much about it.

Converting this into a discussion. It would be great to know when you open the issue and what results you get from it.