apache / airflow

Google OAuth access token issue - mismatching_state: CSRF Warning! #21065

Closed alexkruWix closed 2 years ago

alexkruWix commented 2 years ago

Apache Airflow version

2.2.3 (latest released)

What happened

After installing the new version of Airflow in our organization and connecting it to Google OAuth, we hit a problem right after the initial setup: the scheduler works fine, but we cannot log in to Airflow.

Once we select the Google account, we are redirected back to the login screen and cannot get into the UI. In the debug logs, we see the following:

[2022-01-19 12:15:34,035] {views.py:615} DEBUG - Provider: google
[2022-01-19 12:15:34,035] {views.py:615} DEBUG - Provider: google
[2022-01-19 12:15:34,036] {views.py:628} DEBUG - Going to call authorize for: google
[2022-01-19 12:15:34,036] {views.py:628} DEBUG - Going to call authorize for: google
[2022-01-19 12:15:34,036] {base_app.py:155} DEBUG - Saving authorize data: <JSON_AUTHORIZE_DATA>
[2022-01-19 12:15:36,389] {views.py:658} DEBUG - Authorized init
[2022-01-19 12:15:36,389] {views.py:658} DEBUG - Authorized init
[2022-01-19 12:15:36,389] {views.py:666} ERROR - Error authorizing OAuth access token: mismatching_state: CSRF Warning! State not equal in request and response.
[2022-01-19 12:15:36,389] {views.py:666} ERROR - Error authorizing OAuth access token: mismatching_state: CSRF Warning! State not equal in request and response.
[2022-01-19 12:15:36,781] {views.py:615} DEBUG - Provider: None
[2022-01-19 12:15:36,781] {views.py:615} DEBUG - Provider: None

So far we tried:

But nothing seems to work. We are using the following webserver_config.py:

"""Default configuration for the Airflow webserver"""
import os

from airflow import configuration as conf
#from airflow.www.fab_security.manager import AUTH_DB

# from airflow.www.fab_security.manager import AUTH_LDAP
from airflow.www.fab_security.manager import AUTH_OAUTH
# from airflow.www.fab_security.manager import AUTH_OID
# from airflow.www.fab_security.manager import AUTH_REMOTE_USER

basedir = os.path.abspath(os.path.dirname(__file__))

# Not sure if this should stay - it doesn't exist in Airflow 2?
SQLALCHEMY_DATABASE_URI = conf.get('core', 'SQL_ALCHEMY_CONN')

# Flask-WTF flag for CSRF
WTF_CSRF_ENABLED = True

# ----------------------------------------------------
# AUTHENTICATION CONFIG
# ----------------------------------------------------
# For details on how to set up each of the following authentication methods, see
# http://flask-appbuilder.readthedocs.io/en/latest/security.html#authentication-methods

# The authentication type
# AUTH_OID : Is for OpenID
# AUTH_DB : Is for database
# AUTH_LDAP : Is for LDAP
# AUTH_REMOTE_USER : Is for using REMOTE_USER from web server
# AUTH_OAUTH : Is for OAuth
AUTH_TYPE = AUTH_OAUTH

# Uncomment to setup Full admin role name
AUTH_ROLE_ADMIN = 'Admin'

# Uncomment and set to desired role to enable access without authentication
# AUTH_ROLE_PUBLIC = 'Viewer'

# Will allow user self registration
AUTH_USER_REGISTRATION = True

# Recaptcha is automatically enabled when user self registration is active, and the keys are required
# RECAPTCHA_PRIVATE_KEY = PRIVATE_KEY
# RECAPTCHA_PUBLIC_KEY = PUBLIC_KEY

# Config for Flask-Mail necessary for user self registration
# MAIL_SERVER = 'smtp.gmail.com'
# MAIL_USE_TLS = True
# MAIL_USERNAME = 'yourappemail@gmail.com'
# MAIL_PASSWORD = 'passwordformail'
# MAIL_DEFAULT_SENDER = 'sender@gmail.com'

# The default user self registration role
AUTH_USER_REGISTRATION_ROLE = "Admin"

# When using OAuth Auth, uncomment to setup provider(s) info
# Google OAuth example:
OAUTH_PROVIDERS = [{
    'name': 'google',
    'token_key': 'access_token',
    'icon': 'fa-google',
    'remote_app': {
        'api_base_url': 'https://www.googleapis.com/oauth2/v2/',
        'client_kwargs': {
            'scope': 'email profile'
        },
        'access_token_url': 'https://accounts.google.com/o/oauth2/token',
        'authorize_url': 'https://accounts.google.com/o/oauth2/auth',
        'request_token_url': None,
        'client_id': '<MY_CLIENT_ID>',
        'client_secret': '<MY_CLIENT_SECRET>',
    }
}]

We would appreciate any help with this issue.

What you expected to happen

Being able to connect to the web UI, and seeing the DAGs and everything.

How to reproduce

No response

Operating System

Debian GNU/Linux 10 (buster)

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==2.6.0
apache-airflow-providers-celery==2.1.0
apache-airflow-providers-cncf-kubernetes==3.0.1
apache-airflow-providers-ftp==2.0.1
apache-airflow-providers-google==6.3.0
apache-airflow-providers-http==2.0.2
apache-airflow-providers-imap==2.1.0
apache-airflow-providers-mysql==2.1.1
apache-airflow-providers-redis==2.0.1
apache-airflow-providers-sqlite==2.0.1

Deployment

Other Docker-based deployment

Deployment details

Using a custom Docker image built from python:3.7-buster. Airflow is installed with pip install. Our pip freeze output is:

alembic==1.7.5
amqp==5.0.9
anyio==3.5.0
apache-airflow==2.2.3
apache-airflow-providers-amazon==2.6.0
apache-airflow-providers-celery==2.1.0
apache-airflow-providers-cncf-kubernetes==3.0.1
apache-airflow-providers-ftp==2.0.1
apache-airflow-providers-google==6.3.0
apache-airflow-providers-http==2.0.2
apache-airflow-providers-imap==2.1.0
apache-airflow-providers-mysql==2.1.1
apache-airflow-providers-redis==2.0.1
apache-airflow-providers-sqlite==2.0.1
apispec==3.3.2
argcomplete==1.12.3
asn1crypto==1.4.0
attrs==20.3.0
Authlib==0.15.5
Babel==2.9.1
bcrypt==3.2.0
beautifulsoup4==4.10.0
billiard==3.6.4.0
blinker==1.4
boto3==1.18.65
botocore==1.21.65
cached-property==1.5.2
cachetools==4.2.4
cattrs==1.5.0
celery==5.1.2
certifi==2021.10.8
cffi==1.15.0
charset-normalizer==2.0.10
click==7.1.2
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.2.0
clickclick==20.10.2
colorama==0.4.4
colorlog==4.8.0
commonmark==0.9.1
croniter==1.0.15
cryptography==36.0.1
decorator==5.1.1
defusedxml==0.7.1
dill==0.3.4
distlib==0.3.4
dnspython==2.2.0
docutils==0.16
email-validator==1.1.3
filelock==3.4.2
Flask==1.1.4
Flask-AppBuilder==3.4.3
Flask-Babel==2.0.0
Flask-Bcrypt==0.7.1
Flask-Caching==1.10.1
Flask-JWT-Extended==3.25.1
Flask-Login==0.4.1
Flask-OpenID==1.3.0
Flask-SQLAlchemy==2.5.1
flask-talisman==0.8.1
Flask-WTF==0.14.3
flower==1.0.0
google-ads==14.0.0
google-api-core==1.31.5
google-api-python-client==1.12.10
google-auth==1.35.0
google-auth-httplib2==0.1.0
google-auth-oauthlib==0.4.6
google-cloud-appengine-logging==1.1.0
google-cloud-audit-log==0.2.0
google-cloud-automl==2.6.0
google-cloud-bigquery==2.32.0
google-cloud-bigquery-datatransfer==3.5.0
google-cloud-bigquery-storage==2.11.0
google-cloud-bigtable==1.7.0
google-cloud-build==3.7.1
google-cloud-container==1.0.1
google-cloud-core==1.7.2
google-cloud-datacatalog==3.6.2
google-cloud-dataproc==3.2.0
google-cloud-dataproc-metastore==1.3.1
google-cloud-dlp==1.0.0
google-cloud-kms==2.10.1
google-cloud-language==1.3.0
google-cloud-logging==2.7.0
google-cloud-memcache==1.0.0
google-cloud-monitoring==2.8.0
google-cloud-os-login==2.5.1
google-cloud-pubsub==2.9.0
google-cloud-redis==2.5.1
google-cloud-secret-manager==1.0.0
google-cloud-spanner==1.19.1
google-cloud-speech==1.3.2
google-cloud-storage==1.44.0
google-cloud-tasks==2.7.2
google-cloud-texttospeech==1.0.1
google-cloud-translate==1.7.0
google-cloud-videointelligence==1.16.1
google-cloud-vision==1.0.0
google-cloud-workflows==1.5.0
google-crc32c==1.3.0
google-resumable-media==1.3.3
googleapis-common-protos==1.54.0
graphviz==0.19.1
grpc-google-iam-v1==0.12.3
grpcio==1.43.0
grpcio-gcp==0.2.2
gunicorn==20.1.0
h11==0.12.0
httpcore==0.13.7
httplib2==0.20.2
httpx==0.19.0
humanize==3.13.1
idna==3.3
importlib-metadata==4.10.1
importlib-resources==5.4.0
inflection==0.5.1
iso8601==1.0.2
isodate==0.6.1
itsdangerous==1.1.0
Jinja2==2.11.3
jmespath==0.10.0
json-merge-patch==0.2
jsonpath-ng==1.5.3
jsonschema==3.2.0
kombu==5.2.3
kubernetes==21.7.0
lazy-object-proxy==1.7.1
libcst==0.4.0
lockfile==0.12.2
lxml==4.7.1
Mako==1.1.6
Markdown==3.3.6
MarkupSafe==2.0.1
marshmallow==3.14.1
marshmallow-enum==1.5.1
marshmallow-oneofschema==3.0.1
marshmallow-sqlalchemy==0.26.1
mypy-extensions==0.4.3
mysql-connector-python==8.0.28
mysqlclient==2.1.0
ndg-httpsclient==0.5.1
nox==2020.12.31
numpy==1.21.5
oauthlib==3.1.1
openapi-schema-validator==0.1.6
openapi-spec-validator==0.3.2
packaging==21.3
pandas==1.3.5
pandas-gbq==0.14.1
pendulum==2.1.2
platformdirs==2.4.1
ply==3.11
prison==0.2.1
prometheus-client==0.12.0
prompt-toolkit==3.0.24
proto-plus==1.18.1
protobuf==3.19.3
psutil==5.9.0
py==1.11.0
pyarrow==6.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.21
pydata-google-auth==1.3.0
Pygments==2.11.2
PyJWT==1.7.1
pyOpenSSL==21.0.0
pyparsing==3.0.7
pyrsistent==0.16.1
python-daemon==2.3.0
python-dateutil==2.8.2
python-nvd3==0.15.0
python-slugify==4.0.1
python3-openid==3.2.0
pytz==2021.3
pytzdata==2020.1
PyYAML==5.4.1
redis==3.5.3
redshift-connector==2.0.903
requests==2.27.1
requests-oauthlib==1.3.0
rfc3986==1.5.0
rich==11.0.0
rsa==4.8
s3transfer==0.5.0
scramp==1.4.1
setproctitle==1.2.2
six==1.16.0
sniffio==1.2.0
soupsieve==2.3.1
SQLAlchemy==1.3.24
SQLAlchemy-JSONField==1.0.0
sqlalchemy-redshift==0.8.9
SQLAlchemy-Utils==0.38.2
swagger-ui-bundle==0.0.9
tabulate==0.8.9
tenacity==8.0.1
termcolor==1.1.0
text-unidecode==1.3
tornado==6.1
typing-inspect==0.7.1
typing_extensions==4.0.1
unicodecsv==0.14.1
uritemplate==3.0.1
urllib3==1.26.8
vine==5.0.0
virtualenv==20.13.0
watchtower==2.0.1
wcwidth==0.2.5
websocket-client==1.2.3
Werkzeug==1.0.1
WTForms==2.3.3
zipp==3.7.0

This Docker image runs on K8s (using an in-house system), with a domain pointing to the deployment pods.

Anything else

No response

Are you willing to submit PR?

Code of Conduct

boring-cyborg[bot] commented 2 years ago

Thanks for opening your first issue here! Be sure to follow the issue template!

potiuk commented 2 years ago

This is likely something wrong in your k8s setup. This is not an OAuth issue IMHO. You probably use "random" to generate SECRET_KEY, so you have different secret keys in your UI and workers. You have to make sure that the secret key is the same everywhere. I heartily recommend using the Airflow Helm Chart https://airflow.apache.org/docs/helm-chart/stable/index.html and the production image https://airflow.apache.org/docs/docker-stack/index.html instead of running your custom image / K8S deployment, as many similar configuration problems are sorted out there, vetted by multiple people, and tested.

I am converting this into a discussion in case that does not help and you want to provide more information (maybe my guess is incorrect).
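
For readers wondering why differing secret keys would surface as a `mismatching_state` error, here is a minimal sketch using only the Python standard library. It is a toy model, not the actual Flask, Flask-AppBuilder, or Authlib code: the helpers `sign`, `unsign`, and `callback_state_ok`, and the names `key_pod_a` and `key_pod_b`, are hypothetical. The assumption is that the OAuth state is kept in a session cookie signed with the webserver's secret key, so a process holding a different key cannot read back the state that another process stored.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def sign(value: str, secret_key: bytes) -> str:
    """Return 'value.signature', a toy stand-in for a signed session cookie."""
    sig = hmac.new(secret_key, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}.{sig}"


def unsign(cookie: str, secret_key: bytes) -> Optional[str]:
    """Return the value if the signature verifies with this key, else None."""
    value, _, sig = cookie.rpartition(".")
    expected = hmac.new(secret_key, value.encode(), hashlib.sha256).hexdigest()
    return value if hmac.compare_digest(sig, expected) else None


def callback_state_ok(cookie: str, returned_state: str, secret_key: bytes) -> bool:
    """Compare the state stored in the cookie with the one the provider sent back."""
    stored = unsign(cookie, secret_key)
    return stored is not None and hmac.compare_digest(stored, returned_state)


# Assumption: two webserver processes that each generated their own random key.
key_pod_a = secrets.token_bytes(16)
key_pod_b = secrets.token_bytes(16)

# The login request lands on pod A, which stores the state in a signed cookie.
state = secrets.token_urlsafe(16)
cookie = sign(state, key_pod_a)

# The OAuth callback may be handled by either process.
same_pod = callback_state_ok(cookie, state, key_pod_a)
other_pod = callback_state_ok(cookie, state, key_pod_b)
print(same_pod, other_pod)
```

In this sketch the callback succeeds only when it is handled with the key that signed the cookie. If this is indeed the cause, setting one fixed value for every webserver process, for example through the `secret_key` option in the `[webserver]` section of `airflow.cfg` or the `AIRFLOW__WEBSERVER__SECRET_KEY` environment variable, should remove the mismatch.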