apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
Keycloak OAuth CSRF mismatch State Not Equal error #34024

Closed: andrewzah closed this issue 1 year ago

andrewzah commented 1 year ago

Apache Airflow version

2.7.0

What happened

When trying to log in to Airflow through a Keycloak 18.0.2 provider using Google Chrome, I get this error:

Traceback (most recent call last):
  File "/home/kubezt/.local/lib/python3.8/site-packages/flask_appbuilder/security/views.py", line 658, in oauth_authorized
    resp = self.appbuilder.sm.oauth_remotes[provider].authorize_access_token()
  File "/home/kubezt/.local/lib/python3.8/site-packages/authlib/integrations/flask_client/apps.py", line 100, in authorize_access_token
    params = self._format_state_params(state_data, params)
  File "/home/kubezt/.local/lib/python3.8/site-packages/authlib/integrations/base_client/sync_app.py", line 234, in _format_state_params
    raise MismatchingStateError()

authlib.integrations.base_client.errors.MismatchingStateError: mismatching_state: CSRF Warning! State not equal in request and response.

What you think should happen instead

I should be able to log in with the Keycloak provider in Chrome.

How to reproduce

My webserver_config.py:

import os
import logging
import jwt
import requests

from base64 import b64decode
from cryptography.hazmat.primitives import serialization

from airflow import configuration as conf
from flask_appbuilder.security.manager import AUTH_OAUTH
from airflow.www.security import AirflowSecurityManager

from flask_appbuilder import expose
from flask_appbuilder.security.views import AuthOAuthView

basedir = os.path.abspath(os.path.dirname(__file__))
log = logging.getLogger(__name__)

SQLALCHEMY_CONN = os.environ['KC_DB_STRING']
SQLALCHEMY_DATABASE_URI = os.environ['KC_DB_STRING']

CSRF_ENABLED = True
AUTH_TYPE = AUTH_OAUTH

AUTH_ROLE_ADMIN = 'Admin'
AUTH_ROLE_PUBLIC = 'Public'
AUTH_USER_REGISTRATION = True

AUTH_ROLES_SYNC_AT_LOGIN = True
AUTH_USER_REGISTRATION_ROLE = 'User'
AUTH_ROLES_MAPPING = {
  "airflow_admin": ["Admin"],
  "airflow_op": ["Op"],
  "airflow_user": ["User"],
  "airflow_viewer": ["Viewer"],
  "airflow_public": ["Public"],
}

PROVIDER_NAME = 'Keycloak SSO'
CLIENT_ID = os.environ['KC_CLIENT_ID']

OAUTH_PROVIDERS = [{
    'name': PROVIDER_NAME,
    'token_key':'access_token',
    'icon':'fa-circle-o',
    'remote_app': {
        'api_base_url': os.environ['KC_CONFIG_URL'],
        'access_token_url': os.environ['KC_TOKEN_URL'],
        'authorize_url': os.environ['KC_AUTH_URL'],
        'userinfo_url': os.environ['KC_USERINFO_URL'],
        'request_token_url': None,
        'client_id': CLIENT_ID,
        'client_secret': os.environ['KC_CLIENT_SECRET'],
        'client_kwargs':{
            'scope': 'email profile'
        },
    }
}]

req = requests.get(os.environ['KC_BASE_REALM_URL'])
key_der_base64 = req.json()["public_key"]
key_der = b64decode(key_der_base64.encode())
public_key = serialization.load_der_public_key(key_der)

class CustomAuthRemoteUserView(AuthOAuthView):
    @expose("/logout/")
    def logout(self):
        """Delete access token before logging out."""
        return super().logout()

class CustomSecurityManager(AirflowSecurityManager):
    authoauthview = CustomAuthRemoteUserView

    def oauth_user_info(self, provider, response):
        if provider == PROVIDER_NAME:
            log.info("response: {}".format(response))
            token = response["access_token"]
            log.info("token: {}".format(token))
            # The realm key loaded above is an RSA public key, so accept only RS256.
            me = jwt.decode(token, public_key, algorithms=['RS256'], audience=CLIENT_ID)
            log.info("me: {}".format(me))
            # sample of resource_access
            # {
            #   "resource_access": { "airflow": { "roles": ["airflow_admin"] }}
            # }
            groups = me.get("resource_access", {}).get("airflow", {}).get("roles")
            if not groups:
                groups = ["airflow_public"]
            else:
                # Keep only Airflow roles; avoid shadowing the built-in `str`.
                groups = [role for role in groups if "airflow" in role]

            log.info("making userinfo")
            userinfo = {
                "username": me.get("preferred_username"),
                "email": me.get("email"),
                "first_name": me.get("given_name"),
                "last_name": me.get("family_name"),
                "role_keys": groups,
            }

            log.info("user info: {0}".format(userinfo))
            return userinfo
        else:
            return {}

SECURITY_MANAGER_CLASS = CustomSecurityManager

Operating System

Rocky Linux 8.8

Versions of Apache Airflow Providers

No response

Deployment

Docker-Compose

Deployment details

$ pip freeze

aiohttp==3.8.5
aiosignal==1.3.1
alembic==1.11.3
amqp==5.1.1
anyio==3.7.1
apache-airflow==2.7.0
apache-airflow-providers-apache-hive==6.1.4
apache-airflow-providers-celery==3.3.2
apache-airflow-providers-common-sql==1.7.0
apache-airflow-providers-ftp==3.5.0
apache-airflow-providers-http==4.5.0
apache-airflow-providers-imap==3.3.0
apache-airflow-providers-jdbc==4.0.1
apache-airflow-providers-mysql==5.2.1
apache-airflow-providers-postgres==5.6.0
apache-airflow-providers-sqlite==3.4.3
apache-airflow-providers-ssh==3.7.1
apispec==6.3.0
argcomplete==3.1.1
asgiref==3.7.2
async-timeout==4.0.3
attrs==23.1.0
Authlib==1.2.1
Babel==2.12.1
backoff==2.2.1
backports.zoneinfo==0.2.1
bcrypt==4.0.1
billiard==4.1.0
blinker==1.6.2
cachelib==0.9.0
cattrs==23.1.2
celery==5.3.1
certifi==2023.7.22
cffi==1.15.1
charset-normalizer==3.2.0
click==8.1.7
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.3.0
clickclick==20.10.2
colorama==0.4.6
colorlog==4.8.0
ConfigUpdater==3.1.1
connexion==2.14.2
cron-descriptor==1.4.0
croniter==1.4.1
cryptography==41.0.3
Deprecated==1.2.14
dill==0.3.7
dnspython==2.4.2
docutils==0.20.1
email-validator==1.3.1
exceptiongroup==1.1.3
Flask==2.2.5
Flask-AppBuilder==4.3.3
Flask-Babel==2.0.0
Flask-Caching==2.0.2
Flask-JWT-Extended==4.5.2
Flask-Limiter==3.4.1
Flask-Login==0.6.2
Flask-Session==0.5.0
Flask-SQLAlchemy==2.5.1
Flask-WTF==1.1.1
flower==2.0.1
frozenlist==1.4.0
future==0.18.3
google-re2==1.1
googleapis-common-protos==1.60.0
graphviz==0.20.1
greenlet==2.0.2
grpcio==1.57.0
gunicorn==21.2.0
h11==0.14.0
hmsclient==0.1.1
httpcore==0.17.3
httpx==0.24.1
humanize==4.8.0
idna==3.4
importlib-metadata==4.13.0
importlib-resources==6.0.1
inflection==0.5.1
itsdangerous==2.1.2
JayDeBeApi==1.2.3
Jinja2==3.1.2
JPype1==1.4.1
jsonschema==4.19.0
jsonschema-specifications==2023.7.1
kafka-python==2.0.2
kombu==5.3.1
lazy-object-proxy==1.9.0
limits==3.5.0
linkify-it-py==2.0.2
lockfile==0.12.2
Mako==1.2.4
Markdown==3.4.4
markdown-it-py==3.0.0
MarkupSafe==2.1.3
marshmallow==3.20.1
marshmallow-oneofschema==3.0.1
marshmallow-sqlalchemy==0.26.1
mdit-py-plugins==0.4.0
mdurl==0.1.2
multidict==6.0.4
mysql-connector-python==8.1.0
mysqlclient==2.2.0
ndg-httpsclient==0.5.1
numpy==1.24.4
opentelemetry-api==1.15.0
opentelemetry-exporter-otlp==1.15.0
opentelemetry-exporter-otlp-proto-grpc==1.15.0
opentelemetry-exporter-otlp-proto-http==1.15.0
opentelemetry-proto==1.15.0
opentelemetry-sdk==1.15.0
opentelemetry-semantic-conventions==0.36b0
ordered-set==4.1.0
packaging==23.1
pandas==2.0.3
paramiko==3.3.1
pathspec==0.11.2
pendulum==2.1.2
pkgutil_resolve_name==1.3.10
pluggy==1.3.0
prison==0.2.1
prometheus-client==0.17.1
prompt-toolkit==3.0.39
protobuf==4.21.12
psutil==5.9.5
psycopg2-binary==2.9.7
pure-sasl==0.6.2
pyasn1==0.5.0
pyasn1-modules==0.3.0
pycparser==2.21
pydantic==1.10.12
Pygments==2.16.1
PyHive==0.7.0
PyJWT==2.8.0
PyNaCl==1.5.0
pyOpenSSL==23.2.0
python-daemon==3.0.1
python-dateutil==2.8.2
python-ldap==3.4.3
python-nvd3==0.15.0
python-slugify==8.0.1
pytz==2023.3
pytzdata==2020.1
PyYAML==6.0.1
redis==5.0.0
referencing==0.30.2
requests==2.31.0
requests-toolbelt==1.0.0
rfc3339-validator==0.1.4
rich==13.5.2
rich-argparse==1.3.0
rpds-py==0.10.0
sasl==0.3.1
setproctitle==1.3.2
six==1.16.0
sniffio==1.3.0
SQLAlchemy==1.4.49
SQLAlchemy-JSONField==1.0.1.post0
SQLAlchemy-Utils==0.41.1
sqlparse==0.4.4
sshtunnel==0.4.0
tabulate==0.9.0
tenacity==8.2.3
termcolor==2.3.0
text-unidecode==1.3
thrift==0.16.0
thrift-sasl==0.4.3
tornado==6.3.3
typing_extensions==4.7.1
tzdata==2023.3
uc-micro-py==1.0.2
unicodecsv==0.14.1
urllib3==2.0.4
vine==5.0.0
wcwidth==0.2.6
Werkzeug==2.2.3
wrapt==1.15.0
WTForms==3.0.1
yarl==1.9.2
zipp==3.16.2

In airflow.cfg, these changes were made:

[webserver]
rbac = True
authenticate = True

And my environment variables:

AIRFLOW__WEBSERVER__BASE_URL: 'https://airflow.arpa'
AIRFLOW__WEBSERVER__SECRET_KEY: 'secret_key_here'

Anything else

This only occurs when logging in via Chrome (115.0.5790.110), not Firefox (112.0.1). I seem to be hitting an issue similar to Authlib issues 518 and 376.

I looked at Airflow issue 28098 and Flask-AppBuilder issue 1957, but I'm not setting AIRFLOW__WEBSERVER__SESSION_LIFETIME_MINUTES.
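
Because the failure is browser-specific, one thing worth ruling out is how the session cookie is being set. The following is a minimal diagnostic sketch, not part of the original report: it parses a Set-Cookie header value (for example, one copied from the browser's network tab) and flags the combination SameSite=None without Secure, which Chrome rejects. The function name and the sample header values are hypothetical.

```python
from http.cookies import SimpleCookie


def samesite_none_without_secure(set_cookie_header: str) -> bool:
    """Return True if any cookie in the header uses SameSite=None but lacks Secure.

    Hypothetical helper for illustration; it only inspects the header text.
    """
    cookie = SimpleCookie()
    cookie.load(set_cookie_header)
    for morsel in cookie.values():
        # Unset attributes default to an empty string on a Morsel.
        samesite = morsel["samesite"].lower()
        if samesite == "none" and not morsel["secure"]:
            return True
    return False


# Sample header values (made up for this example).
print(samesite_none_without_secure("session=abc; Path=/; HttpOnly; SameSite=None"))  # True
print(samesite_none_without_secure("session=abc; Path=/; HttpOnly; SameSite=Lax"))   # False
```

If the check returns True for the Airflow session cookie, the browser may be discarding the cookie before the OAuth callback arrives.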

andrewzah commented 1 year ago

See also https://github.com/lepture/authlib/issues/334

andrewzah commented 1 year ago

For anyone running into this, the following setting fixed it. It was previously set to None:

[webserver]
cookie_samesite = Lax
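
A likely explanation, which the thread itself does not confirm: the OAuth state value is kept in the session, and the session depends on a cookie. If the browser discards a cookie marked SameSite=None because it is not also marked Secure, the callback arrives with an empty session and the state comparison fails. The sketch below is a simplified model of that round trip, not Authlib's or Flask-AppBuilder's actual code; every name in it, including the `none_requires_secure` flag that stands in for the browser's policy, is an assumption made for illustration.

```python
import secrets


class MismatchingStateError(Exception):
    """Stand-in for the Authlib error of the same name (not imported here)."""


def browser_stores_cookie(samesite: str, secure: bool, none_requires_secure: bool) -> bool:
    """Hypothetical model of a browser deciding whether to keep a cookie."""
    if samesite.lower() == "none" and not secure and none_requires_secure:
        return False
    return True


def login_round_trip(samesite: str, secure: bool, none_requires_secure: bool) -> bool:
    """Simulate the redirect to the provider and the callback."""
    # Step 1: the server generates a state value and stores it in the session.
    state = secrets.token_urlsafe(16)
    server_session = {"oauth_state": state}

    # Step 2: the session only survives if the browser kept the session cookie.
    if browser_stores_cookie(samesite, secure, none_requires_secure):
        session_on_callback = server_session
    else:
        session_on_callback = {}

    # Step 3: the provider redirects back with the same state value,
    # and the server compares it with what the session holds.
    expected = session_on_callback.get("oauth_state")
    if expected is None or not secrets.compare_digest(expected, state):
        raise MismatchingStateError("State not equal in request and response.")
    return True


# With Lax, the cookie is kept and the comparison succeeds.
login_round_trip("Lax", secure=False, none_requires_secure=True)
```

In this model, `login_round_trip("None", secure=False, none_requires_secure=True)` raises the error, while the same call with `none_requires_secure=False` succeeds. That would be consistent with the login working in a browser that does not enforce the rule. The same option can also be set through the environment as AIRFLOW__WEBSERVER__COOKIE_SAMESITE, following Airflow's AIRFLOW__{SECTION}__{KEY} convention.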