apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Authorization with Identity Aware Proxy #11305

Open mik-laj opened 4 years ago

mik-laj commented 4 years ago

Why?

Users expect integration with various Identity Aware Proxies (IAP) that provide authorization. The use of such proxies brings many benefits.

Besides, it can make using LDAP with Airflow much easier. Deficiencies in the LDAP implementation for Airflow will no longer be a problem for our users (e.g. https://github.com/dpgaspar/Flask-AppBuilder/issues/956).

I think we should prepare implementations for some of the most popular products:

These implementations will also serve as examples for integrating other products.

How?

In order to accomplish this task for each supported proxy, we need to prepare two authorization checks - one for Web UI, one for API.

API

Creating your own API auth backend is described in our documentation: https://airflow.readthedocs.io/en/latest/security/api.html#roll-your-own-api-authentication
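As a rough sketch of what such a backend could look like: an Airflow API auth backend is a Python module exposing init_app and a requires_authentication decorator. The example below keeps the header check separate from Flask so it can be read and run on its own. The header name X-Auth-Username, the helper names, and the get_headers/deny hooks are assumptions made for illustration, not an existing Airflow API.

```python
from functools import wraps
from typing import Callable, Mapping, Optional

# Assumed header set by the proxy; real products use their own names.
USERNAME_HEADER = "X-Auth-Username"


def extract_username(headers: Mapping[str, str]) -> Optional[str]:
    """Return the proxy-supplied username, or None if absent or blank."""
    # HTTP header names are case-insensitive, so compare in lower case.
    wanted = USERNAME_HEADER.lower()
    for name, value in headers.items():
        if name.lower() == wanted and value.strip():
            return value.strip()
    return None


def make_requires_authentication(
    get_headers: Callable[[], Mapping[str, str]],
    deny: Callable[[], object],
) -> Callable:
    """Build a decorator that rejects calls lacking the proxy header.

    In a real backend, get_headers would return flask.request.headers and
    deny would return a 401/403 response (both are hypothetical hooks here).
    """

    def requires_authentication(function: Callable) -> Callable:
        @wraps(function)
        def decorated(*args, **kwargs):
            if extract_username(get_headers()) is None:
                return deny()
            return function(*args, **kwargs)

        return decorated

    return requires_authentication


def init_app(_app) -> None:
    """Hook called by Airflow at startup; nothing to set up in this sketch."""
```

Trusting a plain header like this is only safe when Airflow is reachable exclusively through the proxy and the proxy overwrites any client-supplied copy of the header.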

FAB

Creating an integration with Flask App Builder is less well documented, but in our case, we can extend REMOTE_USER to support product-specific headers.

To do this, create a new view based on the flask_appbuilder.security.views.AuthView class, and then set it as the authremoteuserview attribute in the airflow.www.security.AirflowSecurityManager class. You can use the flask_appbuilder.security.views.AuthRemoteUserView class as a template.

Below is a minimal example of the webserver_config.py file (you should save it to ~/airflow/config/) that provides authorization using the X-Auth-Username header. The goal is to support more vendor-specific headers.


from flask import flash, g, get_flashed_messages, redirect, request
from flask_appbuilder import expose
from flask_appbuilder._compat import as_unicode
from flask_appbuilder.security.manager import AUTH_REMOTE_USER
from flask_appbuilder.security.views import AuthView
from flask_login import login_user, logout_user

from airflow.www.security import AirflowSecurityManager

class CustomAuthRemoteUserView(AuthView):
    login_template = ""

    @expose("/login/")
    def login(self):
        # Already logged in: go straight to the index page.
        if g.user is not None and g.user.is_authenticated:
            return redirect(self.appbuilder.get_url_for_index)

        # Read the header set by the proxy. In the raw WSGI environ the
        # same value is stored under "HTTP_X_AUTH_USERNAME".
        username = request.headers.get("X-Auth-Username")
        if username:
            user = self.appbuilder.sm.auth_user_remote_user(username)
            if user is None:
                flash(as_unicode(self.invalid_login_message), "warning")
            else:
                login_user(user)
        else:
            flash(as_unicode(self.invalid_login_message), "warning")

        # Flush "Access is Denied" flash message
        get_flashed_messages()
        return redirect(self.appbuilder.get_url_for_index)

    @expose("/logout/")
    def logout(self):
        logout_user()
        # Hand off to the proxy's own logout endpoint.
        return redirect("/oauth/logout")

class CustomAirflowSecurityManager(AirflowSecurityManager):
    authremoteuserview = CustomAuthRemoteUserView

# FAB only registers authremoteuserview when AUTH_TYPE is AUTH_REMOTE_USER.
AUTH_TYPE = AUTH_REMOTE_USER
SECURITY_MANAGER_CLASS = CustomAirflowSecurityManager

Calling get_flashed_messages() clears the "Access is Denied" flash message that appears when the user is redirected from / to /login. FAB's own view does not do this, but it is needed in Airflow.
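One detail worth keeping in mind when reading a proxy header such as X-Auth-Username: in a WSGI application, request.headers accepts the header name as it was sent, while the raw request.environ mapping stores it under a CGI-style key. A small sketch of that mapping follows; the helper names are made up for illustration.

```python
from typing import Mapping, Optional


def wsgi_environ_key(header_name: str) -> str:
    """Map an HTTP header name to its WSGI environ key (PEP 3333 / CGI rules)."""
    key = header_name.upper().replace("-", "_")
    # Content-Type and Content-Length are the two headers stored without a prefix.
    if key in ("CONTENT_TYPE", "CONTENT_LENGTH"):
        return key
    return "HTTP_" + key


def header_from_environ(environ: Mapping[str, str], header_name: str) -> Optional[str]:
    """Look up a request header in a raw WSGI environ; None if it is missing."""
    return environ.get(wsgi_environ_key(header_name))
```

With this mapping, a lookup of "X-Auth-Username" reads the "HTTP_X_AUTH_USERNAME" entry, whereas asking the environ for the literal header name finds nothing.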

Vendor headers

In the case of Louketo/Keycloak, we should support the following headers:

In the case of Google IAP, we should use the JWT signed header: https://cloud.google.com/iap/docs/signed-headers-howto

In the case of Pomerium, we should use the JWT signed header, X-Pomerium-Jwt-Assertion: https://www.pomerium.io/docs/topics/getting-users-identity.html#prerequisites
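With a signed JWT header, the backend has two jobs: verify the signature against the vendor's published keys, then check the claims. The sketch below covers only the second job, using the standard library. It deliberately does not verify the signature, which needs a JWT library and the vendor's public keys, so it must not be used on its own to authenticate anyone. The function names and the audience value used when calling it are illustrative assumptions.

```python
import base64
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def read_unverified_claims(token: str) -> dict:
    """Split a compact JWT and return its payload WITHOUT verifying it."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    return json.loads(_b64url_decode(parts[1]))


def claims_acceptable(claims: dict, audience: str, now: Optional[float] = None) -> bool:
    """Check expiry and audience; run this only after signature verification."""
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return False
    aud = claims.get("aud")
    # "aud" may be a single string or a list of strings (RFC 7519).
    if isinstance(aud, str):
        audiences = [aud]
    elif isinstance(aud, list):
        audiences = aud
    else:
        audiences = []
    return audience in audiences
```

Keeping the claim checks in a pure function makes them easy to unit-test with a fixed `now`, independently of whichever library ends up doing the signature verification.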

Status

Disclaimer

If someone is interested in this task, I will be happy to provide all the necessary information and support.

ap-kulkarni commented 4 years ago

@mik-laj I am interested to work on this.

mik-laj commented 4 years ago

@ameyk-2409 Which task do you want to focus on? We have a few tasks to do here, and I think this can be broken down into several small contributions.

ap-kulkarni commented 4 years ago

@mik-laj By task, do you mean the ones listed under the Status section above? If yes, I am interested in implementing authorization for the API part. It can be any of the implementations mentioned in the description.

mik-laj commented 4 years ago

@ameyk-2409 Fantastic. I assigned you to "API supports Pomerium". 🐈

rafaelvargas commented 4 years ago

@mik-laj I'd like to work on the support for Google's IAP.

mik-laj commented 4 years ago

@rafaelvargas I assigned you to "API supports Google IAP". I am trying to get permission to publish the integration with the webserver, but this may not happen, so I am not assigning myself to that task. However, I am happy to help with the review of the IAP integration for the API.

mik-laj commented 4 years ago

@zjffdu Jarek suggested that we also provide support for Apache Knox. Can you share details about how this product works?

zjffdu commented 4 years ago

@mik-laj Thanks for the mention. Apache Knox is a reverse proxy; I mean to use Knox as a reverse proxy in front of Airflow, so that we can leverage Knox's SSO. https://knox.apache.org/

mik-laj commented 4 years ago

@zjffdu How is the identity from Apache Knox passed to other applications? Have you ever tried integrating other applications with Apache Knox?

loozhengyuan commented 4 years ago

@mik-laj FYI, not sure if this has been raised yet, but Keycloak has recently sunsetted the Louketo project, and it is due to reach EOL on 21 Nov. Here's the relevant GitHub issue. As such, we may consider omitting Louketo from the scope of the issue.

zjffdu commented 4 years ago

@zjffdu How is the identity from Apache Knox passed to other applications? Have you ever tried integrating other applications with Apache Knox?

It looks like there's already a PR in the Knox project. https://github.com/apache/knox/pull/182

w4tsn commented 3 years ago

@mik-laj @loozhengyuan Keycloak states that oauth2-proxy is the viable alternative, so maybe this project should replace Louketo in this issue.

mik-laj commented 3 years ago

@w4tsn I have a working and tested implementation that uses the Louketo proxy. If time permits, I will try to update it to use a different proxy and contribute it to the community, but for now I am very short on time.

rg2609 commented 3 years ago

@mik-laj Can you share the branch or PR with the changes you made for Keycloak?

mik-laj commented 3 years ago

@rg2609 Unfortunately, this is part of a client project, and I haven't found the time to reimplement it in the community.

rg2609 commented 3 years ago

@mik-laj So can you guide me on where to make the changes?

ghost commented 3 years ago

@rafaelvargas, was wondering if the Google IAP support is actively being worked on. If not, I'd be interested in giving it a shot!

mik-laj commented 3 years ago

@alex-kattathra-johnson Assigned. I also worked on IAP support, but never finished. I managed to write some system test code to check if the integration is working fine. Feel free to use it in your PR. https://github.com/mik-laj/airflow/pull/35/files

brandondtb commented 2 years ago

@rafaelvargas @ap-kulkarni I wanted to check in on the progress of this one and see if either of you are actively working on it. I could really use this feature, and would be happy to help however I can.

ap-kulkarni commented 2 years ago

Apologies for the long hiatus on this one. I could not work on it due to personal issues. I have started analyzing the requirements for integrating with Pomerium and have a few questions.

  1. When a request is received by Airflow, will the user already be authenticated with Pomerium? I.e. when validating the request in the auth backend, should the code look directly for the X-Pomerium-Jwt-Assertion header, or will the request contain credentials that the code should first authenticate with Pomerium?
  2. To validate the JWT header we will need a JWT library, and I feel jwcrypto would be a good choice since it supports all facets of JWT according to the JWT.IO page listing the libraries. When I tried installing the library in the Python environment created for Airflow, I found that it is already installed as a dependency of some other requirement. However, I feel we should add an explicit requirement for it. Let me know if this is okay and what criteria is used to pin a requirement to a particular version. Also, if anyone has another suggestion for a JWT library, I would like to hear it.

At this point I am concentrating on API authentication only. Once I am clear enough on the details, I will look into the FAB implementation. Again, apologies for not being able to work on this one for so long.

mik-laj commented 2 years ago

When a request is received by Airflow, will the user already be authenticated with Pomerium? I.e. when validating the request in the auth backend, should the code look directly for the X-Pomerium-Jwt-Assertion header, or will the request contain credentials that the code should first authenticate with Pomerium?

I have no experience with this platform, but a non-privileged user should not be able to log in and this is the main requirement.

Let me know if this is okay and what criteria is used to pin a requirement to a particular version.

We should create a new provider and define all requirements explicitly. Here is our doc about dependencies and the upper-bound versions of Airflow dependencies: https://github.com/apache/airflow#approach-to-dependencies-of-airflow. The lower bound should point to the version you are currently testing.

ap-kulkarni commented 2 years ago

Thank you @mik-laj. I will try setting up the minimal environment required for this. I will post questions here if I get stuck anywhere.

softestplease commented 2 years ago

API supports Google IAP

Hello @mik-laj, has this code/feature been tested (https://github.com/mik-laj/airflow/pull/35/files)? I need this feature in my environment, as we are running Airflow in GKE and would like to trigger DAGs with the stable REST API from a Cloud Function. I believe that HTTP only supports one authentication header; since IAP is used from CF to Airflow@GKE, we are unable to also include a username/password for the basic_auth backend type. Cheers