apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Allow backend DB to authenticate using temporary tokens #30368

Open albertocalderari opened 1 year ago

albertocalderari commented 1 year ago

Description

Based on this discussion. Currently there is no way to use token-based identity to authenticate with Amazon RDS without a fairly significant change to the Helm charts and the Airflow code.

I will implement this functionality and add the helm options as:

externalDatabase:
  type: postgres
  host: airflow-cluster.<uniqueId>.us-east-1.rds.amazonaws.com

  ## the port of the external database
  ##
  port: 5432

  ## the database/scheme to use within the external database
  ##
  database: airflow

  ## the username for the external database
  ##
  user: airflow

  awsRdsTokenIdentity:
    enabled: true
    region: us-east-1
    connectionExpirySeconds: 600
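
Presumably the chart would render the awsRdsTokenIdentity values into Airflow configuration via the standard AIRFLOW__&lt;SECTION&gt;__&lt;KEY&gt; environment-variable convention; a hypothetical mapping (the key names follow the conf lookups in the Python snippet in this issue and are not final):

```yaml
# Hypothetical rendering of the values above into Airflow config env vars.
# The AIRFLOW__<SECTION>__<KEY> convention is standard Airflow; the
# specific key names are assumptions from this proposal.
env:
  - name: AIRFLOW__DATABASE__USE_AWS_TOKEN_IDENTITY
    value: "true"
  - name: AIRFLOW__DATABASE__AWS_REGION
    value: "us-east-1"
```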

And use SQLAlchemy events to provide the token:

import logging

import boto3
from sqlalchemy import event

from airflow.configuration import conf

log = logging.getLogger(__name__)


def amend_connection(cparams):
    if conf.getboolean("database", "use_aws_token_identity"):
        log.info(f'connecting user {cparams["user"]} to {cparams["host"]}:{cparams["port"]} using pod identity')
        client = boto3.client(
            "rds",
            region_name=conf.get_mandatory_value("database", "aws_region"),
        )
        # RDS IAM auth tokens are short-lived, so a fresh one is generated
        # per connection instead of storing a password in the config.
        token = client.generate_db_auth_token(
            DBHostname=cparams["host"],
            Port=cparams["port"],
            DBUsername=cparams["user"],
        )
        cparams["password"] = token
    else:
        log.info(f'connecting {cparams["user"]} using user/password')


# "engine" is the SQLAlchemy engine created by Airflow's settings module.
@event.listens_for(engine, "do_connect")
def provide_token(dialect, conn_rec, cargs, cparams):
    amend_connection(cparams)
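
The "do_connect" hook itself is standard SQLAlchemy and can be exercised without Airflow or AWS; a minimal self-contained sketch, using an in-memory SQLite engine with a harmless parameter tweak standing in for the token injection:

```python
# Minimal, self-contained demo of the SQLAlchemy "do_connect" event used
# above, run against an in-memory SQLite engine so it needs neither
# Airflow nor AWS. For RDS, the listener would instead set
# cparams["password"] to a freshly generated auth token.
from sqlalchemy import create_engine, event, text

engine = create_engine("sqlite://")
amended = {}  # records that the hook fired


@event.listens_for(engine, "do_connect")
def provide_token(dialect, conn_rec, cargs, cparams):
    # cparams can be mutated here before the DBAPI connect() call;
    # SQLite rejects a "password" kwarg, so we only touch a safe param.
    amended["called"] = True
    cparams["check_same_thread"] = False


with engine.connect() as conn:
    result = conn.execute(text("select 1")).scalar()
```

Because the listener runs on every new DBAPI connection, a short-lived token is regenerated whenever the pool opens a fresh connection, which is exactly the behavior the connectionExpirySeconds option relies on.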

Use case/motivation

Temporary credentials are a security feature generally required by secops teams, and a good practice these days in general, so it makes sense to me to support them.

Related issues

No response

Are you willing to submit a PR?


hussein-awala commented 1 year ago

Allowing the backend DB to authenticate using temporary tokens (or other kinds of tokens) is a good new feature, but IMHO implementing a new method in Airflow core only for AWS RDS is not a good idea.

Instead, we can add a new conf option to tell Airflow which method it should use for tokens (I prefer a generic method for all connection conf), something like:

from airflow.configuration import conf
from airflow.utils.module_loading import import_string


@event.listens_for(engine, "do_connect")
def provide_token(dialect, conn_rec, cargs, cparams):
    # fallback=None keeps the hook a no-op when the option is unset
    sql_alchemy_amend_connection_method_path = conf.get("database", "sql_alchemy_amend_connection_method", fallback=None)
    if sql_alchemy_amend_connection_method_path:
        sql_alchemy_amend_connection_method = import_string(sql_alchemy_amend_connection_method_path)
        sql_alchemy_amend_connection_method(cparams)

Then users can implement their own method and provide its path via the Airflow conf database.sql_alchemy_amend_connection_method; we can change the method and the conf name if you have a better suggestion. And maybe later, in a separate PR, we add the amend method used for AWS RDS tokens to the AWS provider. WDYT about this suggestion?
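
The dotted-path dispatch this relies on can be sketched without Airflow; import_string below mirrors airflow.utils.module_loading.import_string, and everything else is illustrative, not an existing API:

```python
# Sketch of the proposed generic hook: resolve a dotted path (the value
# the database.sql_alchemy_amend_connection_method conf would supply) and
# call it on the connection params. import_string mirrors Airflow's
# airflow.utils.module_loading.import_string; the other names are
# assumptions made for this demo.
import sys
import types
from importlib import import_module


def import_string(dotted_path):
    # "pkg.module.attr" -> import pkg.module and return its attr
    module_path, _, attr = dotted_path.rpartition(".")
    return getattr(import_module(module_path), attr)


def apply_amend_hook(cparams, method_path):
    # In Airflow, method_path would come from conf.get(...); unset means no-op.
    if method_path:
        import_string(method_path)(cparams)


def amend_with_static_token(cparams):
    # Example user-supplied amend method: inject a (fake) temporary token.
    cparams["password"] = "temporary-token"


# Register the example under a synthetic module name so the dotted path
# resolves inside this self-contained demo; a real user would point the
# conf at a function in their own installed module.
hooks = types.ModuleType("my_amend_hooks")
hooks.amend_with_static_token = amend_with_static_token
sys.modules["my_amend_hooks"] = hooks

cparams = {"user": "airflow"}
apply_amend_hook(cparams, "my_amend_hooks.amend_with_static_token")
```

Keeping the hook generic like this means the AWS RDS token logic can live entirely in the Amazon provider, with core Airflow knowing only about the dotted path.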

@potiuk I don't think he needs an AIP in this case, WDYT?

potiuk commented 1 year ago

I think it might ask for a discussion on the devlist - to draw the attention of those who might know more about the SQLAlchemy event system - but yeah, I think it's too small of a thing (and not really impacting the architecture) for an AIP.

albertocalderari commented 1 year ago

I wanted to open a PR and settle it in the discussion. I was considering exposing this, but didn't realize we could expose it through config.

albertocalderari commented 1 year ago

Solution 1: https://github.com/apache/airflow/pull/30438
Solution 2: https://github.com/apache/airflow/pull/30439
@hussein-awala

Will start a discussion too

potiuk commented 1 year ago

Yeah - I just saw those comments after looking at #30438 and #30439, and we came to similar conclusions as @hussein-awala - the "generic" method is cool, and I think it would be valuable to have it (and the RDS implementation in the Amazon provider) - and this should be described in the docs.

This is a bit low-level, but IMHO it falls into our "Airflow-as-a-platform" approach - where you can extend generic Airflow Platform with some extensions. This should likely find its way into: https://airflow.apache.org/docs/apache-airflow-providers/core-extensions/index.html