airflow-helm / charts

The User-Community Airflow Helm Chart is the standard way to deploy Apache Airflow on Kubernetes with Helm. Originally created in 2017, it has since helped thousands of companies create production-ready deployments of Airflow on Kubernetes.
https://github.com/airflow-helm/charts/tree/main/charts/airflow
Apache License 2.0

The airflow.connections parameter is not working as expected for Databricks connection #736

Closed ViniciusNoggo closed 1 year ago

ViniciusNoggo commented 1 year ago


Chart Version

8.6.1

Kubernetes Version

Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.3", GitCommit:"9e644106593f3f4aa98f8a84b23db5fa378900bd", GitTreeState:"clean", BuildDate:"2023-03-15T13:40:17Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.10", GitCommit:"5c1d2d4295f9b4eb12bfbf6429fdf989f2ca8a02", GitTreeState:"clean", BuildDate:"2023-01-27T22:54:20Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"linux/amd64"}

Helm Version

version.BuildInfo{Version:"v3.11.2", GitCommit:"912ebc1cd10d38d340f048efaf0abda047c3468e", GitTreeState:"clean", GoVersion:"go1.18.10"}

Description

I'm trying to configure connections using the airflow.connections parameter, as the documentation describes.

Specifically, for the Databricks connection I'm having a problem authenticating when using the token in the password field.

I created a Secret to store the connection token and referenced it in airflow.connectionsTemplates so it propagates into airflow.connections.

With the token in the password field, testing the connection in the UI returns a 401 error (HTTP ERROR 401).


The thing is, if I change the values.yaml file to use the extra field instead of the password, or if I copy the exact same token and paste it manually into the password field directly in the Airflow UI, in both cases the test succeeds.

The extra field mentioned above would be configured like this:

      extra: |
        {
          "token": "${JOB_DATABRICKS_TOKEN}"
        }
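For completeness, the full connection entry using extra instead of password would look roughly like this (a sketch; JOB_DATABRICKS_TOKEN would need a matching entry under airflow.connectionsTemplates, just like the templates shown later in this issue):

```yaml
airflow:
  connections:
    - id: conn_databricks_getd
      type: databricks
      host: ${DATABRICKS_HOST}
      ## token passed via `extra` instead of `password`
      extra: |
        {
          "token": "${JOB_DATABRICKS_TOKEN}"
        }
```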


For security reasons, and to avoid depending on manual post-install steps to configure the connection, I want to use the parameters rather than the UI.

Any help with this would be appreciated.

Relevant Logs

HTTP ERROR 401
Problem accessing /api/2.0/clusters/spark-versions. 
Reason: Unauthorized

Custom Helm Values

airflow:
  connections:
    - id: conn_databricks_getd
      type: databricks
      host: ${DATABRICKS_HOST}
      password: ${DATABRICKS_TOKEN}
  connectionsTemplates:
    DATABRICKS_HOST:
      kind: secret
      name: secret-credentials-databricks
      key: DATABRICKS_HOST  
    DATABRICKS_TOKEN:
      kind: secret
      name: secret-credentials-databricks
      key: DATABRICKS_TOKEN
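For anyone reproducing this, the Secret those connectionsTemplates reference would look something like the following (a sketch with placeholder values). Note that a stray trailing newline in a base64-encoded value, e.g. from `echo` without `-n`, is a common cause of 401 errors; using `stringData` sidesteps that pitfall because Kubernetes does the encoding:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: secret-credentials-databricks
type: Opaque
stringData:
  ## placeholder values -- replace with your workspace URL and PAT
  DATABRICKS_HOST: "https://adb-1234567890123456.7.azuredatabricks.net"
  DATABRICKS_TOKEN: "dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
```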
thesuperzapper commented 1 year ago

@ViniciusNoggo I can see from the Databricks provider docs that the behavior of the password field will depend on the value of the login field.

Specifically: "if login is empty then token will be sent in request header as Bearer token, if login is 'token' then it will be sent using Basic Auth which is allowed by Databricks API, this may be useful if you plan to reuse this connection with e.g. SimpleHttpOperator". So I wonder if it will work when you specify "token" as the login.
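In other words, the provider picks between two HTTP auth schemes based on login. A minimal sketch of that selection logic (not the actual hook code, just an illustration of the two header forms the Databricks API would receive):

```python
import base64

def databricks_auth_header(login: str, token: str) -> dict:
    """Illustrate the documented behavior: empty login -> Bearer token;
    login == "token" -> HTTP Basic auth with "token" as the username."""
    if not login:
        return {"Authorization": f"Bearer {token}"}
    if login == "token":
        creds = base64.b64encode(f"token:{token}".encode()).decode()
        return {"Authorization": f"Basic {creds}"}
    raise ValueError("login must be empty or the literal string 'token'")

print(databricks_auth_header("", "dapi123"))       # Bearer form
print(databricks_auth_header("token", "dapi123"))  # Basic form
```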

Also, can you confirm whether the issue ALSO happens when you manually set the password field (in the UI) with your token? Because if it works in that situation, then it must be something related to the chart's connection sync controller.

ViniciusNoggo commented 1 year ago

Hi @thesuperzapper, the issue only happens when I pass the token using the password parameter in the values.yaml file. If I take the same token and set it in the password field using the UI, then it works just fine.


By the way, I also tried the suggestion of using the login field. I tried both changing it in the values.yaml file and setting it manually in the UI, and neither worked. So I think the password field is the correct field, but it only works if I set it directly in the UI.

So it could be related to the chart's connection sync controller, right?

thesuperzapper commented 1 year ago

@ViniciusNoggo I am not sure, but just to be clear, are you 100% sure that your airflow.connectionsTemplates entry for JOB_DATABRICKS_TOKEN is correct?

For example:

If none of the above applies, are you seeing logs in the sync-connections Pod that indicate that it is successfully updating the connection (e.g. if you manually update it in the UI, it should revert any change you made)?

ViniciusNoggo commented 1 year ago

@thesuperzapper Yes I am sure.

Answering your examples:

Unfortunately, I can't see the sync-connections Pod logs right now, but I will try it and post the logs here afterwards.
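To grab those logs, something like the following should work (assuming the chart is installed in the airflow namespace; the exact Pod name varies with the Helm release name, so these commands are illustrative):

```shell
# find the sync-connections pod (name depends on the Helm release)
kubectl get pods -n airflow | grep sync-connections

# then tail its logs, substituting the pod name found above
kubectl logs -n airflow <sync-connections-pod-name> --tail=100
```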

So, if I deploy Airflow using the airflow.connections and airflow.connectionsTemplates parameters in the values.yaml, and then update the connection in the UI and save it, should the changes made in the UI overwrite those values, or should they be discarded, reverting to the previous config defined by the parameters?

thesuperzapper commented 1 year ago

@ViniciusNoggo what should happen is that the sync-connections Pods should revert any changes made in the UI after about 60 seconds (unless they happen to be the same as what you defined in airflow.connections and airflow.connectionsTemplates).

If that does not happen, it might not be updating the connection you think it is. In that case, check that the id in airflow.connections is EXACTLY the same as the one you are working on in the UI.
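One way to cross-check the stored connection is the Airflow 2 CLI from inside the webserver or scheduler Pod (the id here mirrors the one in the values above):

```shell
# list all connection ids known to the metadata DB
airflow connections list

# show the fields of the specific connection being synced
airflow connections get conn_databricks_getd
```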

ViniciusNoggo commented 1 year ago

Hi @thesuperzapper, we tried updating the connection in the UI, and the sync-connections Pods worked as expected: the changes were reverted to what was defined in the airflow.connections and airflow.connectionsTemplates parameters.

I double-checked the id, and it is the same.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had activity in 60 days. It will be closed in 7 days if no further activity occurs.

Thank you for your contributions.


Issues never become stale if any of the following is true:

  1. they are added to a Project
  2. they are added to a Milestone
  3. they have the lifecycle/frozen label