apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

GKEStartPodOperator cannot connect to Private IP after upgrade to 2.6.x #31387

Closed sgomezf closed 1 year ago

sgomezf commented 1 year ago

Apache Airflow version

2.6.1

What happened

After upgrading to 2.6.1, GKEStartPodOperator stopped creating pods. According to the release notes, we created a dedicated GCP connection. But the connection defaults to the GKE public endpoint (masked as XX.XX.XX.XX in the error message below) instead of the private IP, which is what we need since our cluster does not have public internet access.

[2023-05-17T07:02:33.834+0000] {connectionpool.py:812} WARNING - Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f0e47049ba0>, 'Connection to XX.XX.XX.XX timed out. (connect timeout=None)')': /api/v1/namespaces/airflow/pods?labelSelector=dag_id%3Dmytask%2Ckubernetes_pod_operator%3DTrue%2Crun_id%3Dscheduled__2023-05-16T0700000000-8fb0e9fa9%2Ctask_id%3Dmytask%2Calready_checked%21%3DTrue%2C%21airflow-sa
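For reference, the query string in the failing request is a URL-encoded Kubernetes label selector. Decoding it (a quick sketch; the selector value is copied from the log above) shows which labels the operator uses to find its pod, and confirms the timeout happens on an ordinary pod-list call to the masked (public) endpoint:

```python
from urllib.parse import unquote

# Label selector from the failing request in the log above (URL-encoded).
selector = (
    "dag_id%3Dmytask%2Ckubernetes_pod_operator%3DTrue"
    "%2Crun_id%3Dscheduled__2023-05-16T0700000000-8fb0e9fa9"
    "%2Ctask_id%3Dmytask%2Calready_checked%21%3DTrue%2C%21airflow-sa"
)

# Decode and print one label expression per line.
for part in unquote(selector).split(","):
    print(part)
# dag_id=mytask
# kubernetes_pod_operator=True
# run_id=scheduled__2023-05-16T0700000000-8fb0e9fa9
# task_id=mytask
# already_checked!=True
# !airflow-sa
```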

It seems that with this change "use_private_ip" has been deprecated. What would be the workaround in this case to connect using the private endpoint?

Also, the docs have not been updated to reflect this change in behaviour: https://airflow.apache.org/docs/apache-airflow-providers-google/stable/operators/cloud/kubernetes_engine.html#using-with-private-cluster

What you think should happen instead

There should still be an option to connect using the previous method (the "--private-ip" option) so that API calls to Kubernetes go to the private endpoint of the GKE cluster.

How to reproduce

  1. Create DAG file with GKEStartPodOperator.
  2. Deploy said DAG in an environment with no access to the public internet.

Operating System

cos_containerd

Versions of Apache Airflow Providers

apache-airflow-providers-cncf-kubernetes==5.2.2
apache-airflow-providers-google==8.11.0

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else

No response

Are you willing to submit a PR?

Code of Conduct

boring-cyborg[bot] commented 1 year ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

sgomezf commented 1 year ago

PR with the change in functionality: https://github.com/apache/airflow/pull/29266. The upgrade was from 2.5.1 to 2.6.1.

hussein-awala commented 1 year ago

Indeed, there is a problem in the provider documentation, which should have been updated in version 9.0.0.

As for the main issue, I think there is no way in the current implementation to force the operator/hook to use the private endpoint: the cluster information is fetched via the GCP Python client using the project ID and the cluster name.
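A minimal sketch of the endpoint choice the hook has to make (this is not the provider's actual code). The field names mirror the GKE Cluster resource, which exposes the public IP as `endpoint` and the internal one under `privateClusterConfig.privateEndpoint`; the `use_private_endpoint` flag here is hypothetical, standing in for whatever option the fix introduces:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrivateClusterConfig:
    """Subset of the GKE privateClusterConfig block."""
    private_endpoint: str


@dataclass
class Cluster:
    """Subset of the GKE Cluster resource returned by the GCP client."""
    endpoint: str  # public endpoint, always populated by the API
    private_cluster_config: Optional[PrivateClusterConfig] = None


def pick_endpoint(cluster: Cluster, use_private_endpoint: bool) -> str:
    """Return the IP the Kubernetes client should connect to."""
    if use_private_endpoint and cluster.private_cluster_config:
        return cluster.private_cluster_config.private_endpoint
    # Without an explicit opt-in, fall back to the public endpoint --
    # which is exactly what times out in an airgapped environment.
    return cluster.endpoint


cluster = Cluster(
    endpoint="203.0.113.10",  # example public IP (RFC 5737 range)
    private_cluster_config=PrivateClusterConfig("10.0.0.2"),
)
print(pick_endpoint(cluster, use_private_endpoint=True))   # -> 10.0.0.2
print(pick_endpoint(cluster, use_private_endpoint=False))  # -> 203.0.113.10
```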

I'll open a PR to fix this.

hussein-awala commented 1 year ago

@sgomezf could you test whether #31391 resolves your problem? You can install the provider from the PR branch, or simply copy the operator code with the changes I made into a new module and use the copied operator instead of the provider one.

sgomezf commented 1 year ago

I'm happy to confirm that I can now create pods with GKEStartPodOperator again. Thanks a lot for the quick response, @hussein-awala. I did some runs and everything seems normal. The documentation can be confusing, though; should I open a separate issue for it?

potiuk commented 1 year ago

Documentation can be confusing though, should I open a separate issue for it?

Best is to just fix the docs. It's super easy: click "Suggest a change on this page" at the bottom right, and a PR will be opened for you. You can update the documentation there without leaving the GitHub UI and submit it as a PR (and you will become an Airflow contributor that way as a free bonus) :)