apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
37.12k stars 14.31k forks source link

waitForMigrations fails due to initContainer order #42158

Open dindurthy opened 2 months ago

dindurthy commented 2 months ago

Apache Airflow version

2.10.0

If "Other Airflow 2 version" selected, which one?

2.10.1

What happened?

I see related issues but not specifically the one I'm about to describe.

We are using the helm chart v1.15.0 which enables extraInitContainers for the migrateDatabaseJob.

By default, the webserver, workers, triggerer, scheduler setwaitForMigrations.enabled: true in the chart, which kinda makes sense. However, that will not work if the wait-for-airflow-migrations itself depends on an initContainer. In our case, which expect is a semi-common one, database connections for GKE workloads connecting to private CloudSQL instances rely on the cloud-sql-proxy initContainer. Because extraInitContainers injects those containers after the wait-for-airflow-migrations and because the kubelet will run initContainers in their order regardless of a latter one being a sidecar.

What you think should happen instead?

For this to work, we would need extraInitContainers to come first (perhaps a breaking change) or have some general control of the order including the wait container.

How to reproduce

Set up an airflow database that requires the an init container to connect. With waitForMigration.enabled: true, airflow pods will be stuck in Pending.

Operating System

ubuntu

Versions of Apache Airflow Providers

N/A

Deployment

Official Apache Airflow Helm Chart

Deployment details

k8s v1.29.0

Anything else?

No response

Are you willing to submit PR?

Code of Conduct

boring-cyborg[bot] commented 2 months ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

dindurthy commented 2 months ago

It could be that in a sync context, e.g. ArgoCD or Flux, waitForMigrations.enabled doesn't matter, in which case a documentation update might be sufficient