airflow-helm / charts

The User-Community Airflow Helm Chart is the standard way to deploy Apache Airflow on Kubernetes with Helm. Originally created in 2017, it has since helped thousands of companies create production-ready deployments of Airflow on Kubernetes.
https://github.com/airflow-helm/charts/tree/main/charts/airflow
Apache License 2.0

Airflow Web Server does not include the ConfigMap for the Kubernetes pod template #740

Closed akash-jain-10 closed 6 months ago

akash-jain-10 commented 1 year ago

Chart Version

8.6.1

Kubernetes Version

Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.1", GitCommit:"4c9411232e10168d7b050c49a1b59f6df9d7ea4b", GitTreeState:"clean", BuildDate:"2023-04-14T13:14:41Z", GoVersion:"go1.20.3", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.3", GitCommit:"9e644106593f3f4aa98f8a84b23db5fa378900bd", GitTreeState:"clean", BuildDate:"2023-03-15T13:33:12Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/arm64"}

Helm Version

version.BuildInfo{Version:"v3.10.2", GitCommit:"50f003e5ee8704ec937a756c646870227d7c8b58", GitTreeState:"clean", GoVersion:"go1.19.3"}

Description

We use a custom Airflow plugin that lets us deploy DAGs dynamically via REST APIs. During deployment, the plugin triggers the SchedulerJob from Python code to force a scan of the newly created DAGs. We prefer this to tuning the DAG scan interval. However, when running on Kubernetes with the KubernetesExecutor, the new DAGs do not pick up the right pod_template_file. This is because the pod_template_file is mounted only in the scheduler pod and not in the webserver pod, where the plugins run. If we manually mount the pod_template_file at the location given by AIRFLOW__KUBERNETES_EXECUTOR__POD_TEMPLATE_FILE, everything works as expected. The chart adds the pod_template_file to the scheduler, but the corresponding entry is missing for the webserver. The official Airflow chart, by contrast, does add the pod_template_file to the webserver as well.

Expected Behaviour - Plugins that run on the Airflow webserver and create DAGs programmatically should pick up the right pod template file, with the correct serviceAccount name and with the DAGs and logs directories mounted.

Actual Behaviour - The Kubernetes pod that spins up for a DAG generated by a plugin running on the Airflow webserver uses the default service account name and lacks the Airflow environment variables (it does not use the pod_template.yaml file associated with the scheduler pod).

Alternatives/workarounds - We currently have a static workaround: passing extraVolumes and extraVolumeMounts for the webserver. However, because the name of the ConfigMap is derived from the Helm release name, this is not a workaround we would recommend.

It would be great if someone could take a look and bake the ConfigMap mount for the webserver into the Helm chart.
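
For reference, the static workaround described above might look roughly like the values fragment below. This is only a sketch under assumptions, not the chart's documented configuration: the ConfigMap name `my-release-pod-template`, the volume name `pod-template`, the key `pod_template.yaml`, and the mount path are placeholders and should be copied from whatever the scheduler pod actually uses in a given release.

```yaml
web:
  # Assumption: the webserver accepts extraVolumes / extraVolumeMounts,
  # as used in the workaround described above.
  extraVolumes:
    - name: pod-template              # hypothetical volume name
      configMap:
        # Hypothetical name: the real ConfigMap name is derived from the
        # Helm release name, so look it up in the scheduler pod spec.
        name: my-release-pod-template
  extraVolumeMounts:
    - name: pod-template
      # Assumption: this must match the path that the pod-template-file
      # environment variable points to inside the container.
      mountPath: /opt/airflow/pod_templates/pod_template.yaml
      subPath: pod_template.yaml      # assumed key inside the ConfigMap
      readOnly: true
```

Because the ConfigMap name changes with the release name, a fragment like this has to be edited for every release, which is the drawback noted above.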

Relevant Logs

No logs to detect this! Just trial and error.

Custom Helm Values

airflow:
  image:
    repository: docker.getcollate.io/openmetadata/ingestion
    tag: 1.0.0
    pullPolicy: "IfNotPresent"
  executor: "KubernetesExecutor"
  config:
    # This is required for OpenMetadata UI to fetch status of DAGs
    AIRFLOW__API__AUTH_BACKENDS: airflow.api.auth.backend.basic_auth
    # OpenMetadata Airflow Apis Plugin DAGs Configuration
    AIRFLOW__OPENMETADATA_AIRFLOW_APIS__DAG_GENERATED_CONFIGS: "/opt/airflow/dags"
    # OpenMetadata Airflow Secrets Manager Configuration
    AIRFLOW__OPENMETADATA_SECRETS_MANAGER__AWS_REGION: ""
    AIRFLOW__OPENMETADATA_SECRETS_MANAGER__AWS_ACCESS_KEY_ID: ""
    AIRFLOW__OPENMETADATA_SECRETS_MANAGER__AWS_ACCESS_KEY: ""
  users:
  - username: admin
    password: admin
    role: Admin
    email: spiderman@superhero.org
    firstName: Peter
    lastName: Parker
web:
  readinessProbe:
    enabled: true
    initialDelaySeconds: 60
    periodSeconds: 30
    timeoutSeconds: 10
    failureThreshold: 10
  livenessProbe:
    enabled: true
    initialDelaySeconds: 60
    periodSeconds: 30
    timeoutSeconds: 10
    failureThreshold: 10
postgresql:
  enabled: false
workers:
  enabled: false
flower:
  enabled: false
redis:
  enabled: false
externalDatabase:
  type: mysql
  host: mysql
  port: 3306
  database: airflow_db
  user: airflow_user
  passwordSecret: airflow-mysql-secrets
  passwordSecretKey: airflow-mysql-password
serviceAccount:
  create: true
  name: "airflow"
scheduler:
  logCleanup:
    enabled: false
dags:
  persistence:
    enabled: true
    # NOTE: "" means cluster-default
    storageClass: ""
    size: 1Gi
    accessMode: ReadWriteMany
logs:
  persistence:
    enabled: true
    # empty string means cluster-default
    storageClass: ""
    accessMode: ReadWriteMany
    size: 1Gi

stale[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had activity in 60 days. It will be closed in 7 days if no further activity occurs.

Thank you for your contributions.


Issues never become stale if any of the following is true:

  1. they are added to a Project
  2. they are added to a Milestone
  3. they have the lifecycle/frozen label

akash-jain-10 commented 11 months ago

Hey team, I want to follow up here! Is this something that could be added as an enhancement to the Airflow-Helm Community Charts?

thesuperzapper commented 6 months ago

@akash-jain-10 As far as I know, the existing airflow.kubernetesPodTemplate.* values work correctly to template the AIRFLOW__KUBERNETES__POD_TEMPLATE_FILE file.

We also provide the airflow.kubernetesPodTemplate.stringOverride value to override the full template with a custom string value, if required.
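
As a rough illustration of the override mentioned above, a values fragment might look like the sketch below. The pod spec body is an assumption made up for this example (the image and service account are taken from the custom values earlier in this issue), and it only changes the template contents, not which pods the file is mounted into.

```yaml
airflow:
  kubernetesPodTemplate:
    # Replaces the generated template with this literal string.
    stringOverride: |
      apiVersion: v1
      kind: Pod
      metadata:
        name: placeholder-name        # illustrative; overwritten per task pod
      spec:
        serviceAccountName: airflow   # taken from serviceAccount.name above
        restartPolicy: Never
        containers:
          # Airflow expects the main worker container to be named "base".
          - name: base
            image: docker.getcollate.io/openmetadata/ingestion:1.0.0
```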

akash-jain-10 commented 5 months ago

Hello @thesuperzapper - The kubernetesPodTemplate seems to be mounted only in the scheduler and not in the webserver. Is this intentional? Any custom plugin that relies on the Kubernetes pod template in the webserver pod fails for us!