airflow-helm / charts

The User-Community Airflow Helm Chart is the standard way to deploy Apache Airflow on Kubernetes with Helm. Originally created in 2017, it has since helped thousands of companies create production-ready deployments of Airflow on Kubernetes.
https://github.com/airflow-helm/charts/tree/main/charts/airflow
Apache License 2.0

Helm Chart 8.5.2 doesn't work with Airflow 2.2.0 - dumb-init not found. #458

Closed: jholowaty closed this issue 2 years ago

jholowaty commented 3 years ago

What version of the chart are you using?:

I am using version 8.5.2

What is your Kubernetes Version?:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:38:26Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.10-gke.301", GitCommit:"17ad7bd6afa01033d7bd3f02ce5de56f940a915d", GitTreeState:"clean", BuildDate:"2021-08-24T05:18:54Z", GoVersion:"go1.15.15b5", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.22) and server (1.20) exceeds the supported minor version skew of +/-1

What is your Helm version?:

$ helm version
version.BuildInfo{Version:"v3.7.0", GitCommit:"eeac83883cb4014fe60267ec6373570374ce770b", GitTreeState:"clean", GoVersion:"go1.17"}

Please copy your custom Helm values file:

click to expand ```yaml ## enable this value if you pass `--wait` to your `helm install` ## helmWait: false ################################### # Airflow - Common Configs ################################### airflow: ## if we use legacy 1.10 airflow commands ## legacyCommands: false ## configs for the airflow container image ## image: repository: us.gcr.io/x/airflow/custom/airflow-2.2.0-3.7 tag: latest ## values: Always or IfNotPresent pullPolicy: IfNotPresent pullSecret: "" uid: 999 gid: 999 ## the airflow executor type to use executor: CeleryExecutor ## the fernet key used to encrypt the connections/variables in the database fernetKey: "x" ## environment variables for airflow configs config: # KUBERNETES CONFIG AIRFLOW__CORE__LOAD_EXAMPLES: "False" AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY: "us.gcr.io/x/airflow/custom/airflow-2.2.0-3.7" AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG: "latest" AIRFLOW__KUBERNETES__RUN_AS_USER: "999" AIRFLOW__WEBSERVER__EXPOSE_CONFIG: "True" AIRFLOW__KUBERNETES__GIT_REPO: "x" AIRFLOW__KUBERNETES__GIT_BRANCH: "x" # AIRFLOW__KUBERNETES__GIT_BRANCH: "celery-airflow-2-0" AIRFLOW__KUBERNETES__GIT_DAGS_FOLDER_MOUNT_POINT: "/opt/airflow/dags" AIRFLOW__KUBERNETES__NAMESPACE: "peya-eng-celery-airflow-2-0" AIRFLOW__KUBERNETES__DELETE_WORKER_PODS: "True" AIRFLOW__KUBERNETES__DELETE_WORKER_PODS_ON_FAILURE: "False" AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: "60" AIRFLOW__WEBSERVER__RBAC: "True" #AIRFLOW__KUBERNETES__POD_TEMPLATE_FILE: '' AIRFLOW__KUBERNETES__WORKER_DAGS_FOLDER: "/opt/airflow/dags" AIRFLOW__KUBERNETES__DAGS_IN_IMAGE: "False" AIRFLOW__KUBERNETES__WORKER_SERVICE_ACCOUNT_NAME: airflow-2-0 # ### CUSTOM PLUGINS AND CONFIG AIRFLOW__CORE__PLUGINS_FOLDER: "/opt/airflow/dags/dags/framework/plugins/" # SECRET MANAGER CONFIG # GOOGLE AUTH AIRFLOW__WEBSERVER__AUTHENTICATE: "True" # API AUTH CONFIGS AIRFLOW__API__AUTH_BACKEND: "airflow.api.auth.backend.basic_auth" # #Remote Logging AIRFLOW__LOGGING__REMOTE_LOGGING: "True" 
AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: "gs://x/logs" AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: "AIRFLOW_LOGS_GSP" AIRFLOW__CORE__REMOTE_LOGGING: "True" AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: "gs://x/logs" AIRFLOW__CORE__REMOTE_LOG_CONN_ID: "AIRFLOW_LOGS_GSP" remote_log_conn_id: "AIRFLOW_LOGS_GSP" remote_logging: "True" remote_base_log_folder: "gs://x/logs" # CELERY CONFIGS # AIRFLOW__CELERY__WORKER_AUTOSCALE: "64,1" AIRFLOW__CELERY__WORKER_CONCURRENCY: "32" AIRFLOW__CORE__PARALLELISM: "64" AIRFLOW__CORE__DAG_CONCURRENCY: "32" AIRFLOW__CORE__MAX_ACTIVE_RUNS_PER_DAG: "32" # STATS CONFIGS AIRFLOW__METRICS__STATSD_ON: "True" AIRFLOW__METRICS__STATSD_HOST: "peya-eng-celery-airflow-2-0-statsd" AIRFLOW__METRICS__STATSD_PORT: "9125" AIRFLOW__METRICS__STATSD_PREFIX: "airflow-origins-pro" AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT: "1800" ## a list of initial users to create AIRFLOW__CORE__MIN_SERIALIZED_DAG_UPDATE_INTERVAL: "60" AIRFLOW__CORE__MIN_SERIALIZED_DAG_FETCH_INTERVAL: "30" AIRFLOW__CORE__STORE_DAG_CODE: "True" AIRFLOW__CORE__CHECK_SLAS: "True" AIRFLOW__SCHEDULER__PARSING_PROCESSES: "16" # MYSQL AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT: '3000' users: - username: api-user-auth password: x role: Admin email: admin@example.com firstName: api lastName: auth ## if we update users or just create them the first time (lookup by `username`) usersUpdate: true ## a list of initial connections to create connections: - id: AIRFLOW_LOGS_GSP type: google_cloud_platform description: my GCP connection extra: |- { "extra__google_cloud_platform__project": "xxx", "extra__google_cloud_platform__keyfile_dict: "gcloud secrets versions access latest --secret=airflow-connections-AIRFLOW_LOGS_GSP", "extra__google_cloud_platform__scope": "https://www.googleapis.com/auth/cloud-platform" } ## if we update connections or just create them the first time (lookup by `id`) connectionsUpdate: true ## a list of initial variables to create variables: [] ## if we update variables or just create them the 
first time (lookup by `key`) variablesUpdate: true ## a list of initial pools to create pools: [] ## if we update pools or just create them the first time (lookup by `name`) poolsUpdate: true ## extra annotations for the web/scheduler/worker/flower Pods podAnnotations: {} ## extra pip packages to install in the web/scheduler/worker/flower Pods extraPipPackages: - "google-auth" - "Flask" - "Authlib" - "Flask-OAuthlib" - "werkzeug>1.0.0" - "apache-airflow-providers-amazon" - "apache-airflow-providers-google" - "tableauserverclient==0.15" - "apache-airflow[statsd]" - "apache-airflow-providers-apache-beam" ## extra environment variables for the web/scheduler/worker/flower Pods extraEnv: - name: PYTHONPATH value: "/opt/airflow/dags/dags:/opt/airflow/dags/dags/framework/:/opt/airflow/dags/dags/framework/config/:/opt/airflow/dags/dags/framework/plugins/:/usr/local/lib/python38.zip:/usr/local/lib/python3.8:/usr/local/lib/python3.8/lib-dynload:/home/airflow/.local/lib/python3.8/site-packages:/usr/local/lib/python3.8/site-packages" - name: explicit_defaults_for_timestamp value: "on" ## extra containers for the web/scheduler/worker/flower Pods extraContainers: - name: dags-gcsfuse args: - /opt/airflow image: 'us.gcr.io/xxx/gcsfuse/airflow-gcsfuse:latest' imagePullPolicy: IfNotPresent # command: ["sh", "-c", "mkdir /opt/airflow/dags/__pycache__"] securityContext: runAsGroup: 999 runAsUser: 999 runAsNonRoot: true resources: {} terminationMessagePath: /dev/termination-log terminationMessagePolicy: File env: - name: GCS_BUCKET value: "" - name: PYTHONPATH value: "/opt/airflow/dags/:/opt/airflow/dags/dags/framework/:/opt/airflow/dags/dags/framework/config/:/opt/airflow/dags/dags/framework/plugins/:/usr/local/lib/python38.zip:/usr/local/lib/python3.8:/usr/local/lib/python3.8/lib-dynload:/home/airflow/.local/lib/python3.8/site-packages:/usr/local/lib/python3.8/site-packages" volumeMounts: - name: dags-gcsfuse mountPath: /opt/airflow - name: dags mountPath: /opt/airflow/dags ## extra 
VolumeMounts for the web/scheduler/worker/flower Pods extraVolumeMounts: - name: dags-gcsfuse mountPath: /opt/airflow - name: dags mountPath: /opt/airflow/dags readOnly: false ## extra Volumes for the web/scheduler/worker/flower Pods extraVolumes: - name: dags-gcsfuse emptyDir: {} - name: dags emptyDir: {} ## configs to generate the AIRFLOW__KUBERNETES__POD_TEMPLATE_FILE kubernetesPodTemplate: ## the full text value to mount as the "pod_template.yaml" file stringOverride: "" ## the nodeSelector configs for the Pod template nodeSelector: {} ## the affinity configs for the Pod template affinity: {} tolerations: [] ## annotations for the Pod template podAnnotations: {} ## the security context for the Pod template securityContext: {} ## extra pip packages to install in the Pod template extraPipPackages: [] ## extra VolumeMounts for the Pod template extraVolumeMounts: [] ## extra Volumes for the Pod template extraVolumes: [] ## resources requirements for the Pod template default "base" container resources: {} ################################### # Airflow - Scheduler Configs ################################### scheduler: ## the number of scheduler Pods to run replicas: 2 ## resource requests/limits for the scheduler Pod resources: requests: memory: "1Gi" ## the nodeSelector configs for the scheduler Pods nodeSelector: {} ## the affinity configs for the scheduler Pods affinity: {} ## the toleration configs for the scheduler Pods tolerations: [] ## the security context for the scheduler Pods securityContext: {} ## labels for the scheduler Deployment labels: {} ## Pod labels for the scheduler Deployment podLabels: {} ## annotations for the scheduler Deployment annotations: {} ## Pod annotations for the scheduler Deployment podAnnotations: {} ## if we add the annotation: "cluster-autoscaler.kubernetes.io/safe-to-evict" = "true" safeToEvict: true ## configs for the PodDisruptionBudget of the scheduler podDisruptionBudget: ## if a PodDisruptionBudget resource is created for 
the scheduler enabled: true ## the maximum unavailable pods/percentage for the scheduler maxUnavailable: "20%" ## the minimum available pods/percentage for the scheduler minAvailable: "" ## sets `airflow --num_runs` parameter used to run the airflow scheduler numRuns: -1 ## configs for the scheduler Pods' liveness probe livenessProbe: enabled: true initialDelaySeconds: 10 periodSeconds: 30 timeoutSeconds: 10 failureThreshold: 5 ## extra pip packages to install in the scheduler Pods extraPipPackages: [] ## extra VolumeMounts for the scheduler Pods extraVolumeMounts: [] ## extra Volumes for the scheduler Pods extraVolumes: [] ## extra init containers to run in the scheduler Pods extraInitContainers: [] ################################### # Airflow - WebUI Configs ################################### web: ## configs to generate webserver_config.py webserverConfig: ## the full text value to mount as the webserver_config.py file stringOverride: "" ## the name of a pre-created secret containing a `webserver_config.py` file as a key existingSecret: "airflow-webserver-secret" ## the number of web Pods to run replicas: 1 ## resource requests/limits for the web Pod resources: requests: memory: "2Gi" ## the nodeSelector configs for the web Pods nodeSelector: {} ## the affinity configs for the web Pods affinity: {} ## the toleration configs for the web Pods tolerations: [] ## the security context for the web Pods securityContext: {} ## labels for the web Deployment labels: { "team": "dataops", "type": "airflow20" } ## Pod labels for the web Deployment podLabels: {} ## annotations for the web Deployment annotations: { prometheus.io/scrape: 'true', prometheus.io/path: '/stats' } ## Pod annotations for the web Deployment podAnnotations: {} ## if we add the annotation: "cluster-autoscaler.kubernetes.io/safe-to-evict" = "true" safeToEvict: true ## configs for the PodDisruptionBudget of the web Deployment podDisruptionBudget: ## if a PodDisruptionBudget resource is created for the 
web Deployment enabled: false ## the maximum unavailable pods/percentage for the web Deployment maxUnavailable: "" ## the minimum available pods/percentage for the web Deployment minAvailable: "" ## configs for the Service of the web Pods service: annotations: {} sessionAffinity: "None" sessionAffinityConfig: {} type: NodePort externalPort: 8080 loadBalancerIP: "" loadBalancerSourceRanges: [] nodePort: http: "" ## configs for the web Pods' readiness probe readinessProbe: enabled: true initialDelaySeconds: 10 periodSeconds: 10 timeoutSeconds: 5 failureThreshold: 6 ## configs for the web Pods' liveness probe livenessProbe: enabled: false initialDelaySeconds: 10 periodSeconds: 10 timeoutSeconds: 5 failureThreshold: 6 ## extra pip packages to install in the web Pods extraPipPackages: [] ## extra VolumeMounts for the web Pods extraVolumeMounts: [] ## extra Volumes for the web Pods extraVolumes: [] ################################### # Airflow - Celery Worker Configs ################################### workers: ## if the airflow workers StatefulSet should be deployed enabled: true ## the number of worker Pods to run replicas: 2 ## resource requests/limits for the worker Pod resources: requests: memory: "1Gi" ## the nodeSelector configs for the worker Pods nodeSelector: {} ## the affinity configs for the worker Pods affinity: {} ## the toleration configs for the worker Pods tolerations: [] ## the security context for the worker Pods securityContext: {} ## labels for the worker StatefulSet labels: { "team": "dataops", "type": "airflow20" } ## Pod labels for the worker StatefulSet podLabels: {} ## annotations for the worker StatefulSet annotations: {} ## Pod annotations for the worker StatefulSet podAnnotations: {} ## if we add the annotation: "cluster-autoscaler.kubernetes.io/safe-to-evict" = "true" safeToEvict: true ## configs for the PodDisruptionBudget of the worker StatefulSet podDisruptionBudget: ## if a PodDisruptionBudget resource is created for the worker 
StatefulSet enabled: true ## the maximum unavailable pods/percentage for the worker StatefulSet maxUnavailable: "20%" ## the minimum available pods/percentage for the worker StatefulSet minAvailable: "" ## configs for the HorizontalPodAutoscaler of the worker Pods autoscaling: enabled: true maxReplicas: 16 metrics: - type: Resource resource: name: memory target: type: Utilization averageUtilization: 50 ## configs for the celery worker Pods celery: ## if celery worker Pods are gracefully terminated ## ## graceful termination process: ## 1. prevent worker accepting new tasks ## 2. wait AT MOST `workers.celery.gracefullTerminationPeriod` for tasks to finish ## 3. send SIGTERM to worker ## 4. wait AT MOST `workers.terminationPeriod` for kill to finish ## 5. send SIGKILL to worker ## NOTE: ## - consider defining a `workers.podDisruptionBudget` to prevent there not being ## enough available workers during graceful termination waiting periods gracefullTermination: true ## how many seconds to wait for tasks to finish before SIGTERM of the celery worker gracefullTerminationPeriod: 540 ## how many seconds to wait after SIGTERM before SIGKILL of the celery worker ## ## WARNING: ## - tasks that are still running during SIGKILL will be orphaned, this is important ## to understand with KubernetesPodOperator(), as Pods may continue running terminationPeriod: 60 ## extra pip packages to install in the worker Pod extraPipPackages: [] ## extra VolumeMounts for the worker Pods extraVolumeMounts: [] ## extra Volumes for the worker Pods extraVolumes: [] ################################### # Airflow - Flower Configs ################################### flower: ## if the airflow flower UI should be deployed enabled: true ## the number of flower Pods to run replicas: 1 ## resource requests/limits for the flower Pod resources: {} ## the nodeSelector configs for the flower Pods nodeSelector: {} ## the affinity configs for the flower Pods affinity: {} ## the toleration configs for the flower 
Pods tolerations: [] ## the security context for the flower Pods securityContext: {} ## labels for the flower Deployment labels: { "team": "dataops", "type": "airflow20" } ## Pod labels for the flower Deployment podLabels: {} ## annotations for the flower Deployment annotations: {} ## Pod annotations for the flower Deployment podAnnotations: {} ## if we add the annotation: "cluster-autoscaler.kubernetes.io/safe-to-evict" = "true" safeToEvict: true ## configs for the PodDisruptionBudget of the flower Deployment podDisruptionBudget: ## if a PodDisruptionBudget resource is created for the flower Deployment enabled: false ## the maximum unavailable pods/percentage for the flower Deployment maxUnavailable: "" ## the minimum available pods/percentage for the flower Deployment minAvailable: "" ## the value of the flower `--auth` argument oauthDomains: "" ## the name of a pre-created secret containing the basic authentication value for flower basicAuthSecret: "" ## the key within `flower.basicAuthSecret` containing the basic authentication string basicAuthSecretKey: "" ## configs for the Service of the flower Pods service: annotations: {} type: ClusterIP externalPort: 5555 loadBalancerIP: "" loadBalancerSourceRanges: [] nodePort: http: ## configs for the flower Pods' readinessProbe probe readinessProbe: enabled: true initialDelaySeconds: 10 periodSeconds: 10 timeoutSeconds: 5 failureThreshold: 6 ## configs for the flower Pods' liveness probe livenessProbe: enabled: true initialDelaySeconds: 10 periodSeconds: 10 timeoutSeconds: 5 failureThreshold: 6 ## extra pip packages to install in the flower Pod extraPipPackages: [] ## extra VolumeMounts for the flower Pods extraVolumeMounts: [] ## extra Volumes for the flower Pods extraVolumes: [] ################################### # Airflow - Logs Configs ################################### logs: ## the airflow logs folder path: /opt/airflow/logs ## configs for the logs PVC persistence: ## if a persistent volume is mounted at 
`logs.path` enabled: false ## the name of an existing PVC to use existingClaim: "" ## sub-path under `logs.persistence.existingClaim` to use subPath: "" ## the name of the StorageClass used by the PVC storageClass: "" ## the access mode of the PVC accessMode: ReadWriteMany ## the size of PVC to request size: 1Gi ################################### # Airflow - DAGs Configs ################################### dags: ## the airflow dags folder path: /opt/airflow/dags ## configs for the dags PVC persistence: ## if a persistent volume is mounted at `dags.path` enabled: false ## the name of an existing PVC to use existingClaim: "" ## sub-path under `dags.persistence.existingClaim` to use subPath: "" ## the name of the StorageClass used by the PVC storageClass: "standard" ## the access mode of the PVC accessMode: ReadOnlyMany ## the size of PVC to request size: 1Gi ## configs for the git-sync sidecar (https://github.com/kubernetes/git-sync) gitSync: ## if the git-sync sidecar container is enabled enabled: false ## the git-sync container image image: repository: k8s.gcr.io/git-sync/git-sync tag: v3.2.2 ## values: Always or IfNotPresent pullPolicy: IfNotPresent uid: 65533 gid: 65533 ## resource requests/limits for the git-sync container resources: requests: ## IMPORTANT! 
for autoscaling to work with gitSync memory: "64Mi" ## the url of the git repo repo: ## the sub-path (within your repo) where dags are located repoSubPath: "" ## the git branch to check out branch: "poc-dags-engineering" ## the git revision (tag or hash) to check out revision: HEAD ## shallow clone with a history truncated to the specified number of commits depth: 1 ## the number of seconds between syncs syncWait: 60 ## the max number of seconds allowed for a complete sync syncTimeout: 120 ## the name of a pre-created Secret with git http credentials httpSecret: "" ## the key in `dags.gitSync.httpSecret` with your git username httpSecretUsernameKey: username ## the key in `dags.gitSync.httpSecret` with your git password/token httpSecretPasswordKey: password ## the name of a pre-created Secret with git ssh credentials sshSecret: "" ## the key in `dags.gitSync.sshSecret` with your ssh-key file sshSecretKey: id_rsa ## the string value of a "known_hosts" file (for SSH only) sshKnownHosts: "" ################################### # Kubernetes - Ingress Configs ################################### ingress: ## if we should deploy Ingress resources enabled: true ## configs for the Ingress of the web Service web: ## annotations for the web Ingress annotations: {} ## additional labels for the web Ingress labels: { "team": "dataops" } ## the path for the web Ingress path: "" ## the hostname for the web Ingress ## host: # host: "airflow-pocs.peyadata.io" ## configs for web Ingress TLS tls: ## enable TLS termination for the web Ingress enabled: false ## the name of a pre-created Secret containing a TLS private key and certificate secretName: "" ## http paths to add to the web Ingress before the default path precedingPaths: [] ## http paths to add to the web Ingress after the default path succeedingPaths: [] ## configs for the Ingress of the flower Service flower: ## annotations for the flower Ingress annotations: {} ## additional labels for the flower Ingress labels: { "team": 
"dataops" } ## the path for the flower Ingress path: "" ## the hostname for the flower Ingress host: "" ## configs for flower Ingress TLS tls: ## enable TLS termination for the flower Ingress enabled: false ## the name of a pre-created Secret containing a TLS private key and certificate secretName: "" ## http paths to add to the flower Ingress before the default path precedingPaths: [] ## http paths to add to the flower Ingress after the default path succeedingPaths: [] ################################### # Kubernetes - RBAC ################################### rbac: ## if Kubernetes RBAC resources are created create: true ## if the created RBAC Role has GET/LIST on Event resources events: true ################################### # Kubernetes - Service Account ################################### serviceAccount: ## if a Kubernetes ServiceAccount is created create: true ## the name of the ServiceAccount name: "" ## annotations for the ServiceAccount annotations: iam.gke.io/gcp-service-account: # iam.gke.io/gcp-service-account: compute-engine-default-service@peya-data-ops-pro.iam.gserviceaccount.com ################################### # Kubernetes - Extra Manifests ################################### ## extra Kubernetes manifests to include alongside this chart extraManifests: - apiVersion: apps/v1 kind: Deployment metadata: annotations: deployment.kubernetes.io/revision: "1" meta.helm.sh/release-name: peya-eng-celery-airflow-2-0 meta.helm.sh/release-namespace: peya-eng-celery-airflow-2-0 labels: component: statsd heritage: Helm release: peya-eng-celery-airflow-2-0 tier: airflow name: peya-eng-celery-airflow-2-0-statsd namespace: peya-eng-celery-airflow-2-0 spec: progressDeadlineSeconds: 600 replicas: 1 revisionHistoryLimit: 10 selector: matchLabels: component: statsd release: peya-eng-celery-airflow-2-0 tier: airflow strategy: rollingUpdate: maxSurge: 25% maxUnavailable: 25% type: RollingUpdate template: metadata: creationTimestamp: null labels: 
app.kubernetes.io/managed-by: "Helm" component: statsd release: peya-eng-celery-airflow-2-0 tier: airflow spec: affinity: {} containers: - name: statsd args: - --statsd.mapping-config=/etc/statsd-exporter/mappings_dataops.xml image: apache/airflow:airflow-statsd-exporter-2021.04.28-v0.17.0 imagePullPolicy: IfNotPresent livenessProbe: failureThreshold: 3 httpGet: path: /metrics port: 9102 scheme: HTTP initialDelaySeconds: 10 periodSeconds: 10 successThreshold: 1 timeoutSeconds: 5 ports: - containerPort: 9125 name: statsd-ingest protocol: UDP - containerPort: 9102 name: statsd-scrape protocol: TCP readinessProbe: failureThreshold: 3 httpGet: path: /metrics port: 9102 scheme: HTTP initialDelaySeconds: 10 periodSeconds: 10 successThreshold: 1 timeoutSeconds: 5 volumeMounts: - name: peya-eng-celery-airflow-2-0-statsd-configmap mountPath: /etc/statsd-exporter/ resources: {} terminationMessagePath: /dev/termination-log terminationMessagePolicy: File dnsPolicy: ClusterFirst restartPolicy: Always schedulerName: default-scheduler securityContext: runAsUser: 65534 serviceAccount: peya-eng-celery-airflow-2-0 serviceAccountName: peya-eng-celery-airflow-2-0 terminationGracePeriodSeconds: 30 volumes: - configMap: defaultMode: 493 name: peya-eng-celery-airflow-2-0-statsd-configmap name: peya-eng-celery-airflow-2-0-statsd-configmap - apiVersion: v1 kind: Service metadata: annotations: cloud.google.com/neg: '{"ingress":true}' meta.helm.sh/release-name: peya-eng-celery-airflow-2-0 meta.helm.sh/release-namespace: peya-eng-celery-airflow-2-0 prometheus.io/port: "9102" prometheus.io/scrape: "true" labels: app.kubernetes.io/managed-by: "Helm" chart: airflow-1.1.0 component: statsd heritage: Helm release: peya-eng-celery-airflow-2-0 tier: airflow name: peya-eng-celery-airflow-2-0-statsd namespace: peya-eng-celery-airflow-2-0 spec: ports: - name: statsd-ingest port: 9125 protocol: UDP targetPort: 9125 - name: statsd-scrape port: 9102 protocol: TCP targetPort: 9102 selector: component: 
statsd release: peya-eng-celery-airflow-2-0 tier: airflow sessionAffinity: None type: ClusterIP - apiVersion: extensions/v1beta1 kind: Ingress metadata: labels: meta.helm.sh/release-name: peya-eng-celery-airflow-2-0 app.kubernetes.io/managed-by: "Helm" chart: airflow-1.1.0 component: statsd heritage: Helm release: peya-eng-celery-airflow-2-0 tier: airflow meta.helm.sh/release-namespace: peya-eng-celery-airflow-2-0 name: peya-eng-celery-airflow-2-0-statsd-ingress namespace: peya-eng-celery-airflow-2-0 spec: rules: - http: paths: - backend: serviceName: peya-eng-celery-airflow-2-0-statsd servicePort: statsd-scrape - apiVersion: v1 kind: ConfigMap metadata: annotations: deployment.kubernetes.io/revision: "1" meta.helm.sh/release-name: peya-eng-celery-airflow-2-0 meta.helm.sh/release-namespace: peya-eng-celery-airflow-2-0 labels: meta.helm.sh/release-name: peya-eng-celery-airflow-2-0 app.kubernetes.io/managed-by: "Helm" chart: airflow-1.1.0 component: statsd heritage: Helm release: peya-eng-celery-airflow-2-0 tier: airflow meta.helm.sh/release-namespace: peya-eng-celery-airflow-2-0 name: airflow-webserver-secret namespace: peya-eng-celery-airflow-2-0 data: webserver_config.py: | import os from airflow import configuration as conf from flask_appbuilder.security.manager import AUTH_DB from flask_appbuilder.security.manager import AUTH_OAUTH basedir = os.path.abspath(os.path.dirname(__file__)) SQLALCHEMY_DATABASE_URI = conf.get('core', 'SQL_ALCHEMY_CONN') WTF_CSRF_ENABLED = True AUTH_TYPE = AUTH_OAUTH AUTH_ROLE_ADMIN = 'Admin' AUTH_USER_REGISTRATION = True AUTH_USER_REGISTRATION_ROLE = "Admin" GOOGLE_KEY = os.getenv('AIRFLOW_GOOGLE_CLIENT_ID', 'OLD_AIRFLOW_GOOGLE_CLIENT_ID') GOOGLE_SECRET_KEY = os.getenv('AIRFLOW_GOOGLE_CLIENT_SECRET', 'xxxx') PERMANENT_SESSION_LIFETIME = 1800 OAUTH_PROVIDERS = [{ 'name':'google', 'whitelist': ['@company.com'], 'token_key':'access_token', 'icon':'fa-google', 'remote_app': { 'api_base_url':'https://www.googleapis.com/oauth2/v2/', 
'client_kwargs':{ 'scope': 'email profile' }, 'access_token_url':'https://accounts.google.com/o/oauth2/token', 'authorize_url':'https://accounts.google.com/o/oauth2/auth', 'request_token_url': None, 'client_id': GOOGLE_KEY, 'client_secret': GOOGLE_SECRET_KEY, } }] APP_THEME = "cerulean.css" # - apiVersion: v1 # kind: Secret # metadata: # name: airflow-postgresql # namespace: peya-eng-celery-airflow-2-0 # data: # postgresql-password: dwao97ad923*903avng95uGBC2 # type: Opaque - apiVersion: v1 kind: ConfigMap metadata: annotations: deployment.kubernetes.io/revision: "1" meta.helm.sh/release-name: peya-eng-celery-airflow-2-0 meta.helm.sh/release-namespace: peya-eng-celery-airflow-2-0 labels: meta.helm.sh/release-name: peya-eng-celery-airflow-2-0 app.kubernetes.io/managed-by: "Helm" chart: airflow-1.1.0 component: statsd heritage: Helm release: peya-eng-celery-airflow-2-0 tier: airflow meta.helm.sh/release-namespace: peya-eng-celery-airflow-2-0 name: peya-eng-celery-airflow-2-0-statsd-configmap namespace: peya-eng-celery-airflow-2-0 data: mappings_dataops.xml: |- mappings: - match: "(.+)\\.(.+)_start$" match_metric_type: counter name: "airflow_job_start" match_type: regex labels: airflow_id: "$1" job_name: "$2" - match: "(.+)\\.(.+)_end$" match_metric_type: counter name: "airflow_job_end" match_type: regex labels: airflow_id: "$1" job_name: "$2" - match: "(.+)\\.operator_failures_(.+)$" match_metric_type: counter name: "airflow_operator_failures" match_type: regex labels: airflow_id: "$1" operator_name: "$2" - match: "(.+)\\.operator_successes_(.+)$" match_metric_type: counter name: "airflow_operator_successes" match_type: regex labels: airflow_id: "$1" operator_name: "$2" - match: "*.ti_failures" match_metric_type: counter name: "airflow_ti_failures" labels: airflow_id: "$1" - match: "*.ti_successes" match_metric_type: counter name: "airflow_ti_successes" labels: airflow_id: "$1" - match: "*.zombies_killed" match_metric_type: counter name: "airflow_zombies_killed" 
labels: airflow_id: "$1" - match: "*.scheduler_heartbeat" match_metric_type: counter name: "airflow_scheduler_heartbeat" labels: airflow_id: "$1" - match: "*.dag_processing.processes" match_metric_type: counter name: "airflow_dag_processing_processes" labels: airflow_id: "$1" - match: "*.scheduler.tasks.killed_externally" match_metric_type: counter name: "airflow_scheduler_tasks_killed_externally" labels: airflow_id: "$1" - match: "*.scheduler.tasks.running" match_metric_type: counter name: "airflow_scheduler_tasks_running" labels: airflow_id: "$1" - match: "*.scheduler.tasks.starving" match_metric_type: counter name: "airflow_scheduler_tasks_starving" labels: airflow_id: "$1" - match: "*.scheduler.orphaned_tasks.cleared" match_metric_type: counter name: "airflow_scheduler_orphaned_tasks_cleared" labels: airflow_id: "$1" - match: "*.scheduler.orphaned_tasks.adopted" match_metric_type: counter name: "airflow_scheduler_orphaned_tasks_adopted" labels: airflow_id: "$1" - match: "*.scheduler.critical_section_busy" match_metric_type: counter name: "airflow_scheduler_critical_section_busy" labels: airflow_id: "$1" - match: "*.sla_email_notification_failure" match_metric_type: counter name: "airflow_sla_email_notification_failure" labels: airflow_id: "$1" - match: "*.ti.start.*.*" match_metric_type: counter name: "airflow_ti_start" labels: airflow_id: "$1" dag_id: "$2" task_id: "$3" - match: "*.ti.finish.*.*.*" match_metric_type: counter name: "airflow_ti_finish" labels: airflow_id: "$1" dag_id: "$2" task_id: "$3" state: "$4" - match: "*.dag.callback_exceptions" match_metric_type: counter name: "airflow_dag_callback_exceptions" labels: airflow_id: "$1" - match: "*.celery.task_timeout_error" match_metric_type: counter name: "airflow_celery_task_timeout_error" labels: airflow_id: "$1" - match: "*.dagbag_size" match_metric_type: gauge name: "airflow_dagbag_size" labels: airflow_id: "$1" - match: "*.dag_processing.import_errors" match_metric_type: gauge name: 
"airflow_dag_processing_import_errors" labels: airflow_id: "$1" - match: "*.dag_processing.total_parse_time" match_metric_type: gauge name: "airflow_dag_processing_total_parse_time" labels: airflow_id: "$1" - match: "*.dag_processing.last_runtime.*" match_metric_type: gauge name: "airflow_dag_processing_last_runtime" labels: airflow_id: "$1" dag_file: "$2" - match: "*.dag_processing.last_run.seconds_ago.*" match_metric_type: gauge name: "airflow_dag_processing_last_run_seconds" labels: airflow_id: "$1" dag_file: "$2" - match: "*.dag_processing.processor_timeouts" match_metric_type: gauge name: "airflow_dag_processing_processor_timeouts" labels: airflow_id: "$1" - match: "*.executor.open_slots" match_metric_type: gauge name: "airflow_executor_open_slots" labels: airflow_id: "$1" - match: "*.executor.queued_tasks" match_metric_type: gauge name: "airflow_executor_queued_tasks" labels: airflow_id: "$1" - match: "*.executor.running_tasks" match_metric_type: gauge name: "airflow_executor_running_tasks" labels: airflow_id: "$1" - match: "*.pool.open_slots.*" match_metric_type: gauge name: "airflow_pool_open_slots" labels: airflow_id: "$1" pool_name: "$2" - match: "*.pool.queued_slots.*" match_metric_type: gauge name: "airflow_pool_queued_slots" labels: airflow_id: "$1" pool_name: "$2" - match: "*.pool.running_slots.*" match_metric_type: gauge name: "airflow_pool_running_slots" labels: airflow_id: "$1" pool_name: "$2" - match: "*.pool.starving_tasks.*" match_metric_type: gauge name: "airflow_pool_starving_tasks" labels: airflow_id: "$1" pool_name: "$2" - match: "*.smart_sensor_operator.poked_tasks" match_metric_type: gauge name: "airflow_smart_sensor_operator_poked_tasks" labels: airflow_id: "$1" - match: "*.smart_sensor_operator.poked_success" match_metric_type: gauge name: "airflow_smart_sensor_operator_poked_success" labels: airflow_id: "$1" - match: "*.smart_sensor_operator.poked_exception" match_metric_type: gauge name: "airflow_smart_sensor_operator_poked_exception" 
      labels:
        airflow_id: "$1"
    - match: "*.smart_sensor_operator.exception_failures"
      match_metric_type: gauge
      name: "airflow_smart_sensor_operator_exception_failures"
      labels:
        airflow_id: "$1"
    - match: "*.smart_sensor_operator.infra_failures"
      match_metric_type: gauge
      name: "airflow_smart_sensor_operator_infra_failures"
      labels:
        airflow_id: "$1"
    - match: "*.dagrun.dependency-check.*"
      match_metric_type: observer
      name: "airflow_dagrun_dependency_check"
      labels:
        airflow_id: "$1"
        dag_id: "$2"
    - match: "*.dag.*.*.duration"
      match_metric_type: observer
      name: "airflow_dag_task_duration"
      labels:
        airflow_id: "$1"
        dag_id: "$2"
        task_id: "$3"
    - match: "*.dag_processing.last_duration.*"
      match_metric_type: observer
      name: "airflow_dag_processing_duration"
      labels:
        airflow_id: "$1"
        dag_file: "$2"
    - match: "*.dagrun.duration.success.*"
      match_metric_type: observer
      name: "airflow_dagrun_duration_success"
      labels:
        airflow_id: "$1"
        dag_id: "$2"
    - match: "*.dagrun.duration.failed.*"
      match_metric_type: observer
      name: "airflow_dagrun_duration_failed"
      labels:
        airflow_id: "$1"
        dag_id: "$2"
    - match: "*.dagrun.schedule_delay.*"
      match_metric_type: observer
      name: "airflow_dagrun_schedule_delay"
      labels:
        airflow_id: "$1"
        dag_id: "$2"
    - match: "*.scheduler.critical_section_duration"
      match_metric_type: observer
      name: "airflow_scheduler_critical_section_duration"
      labels:
        airflow_id: "$1"
    - match: "*.dagrun.*.first_task_scheduling_delay"
      match_metric_type: observer
      name: "airflow_dagrun_first_task_scheduling_delay"
      labels:
        airflow_id: "$1"
        dag_id: "$2"

###################################
# Database - PostgreSQL Chart
# - https://github.com/helm/charts/tree/master/stable/postgresql
###################################
postgresql:
  ## if the `stable/postgresql` chart is used
  enabled: false

  ## the postgres database to use
  postgresqlDatabase: airflow

  ## the postgres user to create
  postgresqlUsername: postgres

  ## the postgres user's password
  postgresqlPassword: airflow

  ## the name of a pre-created secret containing the
  ## postgres password
  existingSecret: ""

  ## the key within `postgresql.existingSecret` containing the password string
  existingSecretKey: "postgresql-password"

  ## configs for the PVC of postgresql
  persistence:
    ## if postgres will use Persistent Volume Claims to store data
    enabled: true

    ## the name of the StorageClass used by the PVC
    storageClass: ""

    ## the access modes of the PVC
    accessModes:
      - ReadWriteOnce

    ## the size of PVC to request
    size: 8Gi

  ## configs for the postgres StatefulSet
  master:
    ## annotations for the postgres Pod
    podAnnotations:
      cluster-autoscaler.kubernetes.io/safe-to-evict: "true"

###################################
# Database - External Database
# - these configs are only used when `postgresql.enabled` is false
###################################
externalDatabase:
  ## the type of external database: {mysql,postgres}
  type: mysql

  ## the host of the external database
  host: x.x.x.x
  # host: localhost

  ## the port of the external database
  # port: 5432
  port: 3306

  ## the database/scheme to use within the the external database
  database: airflow

  ## the user of the external database
  user: airflow

  ## the name of a pre-created secret containing the external database password
  passwordSecret: "airflow-postgresql"

  ## the key within `externalDatabase.passwordSecret` containing the password string
  passwordSecretKey: "postgresql-password"

  ## the connection properties for external database, e.g.
"?sslmode=require" properties: "" ################################### # Database - Redis Chart # - https://github.com/helm/charts/tree/master/stable/redis ################################### redis: ## if the `stable/redis` chart is used enabled: true ## the redis password password: airflow ## the name of a pre-created secret containing the redis password existingSecret: "" ## the key within `redis.existingSecret` containing the password string existingSecretPasswordKey: "" ## configs for redis cluster mode cluster: ## if redis runs in cluster mode enabled: false ## the number of redis slaves slaveCount: 1 ## configs for the redis master master: ## resource requests/limits for the master Pod resources: {} ## annotations for the master Pod podAnnotations: cluster-autoscaler.kubernetes.io/safe-to-evict: "true" ## configs for the PVC of the redis master persistence: ## use a PVC to persist data enabled: false ## the name of the StorageClass used by the PVC storageClass: "" ## the access mode of the PVC accessModes: - ReadWriteOnce ## the size of PVC to request size: 8Gi ## configs for the redis slaves slave: ## resource requests/limits for the slave Pods resources: {} ## annotations for the slave Pods podAnnotations: cluster-autoscaler.kubernetes.io/safe-to-evict: "true" ## configs for the PVC of the redis slaves persistence: ## use a PVC to persist data enabled: false ## the name of the StorageClass used by the PVC storageClass: "" ## the access mode of the PVC accessModes: - ReadWriteOnce ## the size of PVC to request size: 8Gi ################################### # Database - External Database # - these configs are only used when `redis.enabled` is false ################################### externalRedis: ## the host of the external redis host: localhost ## the port of the external redis port: 6379 ## the database number to use within the the external redis databaseNumber: 1 ## the name of a pre-created secret containing the external redis password passwordSecret: "" 
  ## the key within `externalRedis.passwordSecret` containing the password string
  passwordSecretKey: "redis-password"

  ## the connection properties for external redis, e.g. "?ssl_cert_reqs=CERT_OPTIONAL"
  properties: ""

###################################
# Prometheus Operator - ServiceMonitor
###################################
serviceMonitor:
  ## if ServiceMonitor resources should be deployed for airflow webserver
  enabled: false

  ## labels for ServiceMonitor, so that Prometheus can select it
  selector:
    prometheus: kube-prometheus

  ## the ServiceMonitor web endpoint path
  path: /admin/metrics

  ## the ServiceMonitor web endpoint interval
  interval: "30s"

###################################
# Prometheus Operator - PrometheusRule
###################################
prometheusRule:
  ## if PrometheusRule resources should be deployed for airflow webserver
  enabled: false

  ## labels for PrometheusRule, so that Prometheus can select it
  additionalLabels: {}

  ## alerting rules for Prometheus
  groups: []
```

The Docker image that we build comes from this Dockerfile:

```Dockerfile
# VERSION 2.2.0
# DESCRIPTION: Basic Airflow container
# BUILD: docker build --rm -t airflow/airflow-base .

FROM python:3.7-slim-buster
LABEL maintainer="DataOps_Team"

# Never prompt the user for choices on installation/configuration of packages
ENV DEBIAN_FRONTEND noninteractive
ENV TERM linux

# Airflow
ARG AIRFLOW_VERSION=2.2.0
ARG AIRFLOW_USER_HOME=/usr/local/airflow
ARG AIRFLOW_DEPS=""
ARG PYTHON_DEPS=""
ENV AIRFLOW_HOME=${AIRFLOW_USER_HOME}

# Define en_US.
ENV LANGUAGE en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LC_ALL en_US.UTF-8
ENV LC_CTYPE en_US.UTF-8
ENV LC_MESSAGES en_US.UTF-8
#ENV TZ=America/Montevideo

# Disable noisy "Handling signal" log messages:
# ENV GUNICORN_CMD_ARGS --log-level WARNING

#RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install tzdata
#RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
#RUN dpkg-reconfigure locales --frontend noninteractive tzdata

RUN set -ex \
    && buildDeps=' \
        freetds-dev \
        libkrb5-dev \
        libsasl2-dev \
        libssl-dev \
        libffi-dev \
        libpq-dev \
        git \
    ' \
    && apt-get update -yqq \
    && apt-get upgrade -yqq \
    && apt-get install -yqq --no-install-recommends \
        $buildDeps \
        freetds-bin \
        build-essential \
        default-libmysqlclient-dev \
        apt-utils \
        curl \
        rsync \
        netcat \
        locales \
    && sed -i 's/^# en_US.UTF-8 UTF-8$/en_US.UTF-8 UTF-8/g' /etc/locale.gen \
    && locale-gen \
    && update-locale LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 \
    && useradd -ms /bin/bash -d ${AIRFLOW_USER_HOME} airflow \
    && pip install -U pip setuptools wheel \
    && pip install pytz \
    && pip install pyOpenSSL \
    && pip install ndg-httpsclient \
    && pip install pyasn1 \
    && pip install apache-airflow[github_enterprise,crypto,s3,celery,postgres,hive,jdbc,mysql,ssh${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}]==${AIRFLOW_VERSION} \
    && pip install 'redis==3.2' \
    && pip uninstall SQLAlchemy -y \
    && pip install SQLAlchemy==1.3.15 \
    && pip install pyhive \
    && if [ -n "${PYTHON_DEPS}" ]; then pip install ${PYTHON_DEPS}; fi \
    && apt-get purge --auto-remove -yqq $buildDeps \
    && apt-get autoremove -yqq --purge \
    && apt-get clean \
    && rm -rf \
        /var/lib/apt/lists/* \
        /tmp/* \
        /var/tmp/* \
        /usr/share/man \
        /usr/share/doc \
        /usr/share/doc-base

COPY script/entrypoint.sh /entrypoint.sh
COPY config/airflow.cfg ${AIRFLOW_USER_HOME}/airflow.cfg
RUN chown -R airflow: ${AIRFLOW_USER_HOME}

EXPOSE 8080 5555 8793

USER airflow
WORKDIR ${AIRFLOW_USER_HOME}
ENTRYPOINT ["/entrypoint.sh"]
CMD ["webserver"]
```

And the error is that it
cannot find the dumb-init file, so none of the pods are deployed and they all fail. The same setup with the 2.1.0 image works correctly, because with 2.1.x the file /usr/bin/dumb-init is found.

job-upgrade-db.yaml:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "airflow.fullname" . }}-upgrade-db
  {{- /* this job can't be a post-install hook if `--wait` is passed to helm, */ -}}
  {{- /* because this job must run BEFORE other resources can become ready, */ -}}
  {{- /* meaning the install would never finish */ -}}
  {{- if not .Values.helmWait }}
  annotations:
    helm.sh/hook: post-install,post-upgrade
    helm.sh/hook-weight: "-10"
    helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
  {{- end }}
  labels:
    app: {{ include "airflow.labels.app" . }}
    component: jobs
    chart: {{ include "airflow.labels.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
spec:
  ttlSecondsAfterFinished: 300
  template:
    metadata:
      annotations:
        {{- if .Values.airflow.podAnnotations }}
        {{- toYaml .Values.airflow.podAnnotations | nindent 8 }}
        {{- end }}
      labels:
        app: {{ include "airflow.labels.app" . }}
        component: jobs
        chart: {{ include "airflow.labels.chart" . }}
        release: {{ .Release.Name }}
        heritage: {{ .Release.Service }}
    spec:
      restartPolicy: OnFailure
      {{- if .Values.airflow.image.pullSecret }}
      imagePullSecrets:
        - name: {{ .Values.airflow.image.pullSecret }}
      {{- end }}
      serviceAccountName: {{ include "airflow.serviceAccountName" . }}
      initContainers:
        {{- include "airflow.init_container.check_db" . | indent 8 }}
      containers:
        - name: upgrade-db
          {{- include "airflow.image" . | indent 10 }}
          envFrom:
            {{- include "airflow.envFrom" . | indent 12 }}
          env:
            {{- include "airflow.env" . | indent 12 }}
          command:
            - "/usr/bin/dumb-init"
            - "--"
          args:
            - "bash"
            - "-c"
            {{- if .Values.airflow.legacyCommands }}
            - "exec airflow upgradedb"
            {{- else }}
            - "exec airflow db upgrade"
            {{- end }}
```
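One possible workaround, not taken from the thread: the chart's templates hard-code `/usr/bin/dumb-init` as the container command, which the official `apache/airflow` images ship but a bare `python:3.7-slim-buster` base does not. A hedged sketch of installing it in the custom image (dumb-init is packaged in Debian buster and installs to `/usr/bin/dumb-init`):

```Dockerfile
# Sketch: add dumb-init so the chart's hard-coded "/usr/bin/dumb-init"
# command can find it in this custom (non apache/airflow based) image.
RUN apt-get update -yqq \
    && apt-get install -yqq --no-install-recommends dumb-init \
    && rm -rf /var/lib/apt/lists/*
```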

Thanks and regards!!

minnieshi commented 3 years ago

I do not think it should be reported as a bug.

The README file explains that the airflow version supported by chart 8.5.2 is 2.1.x:

https://github.com/airflow-helm/charts/blob/main/charts/airflow/README.md

This should help with understanding basic Helm chart concepts: https://helm.sh/docs/topics/charts/

jholowaty commented 3 years ago

Hi! I talked with Mathew Wicks in the Apache Airflow Community's Slack, and he recommended using 2.2.x with chart 8.5.2. I tried it, but it fails because that image doesn't have the dumb-init file and crashes at the upgrade-db step, so he told me to open a ticket here.

> **Mathew Wicks** (3 days ago): @Juan Holowaty you can actually use the community chart with 2.2.0 already.
>
> **Mathew Wicks** (3 days ago): Just use version 8.5.2 (but I recommend testing any updates in a dev environment first, it's just good practice). You can read the docs on "How to use a specific version of airflow?" for how to use a non-default airflow image.
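For context, the "How to use a specific version of airflow?" approach amounts to overriding the image in your values file. A sketch (the tag shown is illustrative; check Docker Hub for the exact `apache/airflow` tag you need):

```yaml
airflow:
  image:
    repository: apache/airflow
    tag: 2.2.0-python3.7  # illustrative tag, pick the one matching your Python version
    pullPolicy: IfNotPresent
```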

thesuperzapper commented 2 years ago

@jholowaty @minnieshi Hey people, sorry for the delay; I've been working on other things, but I will look into this now.

That said, I don't recommend updating to 2.2.X until it proves itself to be stable.

baryluk commented 2 years ago

Chart 8.5.0 works fine with airflow 2.2.x itself. I updated from 2.1.4 to 2.2.1 without any chart-related issues at all. Of course, if you rely on changes in 2.2.x that require updates to environment variables and so on, then you should not upgrade blindly.

The chart did work, but airflow fails later when running tasks on k8s. It's unclear why, but I doubt it is the fault of the chart.

thesuperzapper commented 2 years ago

@baryluk or @jholowaty have you discovered the cause of your issues?

I am not sure what could cause `dumb-init not found`, unless you are trying to use a container image that is not based on `apache/airflow:xxxx`, or you are somehow mounting over the `/usr` directory.
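A quick way to check the second possibility (a sketch; the image name is a placeholder for your custom image, and this assumes a local Docker daemon):

```shell
# Placeholder image name; substitute your custom airflow image.
IMAGE="us.gcr.io/example/airflow-custom:latest"

# The chart's templates hard-code /usr/bin/dumb-init as the container
# command, so verify the binary actually exists in the image:
docker run --rm --entrypoint sh "$IMAGE" \
  -c 'test -x /usr/bin/dumb-init && echo "dumb-init present" || echo "dumb-init MISSING"'
```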

baryluk commented 2 years ago

@thesuperzapper yeah, I solved my issue; it was unrelated to dumb-init. It was caused by using a custom airflow DAG folder path, which now requires more explicit configuration. I'm running Airflow 2.2.2 on chart 8.5.2 without any issues right now.
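For anyone hitting the same thing, a hedged sketch of making a custom DAG folder explicit in the chart values (the `dags.path` key exists in the 8.x chart; the path shown is just an example, and `AIRFLOW__CORE__DAGS_FOLDER` should agree with it):

```yaml
dags:
  ## example custom DAG folder; must match what the scheduler/workers expect
  path: /opt/airflow/custom-dags

airflow:
  config:
    AIRFLOW__CORE__DAGS_FOLDER: "/opt/airflow/custom-dags"
```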

thesuperzapper commented 2 years ago

Thanks @baryluk, I will close this issue.