airflow-helm / charts

The User-Community Airflow Helm Chart is the standard way to deploy Apache Airflow on Kubernetes with Helm. Originally created in 2017, it has since helped thousands of companies create production-ready deployments of Airflow on Kubernetes.
https://github.com/airflow-helm/charts/tree/main/charts/airflow
Apache License 2.0

Default values not working #756

Closed jackchuong closed 9 months ago

jackchuong commented 1 year ago

Chart Version

latest

Kubernetes Version

Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.3", GitCommit:"9e644106593f3f4aa98f8a84b23db5fa378900bd", GitTreeState:"clean", BuildDate:"2023-03-15T13:40:17Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.4", GitCommit:"f89670c3aa4059d6999cb42e23ccb4f0b9a03979", GitTreeState:"clean", BuildDate:"2023-04-12T12:05:35Z", GoVersion:"go1.19.8", Compiler:"gc", Platform:"linux/amd64"}

Helm Version

version.BuildInfo{Version:"v3.11.3", GitCommit:"323249351482b3bbfc9f5004f65d400aa70f9ae7", GitTreeState:"clean", GoVersion:"go1.20.3"}

Description

I was trying helm install airflow airflow-stable/airflow -f values.yml. I also tried

Relevant Logs

Describe pod

  Normal   Scheduled    104s                default-scheduler  Successfully assigned namespace/airflow-db-migrations-84b7f87494-4hc42 to k8s-worker3
  Warning  FailedMount  102s                kubelet            MountVolume.SetUp failed for volume "scripts" : failed to sync secret cache: timed out waiting for the condition
  Normal   Pulled       73s (x3 over 101s)  kubelet            Container image "apache/airflow:2.5.3-python3.8" already present on machine
  Normal   Created      73s (x3 over 101s)  kubelet            Created container check-db
  Normal   Started      73s (x3 over 101s)  kubelet            Started container check-db
  Warning  BackOff      1s (x3 over 86s)    kubelet            Back-off restarting failed container check-db in pod airflow-db-migrations-84b7f87494-4hc42_namespace(a5363d98-30be-417c-8f81-671b8acb41b7)


Custom Helm Values

No response

thesuperzapper commented 1 year ago

@jackchuong it looks like your PostgreSQL Pod is not starting. I bet it's because your Kubernetes cluster has no default StorageClass, so it can't create a PVC volume.
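
One way to check the StorageClass hypothesis from the command line is sketched below. It relies on `kubectl get storageclass` marking the default class with "(default)" next to its name. The function name `has_default_storage_class` is made up for this example and is not part of the chart.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the chart): succeeds if the cluster
# has a StorageClass marked as default.
has_default_storage_class() {
  # kubectl appends "(default)" to the NAME column of the default class.
  kubectl get storageclass --no-headers 2>/dev/null | grep -q '(default)'
}
```

For example, `has_default_storage_class || echo "no default StorageClass"`. If there is none, `kubectl get pvc` would be expected to show the chart's claims stuck in Pending.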

jackchuong commented 1 year ago

@thesuperzapper thanks for your reply. I have checked again, and the reason is this: my first Helm install of Airflow failed, and the command helm uninstall airflow didn't clean up everything completely. Some PVCs (Pending) and secrets remained. I deleted them manually, and the chart's values now work fine with the 2 PVCs I created for postgresql and logs. However, I have another problem when trying to configure the git-sync sidecar for DAGs.
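
Leftover resources like these can be listed before reinstalling. The sketch below is a heuristic, not something the chart guarantees: it matches on resource names that contain the release name, and `leftover_resources` is a hypothetical helper name. Helm does not delete PVCs that a StatefulSet created from its volumeClaimTemplates, so those commonly survive an uninstall.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list PVCs and Secrets whose names contain the
# Helm release name, so they can be reviewed after `helm uninstall`.
leftover_resources() {
  local release="$1" namespace="$2"
  # -o name prints one "kind/name" entry per line; keep matching names only.
  kubectl get pvc,secret -n "$namespace" -o name | grep -F -- "$release"
}
```

Calling `leftover_resources airflow <namespace>` prints the matching entries. Review the list before running `kubectl delete` on anything, since it can include secrets you created yourself.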

sample-values-CeleryKubernetesExecutor.yaml
dags:
  ## the airflow dags folder
  path: /opt/airflow/dags

  ## configs for the dags PVC
  ## [FAQ] https://github.com/airflow-helm/charts/blob/main/charts/airflow/docs/faq/dags/load-dag-definitions.md
  persistence:
    enabled: false

  ## configs for the git-sync sidecar
  ## [FAQ] https://github.com/airflow-helm/charts/blob/main/charts/airflow/docs/faq/dags/load-dag-definitions.md
  gitSync:
    enabled: true
    repo: "git@gitlab.mydomain.com/myuser/airflow-dags.git"
    repoSubPath: ""
    branch: main
    revision: HEAD
    depth: 1
    syncWait: 60
    syncTimeout: 120
    submodules: recursive
    sshSecret: "airflow-ssh-git-secret"
    sshSecretKey: "id_rsa"
    sshKnownHosts: ""
    maxFailures: 0

gitlab.mydomain.com is an on-premise GitLab CE instance, and it is working normally. I created the repo myuser/airflow-dags and added the SSH public key to the myuser profile. I also created the secret airflow-ssh-git-secret, which contains the SSH private key.

kubectl describe secret/airflow-ssh-git-secret
Name:         airflow-ssh-git-secret
Namespace:    
Labels:       <none>
Annotations:  <none>

Type:  Opaque

Data
====
id_rsa:  1834 bytes
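
A secret of this shape could be created roughly as follows. This is a sketch, assuming the private key lives in a local file; the function name `create_git_ssh_secret` is a placeholder for illustration. The key name `id_rsa` has to match the `sshSecretKey` value in the chart values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: store a local SSH private key under the key
# "id_rsa" in an Opaque secret named airflow-ssh-git-secret.
create_git_ssh_secret() {
  local key_file="$1" namespace="$2"
  kubectl create secret generic airflow-ssh-git-secret \
    --from-file=id_rsa="$key_file" \
    -n "$namespace"
}
```

The secret must live in the same namespace as the Airflow release, because pods can only mount secrets from their own namespace.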

The status of the pods after helm install:

kubectl get pod
NAME                                     READY   STATUS                  RESTARTS     AGE
airflow-db-migrations-5bc7cbf67b-vtlpx   0/2     Init:CrashLoopBackOff   7 (3m13s ago)   14m
airflow-flower-85cfcf5c47-f9clm          0/2     Init:CrashLoopBackOff   7 (3m18s ago)   14m
airflow-pgbouncer-5f5944d598-cdlk9       1/1     Running                 0               14m
airflow-postgresql-0                     1/1     Running                 0               14m
airflow-redis-master-0                   1/1     Running                 0               14m
airflow-scheduler-d8fdbf78d-hqt8k        0/2     Init:CrashLoopBackOff   7 (3m10s ago)   14m
airflow-sync-users-857b457984-ntsc2      0/2     Init:CrashLoopBackOff   7 (3m18s ago)   14m
airflow-triggerer-bcb998dd7-8shhl        0/2     Init:CrashLoopBackOff   7 (3m17s ago)   14m
airflow-web-6f8cf6c875-fnh8w             0/2     Init:CrashLoopBackOff   7 (3m13s ago)   14m
airflow-worker-0                         0/2     Init:CrashLoopBackOff   7 (3m16s ago)   14m
spark-master-0                           1/1     Running                 0               27h
spark-worker-0                           1/1     Running                 0               27h
spark-worker-1                           1/1     Running                 0               27h

So I guess something is wrong with the pod airflow-sync-users-857b457984-ntsc2?

kubectl describe pod airflow-sync-users-857b457984-ntsc2
...
Init Containers:
  dags-git-clone:
    Container ID:   docker://0101410f3b17e17bf7490aacc5e5ccebd5a571def1f270912d156567b4b64a1a
    Image:          registry.k8s.io/git-sync/git-sync:v3.6.5
    Image ID:       docker-pullable://registry.k8s.io/git-sync/git-sync@sha256:7231f6c2284758b91caed71e4e596413df31ac4467de9b596dc6b386b82f624f
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Sat, 01 Jul 2023 17:57:14 +0700
      Finished:     Sat, 01 Jul 2023 17:57:14 +0700
    Ready:          False
    Restart Count:  3
    Environment Variables from:
      airflow-config-envs  Secret  Optional: false
    Environment:
      GIT_SYNC_ONE_TIME:           true
      GIT_SYNC_ROOT:               /dags
      GIT_SYNC_DEST:               repo
      GIT_SYNC_REPO:               git@gitlab.mydomain.com/myuser/airflow-dags.git
      GIT_SYNC_BRANCH:             main
      GIT_SYNC_REV:                HEAD
      GIT_SYNC_DEPTH:              1
      GIT_SYNC_WAIT:               60
      GIT_SYNC_TIMEOUT:            120
      GIT_SYNC_ADD_USER:           true
      GIT_SYNC_MAX_SYNC_FAILURES:  0
      GIT_SYNC_SUBMODULES:         recursive
      GIT_SYNC_SSH:                true
      GIT_SSH_KEY_FILE:            /etc/git-secret/id_rsa
      GIT_KNOWN_HOSTS:             false
      DATABASE_USER:               postgres
      DATABASE_PASSWORD:           <set to the key 'postgresql-password' in secret 'airflow-postgresql'>  Optional: false
      REDIS_PASSWORD:              <set to the key 'redis-password' in secret 'airflow-redis'>            Optional: false
      CONNECTION_CHECK_MAX_COUNT:  0
    Mounts:
      /dags from dags-data (rw)
      /etc/git-secret/id_rsa from git-secret (ro,path="id_rsa")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-s2wtx (ro)
  check-db:
    Container ID:
    Image:         apache/airflow:2.5.3-python3.8
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/bin/dumb-init
      --
      /entrypoint
    Args:
      bash
      -c
      exec timeout 60s airflow db check
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Environment Variables from:
      airflow-config-envs  Secret  Optional: false
    Environment:
      DATABASE_USER:               postgres
      DATABASE_PASSWORD:           <set to the key 'postgresql-password' in secret 'airflow-postgresql'>  Optional: false
      REDIS_PASSWORD:              <set to the key 'redis-password' in secret 'airflow-redis'>            Optional: false
      CONNECTION_CHECK_MAX_COUNT:  0
    Mounts:
      /opt/airflow/dags from dags-data (rw)
      /opt/airflow/logs from logs-data (rw,path="airflow-logs")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-s2wtx (ro)
  wait-for-db-migrations:
    Container ID:
    Image:         apache/airflow:2.5.3-python3.8
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/bin/dumb-init
      --
      /entrypoint
    Args:
      bash
      -c
      exec airflow db check-migrations -t 60
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Environment Variables from:
      airflow-config-envs  Secret  Optional: false
    Environment:
      DATABASE_USER:               postgres
      DATABASE_PASSWORD:           <set to the key 'postgresql-password' in secret 'airflow-postgresql'>  Optional: false
      REDIS_PASSWORD:              <set to the key 'redis-password' in secret 'airflow-redis'>            Optional: false
      CONNECTION_CHECK_MAX_COUNT:  0
    Mounts:
      /opt/airflow/dags from dags-data (rw)
      /opt/airflow/logs from logs-data (rw,path="airflow-logs")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-s2wtx (ro)
...
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  74s                default-scheduler  Successfully assigned namespace/airflow-sync-users-857b457984-ntsc2 to k8s-worker2
  Normal   Pulled     27s (x4 over 71s)  kubelet            Container image "registry.k8s.io/git-sync/git-sync:v3.6.5" already present on machine
  Normal   Created    27s (x4 over 71s)  kubelet            Created container dags-git-clone
  Normal   Started    26s (x4 over 71s)  kubelet            Started container dags-git-clone
  Warning  BackOff    14s (x6 over 70s)  kubelet            Back-off restarting failed container dags-git-clone in pod airflow-sync-users-857b457984-ntsc2_namespace(f515be0b

Does it not work with an on-premise Git repo? Or did I do something wrong?

thesuperzapper commented 12 months ago

@jackchuong you need to look at the logs for one of those failing pods; otherwise it is not possible to know what's failing.

BTW, I highly recommend k9s, a CLI tool for managing Kubernetes clusters. It makes it very easy to view things like pod logs. (You just press "L" when highlighting a pod to see its logs.)
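
The same logs can also be fetched with plain kubectl. The sketch below wraps `kubectl logs` in a hypothetical helper named `container_logs`. It first asks for the previous (crashed) attempt of the container, which is usually the interesting one during CrashLoopBackOff, and falls back to the current attempt if there is no previous one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print logs of one container in a pod, preferring
# the previous (crashed) attempt and falling back to the current one.
container_logs() {
  local pod="$1" container="$2" namespace="$3"
  kubectl logs "$pod" -c "$container" -n "$namespace" --previous \
    || kubectl logs "$pod" -c "$container" -n "$namespace"
}
```

For the pod described in this thread, that would be `container_logs airflow-sync-users-857b457984-ntsc2 dags-git-clone <namespace>`, since `dags-git-clone` is the init container reported as crash looping.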