The User-Community Airflow Helm Chart is the standard way to deploy Apache Airflow on Kubernetes with Helm. Originally created in 2017, it has since helped thousands of companies create production-ready deployments of Airflow on Kubernetes.
FROM apache/airflow:2.9.0
RUN pip install apache-airflow-providers-tableau snowflake-connector-python snowflake-sqlalchemy apache-airflow-providers-snowflake pendulum
I am using a custom Docker image based on the official Docker image, with the latest version of Airflow, 2.9.0.
I'm able to deploy Airflow using the official Helm chart on AWS EKS.
But after a while, my scheduler just keeps restarting in a loop. I found that the issue was that the scheduler-log-groomer container was missing permissions on the '/opt/airflow/logs' folder.
I then updated my values.yaml file with extraInitContainers (spec attached below) in the scheduler.
But after upgrading the chart, I still see scheduler errors in the logs. Now the liveness probe is not able to access the "/opt/airflow/logs/scheduler" folder.
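The attached spec is not reproduced in this text. Going by the `fix-volume-logs-permissions` init container in the pod description under Relevant Logs, it would look roughly like the sketch below. The container name, image, command, and mount path are taken from that output; placing it under `scheduler.extraInitContainers` and using the volume name `logs` are assumptions based on the same output, not a copy of the original values file.

```yaml
# values.yaml (sketch reconstructed from the pod description, not the original attachment)
scheduler:
  extraInitContainers:
    - name: fix-volume-logs-permissions
      image: busybox
      # busybox runs as root by default, which chown needs here
      command: ["sh", "-c", "chown -R 50000:0 /opt/airflow/logs/"]
      volumeMounts:
        - name: logs                     # the volume backed by the dna-airflow-logs PVC
          mountPath: /opt/airflow/logs/
```

An init container only runs when the pod starts, so this chown covers what already exists on the volume at that moment. Anything created under the logs volume later with a different owner is not touched until the pod is recreated.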
Relevant Logs
Name: dna-airflow-scheduler-5cc8cfd8f6-hx2bl
Namespace: airflow
Priority: 0
Service Account: dna-airflow-scheduler
Node: ip-172-18-231-89.us-west-2.compute.internal/172.18.231.89
Start Time: Fri, 12 Apr 2024 22:49:41 +0200
Labels: component=scheduler
pod-template-hash=5cc8cfd8f6
release=dna-airflow
tier=airflow
Annotations: checksum/airflow-config: 6fb676fa1295f9e8afd5408033a62ecaf465ddd5339dff805ffbcf8e653848dc
checksum/extra-configmaps: e862ea47e13e634cf17d476323784fa27dac20015550c230953b526182f5cac8
checksum/extra-secrets: e9582fdd622296c976cbc10a5ba7d6702c28a24fe80795ea5b84ba443a56c827
checksum/metadata-secret: b2fe937560e9635aeb01fce9100c2f836c5880f81c802565ce95fbcc8a56da4c
checksum/pgbouncer-config-secret: 1dae2adc757473469686d37449d076b0c82404f61413b58ae68b3c5e99527688
checksum/result-backend-secret: 98a68f230007cfa8f5d3792e1aff843a76b0686409e4a46ab2f092f6865a1b71
cluster-autoscaler.kubernetes.io/safe-to-evict: true
Status: Running
IP: 100.65.129.143
IPs:
IP: 100.65.129.143
Controlled By: ReplicaSet/dna-airflow-scheduler-5cc8cfd8f6
Init Containers:
wait-for-airflow-migrations:
Container ID: containerd://2bce7a37555105485909d706f8d5264156f87a598dc9e14a0272db55cb5f328c
Image: ravilkhalilov/airflow-demo@sha256:2af0e928daca24e5b83e1ac4e8d701cf72d2c0de5f3f1e38937826218e860235
Image ID: docker.io/ravilkhalilov/airflow-demo@sha256:2af0e928daca24e5b83e1ac4e8d701cf72d2c0de5f3f1e38937826218e860235
Port: <none>
Host Port: <none>
Args:
airflow
db
check-migrations
--migration-wait-timeout=60
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 12 Apr 2024 22:49:44 +0200
Finished: Fri, 12 Apr 2024 22:50:17 +0200
Ready: True
Restart Count: 0
Environment:
AIRFLOW__WEBSERVER__EXPOSE_CONFIG: True
AIRFLOW__CORE__FERNET_KEY: <set to the key 'fernet-key' in secret 'dna-airflow-fernet-key'> Optional: false
AIRFLOW_HOME: /opt/airflow
AIRFLOW__CORE__SQL_ALCHEMY_CONN: <set to the key 'connection' in secret 'airflow-rds-db'> Optional: false
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: <set to the key 'connection' in secret 'airflow-rds-db'> Optional: false
AIRFLOW_CONN_AIRFLOW_DB: <set to the key 'connection' in secret 'airflow-rds-db'> Optional: false
AIRFLOW__WEBSERVER__SECRET_KEY: <set to the key 'webserver-secret-key' in secret 'dna-airflow-webserver-secret-key'> Optional: false
Mounts:
/opt/airflow/airflow.cfg from config (ro,path="airflow.cfg")
/opt/airflow/config/airflow_local_settings.py from config (ro,path="airflow_local_settings.py")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-t7tng (ro)
git-sync-init:
Container ID: containerd://6fab71c316323da5dd38ab10af8c70ca0fd6590152f985fb3ed152987631dafd
Image: registry.k8s.io/git-sync/git-sync:v4.1.0
Image ID: registry.k8s.io/git-sync/git-sync@sha256:fd9722fd02e3a559fd6bb4427417c53892068f588fc8372aa553fbf2f05f9902
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 12 Apr 2024 22:50:18 +0200
Finished: Fri, 12 Apr 2024 22:50:21 +0200
Ready: True
Restart Count: 0
Environment:
GIT_SYNC_USERNAME: <set to the key 'GIT_SYNC_USERNAME' in secret 'git-credentials'> Optional: false
GITSYNC_USERNAME: <set to the key 'GITSYNC_USERNAME' in secret 'git-credentials'> Optional: false
GIT_SYNC_PASSWORD: <set to the key 'GIT_SYNC_PASSWORD' in secret 'git-credentials'> Optional: false
GITSYNC_PASSWORD: <set to the key 'GITSYNC_PASSWORD' in secret 'git-credentials'> Optional: false
GIT_SYNC_REV: HEAD
GITSYNC_REF: main
GIT_SYNC_BRANCH: main
GIT_SYNC_REPO: https://github.com/.../airflow-bizapps-dev.git
GITSYNC_REPO: https://github.com/.../airflow-bizapps-dev.git
GIT_SYNC_DEPTH: 1
GITSYNC_DEPTH: 1
GIT_SYNC_ROOT: /git
GITSYNC_ROOT: /git
GIT_SYNC_DEST: repo
GITSYNC_LINK: repo
GIT_SYNC_ADD_USER: true
GITSYNC_ADD_USER: true
GITSYNC_PERIOD: 5s
GIT_SYNC_MAX_SYNC_FAILURES: 0
GITSYNC_MAX_FAILURES: 0
GIT_SYNC_ONE_TIME: true
GITSYNC_ONE_TIME: true
Mounts:
/git from dags (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-t7tng (ro)
fix-volume-logs-permissions:
Container ID: containerd://1cae61ad7f4eab50dd360f064b8645fd6c7fde7696e2346e2098d5d3e81c4879
Image: busybox
Image ID: docker.io/library/busybox@sha256:c3839dd800b9eb7603340509769c43e146a74c63dca3045a8e7dc8ee07e53966
Port: <none>
Host Port: <none>
Command:
sh
-c
chown -R 50000:0 /opt/airflow/logs/
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 12 Apr 2024 22:50:23 +0200
Finished: Fri, 12 Apr 2024 22:50:23 +0200
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/opt/airflow/logs/ from logs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-t7tng (ro)
Containers:
scheduler:
Container ID: containerd://43eff9161606c18596c87a4504de3e782367cf87ec63ecc0650f72db32d75032
Image: ravilkhalilov/airflow-demo@sha256:2af0e928daca24e5b83e1ac4e8d701cf72d2c0de5f3f1e38937826218e860235
Image ID: docker.io/ravilkhalilov/airflow-demo@sha256:2af0e928daca24e5b83e1ac4e8d701cf72d2c0de5f3f1e38937826218e860235
Port: <none>
Host Port: <none>
Args:
bash
-c
exec airflow scheduler
State: Running
Started: Fri, 12 Apr 2024 23:32:46 +0200
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 12 Apr 2024 23:23:46 +0200
Finished: Fri, 12 Apr 2024 23:32:45 +0200
Ready: True
Restart Count: 6
Liveness: exec [sh -c CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \
airflow jobs check --job-type SchedulerJob --local
] delay=10s timeout=20s period=60s #success=1 #failure=5
Startup: exec [sh -c CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \
airflow jobs check --job-type SchedulerJob --local
] delay=0s timeout=20s period=10s #success=1 #failure=6
Environment:
AIRFLOW__WEBSERVER__EXPOSE_CONFIG: True
AIRFLOW__CORE__FERNET_KEY: <set to the key 'fernet-key' in secret 'dna-airflow-fernet-key'> Optional: false
AIRFLOW_HOME: /opt/airflow
AIRFLOW__CORE__SQL_ALCHEMY_CONN: <set to the key 'connection' in secret 'airflow-rds-db'> Optional: false
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: <set to the key 'connection' in secret 'airflow-rds-db'> Optional: false
AIRFLOW_CONN_AIRFLOW_DB: <set to the key 'connection' in secret 'airflow-rds-db'> Optional: false
AIRFLOW__WEBSERVER__SECRET_KEY: <set to the key 'webserver-secret-key' in secret 'dna-airflow-webserver-secret-key'> Optional: false
Mounts:
/opt/airflow/airflow.cfg from config (ro,path="airflow.cfg")
/opt/airflow/config/airflow_local_settings.py from config (ro,path="airflow_local_settings.py")
/opt/airflow/dags from dags (ro)
/opt/airflow/logs from logs (rw)
/opt/airflow/pod_templates/pod_template_file.yaml from config (ro,path="pod_template_file.yaml")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-t7tng (ro)
git-sync:
Container ID: containerd://efae96265bca88f581c3e92c5f138f3ae7ca4db805aa0797016ccb375ad4b90e
Image: registry.k8s.io/git-sync/git-sync:v4.1.0
Image ID: registry.k8s.io/git-sync/git-sync@sha256:fd9722fd02e3a559fd6bb4427417c53892068f588fc8372aa553fbf2f05f9902
Port: <none>
Host Port: <none>
State: Running
Started: Fri, 12 Apr 2024 22:50:23 +0200
Ready: True
Restart Count: 0
Environment:
GIT_SYNC_USERNAME: <set to the key 'GIT_SYNC_USERNAME' in secret 'git-credentials'> Optional: false
GITSYNC_USERNAME: <set to the key 'GITSYNC_USERNAME' in secret 'git-credentials'> Optional: false
GIT_SYNC_PASSWORD: <set to the key 'GIT_SYNC_PASSWORD' in secret 'git-credentials'> Optional: false
GITSYNC_PASSWORD: <set to the key 'GITSYNC_PASSWORD' in secret 'git-credentials'> Optional: false
GIT_SYNC_REV: HEAD
GITSYNC_REF: main
GIT_SYNC_BRANCH: main
GIT_SYNC_REPO: https://github.com/.../airflow-bizapps-dev.git
GITSYNC_REPO: https://github.com/.../airflow-bizapps-dev.git
GIT_SYNC_DEPTH: 1
GITSYNC_DEPTH: 1
GIT_SYNC_ROOT: /git
GITSYNC_ROOT: /git
GIT_SYNC_DEST: repo
GITSYNC_LINK: repo
GIT_SYNC_ADD_USER: true
GITSYNC_ADD_USER: true
GITSYNC_PERIOD: 5s
GIT_SYNC_MAX_SYNC_FAILURES: 0
GITSYNC_MAX_FAILURES: 0
Mounts:
/git from dags (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-t7tng (ro)
scheduler-log-groomer:
Container ID: containerd://dc0b21c1c37f7aed6a7ff67e812ff95a125297ddae1975f2365511cf9b0a3cbc
Image: ravilkhalilov/airflow-demo@sha256:2af0e928daca24e5b83e1ac4e8d701cf72d2c0de5f3f1e38937826218e860235
Image ID: docker.io/ravilkhalilov/airflow-demo@sha256:2af0e928daca24e5b83e1ac4e8d701cf72d2c0de5f3f1e38937826218e860235
Port: <none>
Host Port: <none>
Args:
bash
/clean-logs
State: Running
Started: Fri, 12 Apr 2024 22:50:24 +0200
Ready: True
Restart Count: 0
Environment:
AIRFLOW__LOG_RETENTION_DAYS: 15
AIRFLOW_HOME: /opt/airflow
Mounts:
/opt/airflow/logs from logs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-t7tng (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: dna-airflow-config
Optional: false
dags:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
logs:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: dna-airflow-logs
ReadOnly: false
kube-api-access-t7tng:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 48m default-scheduler Successfully assigned airflow/dna-airflow-scheduler-5cc8cfd8f6-hx2bl to ip-172-18-231-89.us-west-2.compute.internal
Normal Pulling 48m kubelet Pulling image "ravilkhalilov/airflow-demo@sha256:2af0e928daca24e5b83e1ac4e8d701cf72d2c0de5f3f1e38937826218e860235"
Normal Pulled 48m kubelet Successfully pulled image "ravilkhalilov/airflow-demo@sha256:2af0e928daca24e5b83e1ac4e8d701cf72d2c0de5f3f1e38937826218e860235" in 1.918s (1.918s including waiting)
Normal Created 48m kubelet Created container wait-for-airflow-migrations
Normal Started 48m kubelet Started container wait-for-airflow-migrations
Normal Pulled 48m kubelet Container image "registry.k8s.io/git-sync/git-sync:v4.1.0" already present on machine
Normal Created 48m kubelet Created container git-sync-init
Normal Started 48m kubelet Started container git-sync-init
Normal Pulling 48m kubelet Pulling image "busybox"
Normal Pulled 48m kubelet Successfully pulled image "busybox" in 619ms (619ms including waiting)
Normal Created 48m kubelet Created container fix-volume-logs-permissions
Normal Pulled 48m kubelet Container image "ravilkhalilov/airflow-demo@sha256:2af0e928daca24e5b83e1ac4e8d701cf72d2c0de5f3f1e38937826218e860235" already present on machine
Normal Started 48m kubelet Started container fix-volume-logs-permissions
Normal Created 48m kubelet Created container scheduler
Normal Started 48m kubelet Started container scheduler
Normal Pulled 48m kubelet Container image "registry.k8s.io/git-sync/git-sync:v4.1.0" already present on machine
Normal Created 48m kubelet Created container git-sync
Normal Started 48m kubelet Started container git-sync
Normal Created 48m kubelet Created container scheduler-log-groomer
Normal Started 48m kubelet Started container scheduler-log-groomer
Warning Unhealthy 47m kubelet Startup probe failed: command "sh -c CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \\\nairflow jobs check --job-type SchedulerJob --local\n" timed out
Warning Unhealthy 47m kubelet Startup probe failed: /home/airflow/.local/lib/python3.12/site-packages/airflow/metrics/statsd_logger.py:184 RemovedInAirflow3Warning: The basic metric validator will be deprecated in the future in favor of pattern-matching. You can try this now by setting config option metrics_use_pattern_match to True.
No alive jobs found.
Normal Pulled 47m (x2 over 48m) kubelet Container image "ravilkhalilov/airflow-demo@sha256:2af0e928daca24e5b83e1ac4e8d701cf72d2c0de5f3f1e38937826218e860235" already present on machine
Warning Unhealthy 47m kubelet Startup probe failed:
Warning Unhealthy 47m kubelet Startup probe errored: rpc error: code = NotFound desc = failed to exec in container: failed to load task: no running task found: task cb4dafd9cf37d9ad90bd10a4d36ae47b7d3c3b714efd4a1022a971fce25ca6be not found: not found
Warning Unhealthy 49s (x26 over 45m) kubelet Liveness probe failed: /home/airflow/.local/lib/python3.12/site-packages/airflow/metrics/statsd_logger.py:184 RemovedInAirflow3Warning: The basic metric validator will be deprecated in the future in favor of pattern-matching. You can try this now by setting config option metrics_use_pattern_match to True.
Unable to load the config, contains a configuration error.
Traceback (most recent call last):
File "/usr/local/lib/python3.12/pathlib.py", line 1311, in mkdir
os.mkdir(self, mode)
PermissionError: [Errno 13] Permission denied: '/opt/airflow/logs/scheduler/2024-04-12'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/logging/config.py", line 581, in configure
handler = self.configure_handler(handlers[name])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/logging/config.py", line 848, in configure_handler
result = factory(**kwargs)
^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/log/file_processor_handler.py", line 53, in __init__
Path(self._get_log_directory()).mkdir(parents=True, exist_ok=True)
File "/usr/local/lib/python3.12/pathlib.py", line 1320, in mkdir
if not exist_ok or not self.is_dir():
^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/pathlib.py", line 875, in is_dir
return S_ISDIR(self.stat().st_mode)
^^^^^^^^^^^
File "/usr/local/lib/python3.12/pathlib.py", line 840, in stat
return os.stat(self, follow_symlinks=follow_symlinks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
PermissionError: [Errno 13] Permission denied: '/opt/airflow/logs/scheduler/2024-04-12'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/airflow/.local/bin/airflow", line 5, in <module>
from airflow.__main__ import main
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/__init__.py", line 61, in <module>
settings.initialize()
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/settings.py", line 531, in initialize
LOGGING_CLASS_PATH = configure_logging()
^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/logging_config.py", line 74, in configure_logging
raise e
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/logging_config.py", line 69, in configure_logging
dictConfig(logging_config)
File "/usr/local/lib/python3.12/logging/config.py", line 914, in dictConfig
dictConfigClass(config).configure()
File "/usr/local/lib/python3.12/logging/config.py", line 588, in configure
raise ValueError('Unable to configure handler '
ValueError: Unable to configure handler 'processor'
Checks
User-Community Airflow Helm Chart
Chart Version
1.13.1
Kubernetes Version
Custom docker file
Helm Version
Custom Helm Values