airflow-helm / charts

The User-Community Airflow Helm Chart is the standard way to deploy Apache Airflow on Kubernetes with Helm. Originally created in 2017, it has since helped thousands of companies create production-ready deployments of Airflow on Kubernetes.
https://github.com/airflow-helm/charts/tree/main/charts/airflow
Apache License 2.0

Unable to write logs to s3 #821

Closed DenisKirichenko24 closed 10 months ago

DenisKirichenko24 commented 10 months ago

Chart Version

8.7.1

Kubernetes Version

Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.0", GitCommit:"b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d", GitTreeState:"clean", BuildDate:"2022-12-08T19:51:43Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:47:40Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}

Helm Version

version.BuildInfo{Version:"v3.12.3", GitCommit:"3a31588ad33fe3b89af5a2a54ee1d25bfe6eaa5e", GitTreeState:"clean", GoVersion:"go1.20.7"}

Description

We used to send logs directly to S3. We have now decided to write them to the standard directory /opt/airflow/logs instead, by mounting a PVC that in turn points to an S3 bucket, as mentioned here. Since logs from various DAGs were often not being written, we hoped that switching to this different logging approach, described here, would resolve the log loss.

After applying these changes, our airflow-db-migrations pod starts, and within it the check-db container fails with a PermissionError for /opt/airflow/logs/scheduler. (Any airflow CLI invocation initializes the logging config, and the file-processor handler tries to create /opt/airflow/logs/scheduler/2024-02-01 on startup; that mkdir is what fails, as the traceback below shows.)

I'm attaching the complete log below.

If we revert everything back to how it was before and connect directly to the S3 bucket as described here, everything starts correctly and all pods come up without errors.
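
For reference, the working remote-logging setup corresponds to values along these lines (a sketch based on our config further below; the bucket path is illustrative, and yandex_s3 is the Airflow connection we use):

airflow:
  config:
    # write task logs straight to S3 through the connection below
    AIRFLOW__LOGGING__REMOTE_LOGGING: "True"
    AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: "s3://bucket"
    AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: "yandex_s3"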

I'm providing all our configurations for PVC/PV and Airflow below.

Could you please advise where the error might be? Thank you in advance!

PVC

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airflow-logs
  namespace: airflow
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
  storageClassName: csi-s3
  volumeName: airflow-logs

PV

apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-logs
  namespace: airflow
spec:
  storageClassName: csi-s3
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany
  csi:
    driver: ru.yandex.s3.csi
    volumeHandle: bucket/airflow/logs
    controllerPublishSecretRef:
      name: csi-s3-secret
      namespace: airflow
    nodePublishSecretRef:
      name: csi-s3-secret
      namespace: airflow
    nodeStageSecretRef:
      name: csi-s3-secret
      namespace: airflow
    volumeAttributes:
      capacity: 1Gi
      mounter: geesefs

Relevant Logs

Unable to load the config, contains a configuration error.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/pathlib.py", line 1288, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/opt/airflow/logs/scheduler/2024-02-01'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/logging/config.py", line 563, in configure
    handler = self.configure_handler(handlers[name])
  File "/usr/local/lib/python3.8/logging/config.py", line 744, in configure_handler
    result = factory(**kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/log/file_processor_handler.py", line 49, in __init__
    Path(self._get_log_directory()).mkdir(parents=True, exist_ok=True)
  File "/usr/local/lib/python3.8/pathlib.py", line 1292, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/usr/local/lib/python3.8/pathlib.py", line 1288, in mkdir
    self._accessor.mkdir(self, mode)
PermissionError: [Errno 13] Permission denied: '/opt/airflow/logs/scheduler'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 5, in <module>
    from airflow.__main__ import main
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/__init__.py", line 64, in <module>
    settings.initialize()
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/settings.py", line 570, in initialize
    LOGGING_CLASS_PATH = configure_logging()

Custom Helm Values

airflow:
  legacyCommands: false
  executor: CeleryKubernetesExecutor
  image:
    repository: cr.yandex/crp1uvj38k3uhag59uoq/airflow-2.5.3-python3.8
    tag: mars-image-0.1.8
  defaultNodeSelector:
    custom.yandex.cloud/node-group-name: platform
  config:
    AIRFLOW__CELERY__FLOWER_URL_PREFIX: /airflow/flower
    AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT: 120
    AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT: 100
    AIRFLOW__LOGGING__LOGGING_LEVEL: INFO
    AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: s3://bucket
    AIRFLOW__LOGGING__REMOTE_LOGGING: False
    AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: yandex_s3
    AIRFLOW__METRICS__STATSD_HOST: prometheus-statsd-exporter
    AIRFLOW__METRICS__STATSD_ON: True
    AIRFLOW__METRICS__STATSD_PORT: 9125
    AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK: True
    AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL: 120
    AIRFLOW__SECRETS__BACKEND: airflow.providers.hashicorp.secrets.vault.VaultBackend
    AIRFLOW__WEBSERVER__BASE_URL: https://dev.yac.rupn.cloud-effem.com/airflow
    AIRFLOW__WEBSERVER__ENABLE_PROXY_FIX: True
  extraEnv:
  - name: AIRFLOW__SECRETS__BACKEND_KWARGS
    valueFrom:
      secretKeyRef:
        key: value
        name: vault-backend-kwargs
  - name: AIRFLOW__CORE__FERNET_KEY
    valueFrom:
      secretKeyRef:
        key: value
        name: airflow-fernet-key
  - name: AIRFLOW__WEBSERVER__SECRET_KEY
    valueFrom:
      secretKeyRef:
        key: value
        name: airflow-webserver-key
  extraPipPackages:
  - apache-airflow-providers-hashicorp==3.3.0
  - hvac==1.1.0
  extraVolumeMounts:
  - mountPath: /opt/airflow/plugins
    name: airflow-plugins
    readOnly: true
  - mountPath: /opt/airflow/logs
    name: airflow-logs 
  extraVolumes:
  - name: airflow-plugins
    persistentVolumeClaim:
      claimName: airflow-plugins
  - name: airflow-logs
    persistentVolumeClaim:
      claimName: airflow-logs
  users:
  - email: ${ADMIN_EMAIL}
    firstName: admin
    lastName: admin
    password: ${ADMIN_PASSWORD}
    role: Admin
    username: admin
  usersTemplates:
    ADMIN_EMAIL:
      key: email
      kind: secret
      name: admin-user
    ADMIN_PASSWORD:
      key: password
      kind: secret
      name: admin-user
  usersUpdate: true

dags:
  path: /opt/airflow/dags
  persistence:
    enabled: true
    existingClaim: airflow-dags

web:
  enabled: true
  webserverConfig:
    existingSecret: airflow-webserver-config

flower:
  enabled: true

ingress:
  enabled: true
  apiVersion: networking.k8s.io/v1
  web:
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt
    host: dev.yac.rupn.cloud-effem.com
    path: /airflow
    ingressClassName: nginx
    tls:
      enabled: true
      secretName: tls-secret
  flower:
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt
    host: dev.yac.rupn.cloud-effem.com
    path: /airflow/flower
    ingressClassName: nginx
    tls:
      enabled: true
      secretName: tls-secret

redis:
  enabled: true
  existingSecret: airflow-redis
  existingSecretKey: redis-password

postgresql:
  enabled: true
  existingSecret: airflow-postgresql
  existingSecretKey: postgresql-password
  persistence:
    enabled: true
    storageClass: yc-network-ssd-nonreplicated
    size: 93Gi

serviceAccount:
  create: true
  name: airflow

serviceMonitor:
  enabled: true
  selector:
    prometheus: platform

scheduler:
  replicas: 1

triggerer:
  enabled: true

workers:
  enabled: true
  replicas: 1
  nodeSelector:
    custom.yandex.cloud/node-group-name: dev
  extraVolumes:
  - name: yandex-sa-secret-volume
    secret:
      secretName: airflow-sa-key
  extraVolumeMounts:
  - name: yandex-sa-secret-volume
    mountPath: /etc/yc
    readOnly: true
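
Note: we mount the logs PVC through airflow.extraVolumes / extraVolumeMounts above. The chart also exposes a dedicated logs.persistence section (mirroring the dags.persistence section we already use); assuming chart 8.7.1 supports it, the equivalent would be:

logs:
  path: /opt/airflow/logs
  persistence:
    enabled: true
    existingClaim: airflow-logs
    accessMode: ReadWriteMany
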
DenisKirichenko24 commented 10 months ago

This was fixed by granting permissions via the options field in the PV manifest's volumeAttributes:

    volumeAttributes:
      options: "--memory-limit 1000 --dir-mode 0777 --file-mode 0666 --uid 50000"