apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

KubernetesExecutor does not save logs #40861

Open mrybas opened 1 month ago

mrybas commented 1 month ago

Official Helm Chart version

1.14.0 (latest released)

Apache Airflow version

v2.9.3

Kubernetes Version

v1.29.5

Helm Chart configuration

executor: "CeleryKubernetesExecutor"
config:
  logging:
    remote_logging: "True"
    remote_base_log_folder: s3://path/to/logs/
    remote_log_conn_id: ceph_default
    encrypt_s3_logs: 'False'
logs:
  persistence:
    enabled: true
    size: 10Gi

Docker Image customizations

No response

What happened

When using the CeleryKubernetesExecutor, logs are not saved after the pod is deleted, neither in persistent storage with the settings

  logs:
    persistence:
      enabled: true
      size: 10Gi

nor in the S3 repository with the settings

  config:
    logging:
      remote_logging: "True"
      remote_base_log_folder: s3://path/to/logs/
      remote_log_conn_id: ceph_default
      encrypt_s3_logs: 'False'

At the same time, the Celery Executor saves logs both in persistent storage and in S3.
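
To help localize where the configuration gets lost, a small diagnostic callable can log the remote-logging settings exactly as the running process resolves them. A minimal sketch using the standard airflow.configuration API; run it once as a task on the kubernetes queue and once on the default Celery queue and compare the output:

import logging
import os

from airflow.configuration import conf


def dump_remote_logging_config():
    # Values resolved from airflow.cfg / environment by this worker or pod
    logging.info("remote_logging=%s", conf.get("logging", "remote_logging"))
    logging.info("remote_base_log_folder=%s", conf.get("logging", "remote_base_log_folder"))
    logging.info("remote_log_conn_id=%s", conf.get("logging", "remote_log_conn_id"))
    # Also check whether the same setting arrives as an environment variable,
    # in case the pod spec used for kubernetes-queue tasks does not receive it
    logging.info(
        "AIRFLOW__LOGGING__REMOTE_LOGGING=%s",
        os.environ.get("AIRFLOW__LOGGING__REMOTE_LOGGING"),
    )

If the values differ between the two queues, the problem is in how the configuration reaches the worker pods rather than in the log handler itself.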

What you think should happen instead

After the pod is deleted, the logs should be saved in the S3 storage.

How to reproduce

Deploy Airflow from the Helm chart with

executor: "CeleryKubernetesExecutor"
config:
  logging:
    remote_logging: "True"
    remote_base_log_folder: s3://path/to/logs/
    remote_log_conn_id: ceph_default
    encrypt_s3_logs: 'False'

and run the following DAG:

import datetime
import airflow

from airflow.operators.python import PythonOperator
import logging
from time import sleep

with airflow.DAG(
    "sample_celery_kubernetes",
    start_date=datetime.datetime(2022, 1, 1),
    schedule_interval=None,
) as dag:

    def kubernetes_example():
        logging.info("This task runs using KubernetesExecutor")
        sleep(10)
        logging.info("Task completed")

    def celery_example():
        logging.info("This task runs using CeleryExecutor")
        sleep(10)
        logging.info("Task completed")

    # To run with KubernetesExecutor, set queue to kubernetes
    task_kubernetes = PythonOperator(
        task_id="task-kubernetes",
        python_callable=kubernetes_example,
        dag=dag,
        queue="kubernetes",
    )

    # To run with CeleryExecutor, omit the queue argument
    task_celery = PythonOperator(
        task_id="task-celery", python_callable=celery_example, dag=dag
    )

    _ = task_kubernetes >> task_celery

For the KubernetesExecutor task the log viewer shows "*** No logs found on s3", while for the Celery task it shows "*** Found logs in s3:".
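
Another way to confirm which task's logs actually reach the bucket is to list the keys with the same connection the remote logger uses. A minimal sketch, assuming the Amazon provider is installed and substituting the placeholder bucket and prefix from remote_base_log_folder above:

from airflow.providers.amazon.aws.hooks.s3 import S3Hook

# "path" and "to/logs/" are the placeholder bucket and prefix from
# s3://path/to/logs/ above; replace them with the real values.
hook = S3Hook(aws_conn_id="ceph_default")
keys = hook.list_keys(bucket_name="path", prefix="to/logs/")
print(keys or "no log objects found under this prefix")

Comparing the keys written for task-celery against task-kubernetes shows whether the kubernetes-queue pods upload anything at all before they are deleted.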

Anything else

No response

Are you willing to submit PR?

Code of Conduct

boring-cyborg[bot] commented 1 month ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

omkar-foss commented 1 month ago

At the same time, the Celery Executor saves logs both in persistent storage and in S3.

This issue could be similar to https://github.com/apache/airflow/issues/21548 (but not the same). One or more conditions added for Celery Executor may need to be extended for the Celery Kubernetes Executor.
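
Purely as an illustration of the kind of executor-specific condition being described (not the actual Airflow source; the function name and the exact check are assumptions), a check written only with the Celery Executor in mind would miss the combined executor unless it is extended:

from airflow.configuration import conf

def _executor_handles_remote_logs() -> bool:
    # Hypothetical sketch: a condition that only matches "CeleryExecutor"
    # would skip log handling for CeleryKubernetesExecutor, so tasks on the
    # kubernetes queue would never have their logs uploaded.
    executor = conf.get("core", "executor")
    return executor in ("CeleryExecutor", "CeleryKubernetesExecutor")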

@potiuk @shahar1 May I pick this up? Please let me know if I'll need to get some context before checking this out.

omkar-foss commented 1 month ago

Hi @mrybas, to check this out, I tried to reproduce this issue in a Kubernetes cluster with Celery Kubernetes Executor and persistent storage enabled. But in my case, I see that the logs are being preserved in the persistent storage after the pods are killed and restarted.

If possible, can you try with only persistent storage enabled and remote logging to S3 disabled? It may help narrow down the issue.

Also, apart from the ones you've mentioned above, if there are any other changes in the values.yaml for your Helm chart, please share them here. They might be useful for reproducing this issue. Thanks.

mrybas commented 1 month ago

If I use the Celery queue, everything is saved. If I use the Kubernetes queue, it is not.

omkar-foss commented 4 weeks ago

If possible, can you try with only persistent storage enabled and remote logging to S3 disabled? It may help narrow down the issue.

@mrybas, is it possible to try this? I'm unable to reproduce this issue on my Kubernetes cluster; in my case everything is saved as expected for both the Celery and Kubernetes queues when using the CeleryKubernetesExecutor.