
[bitnami/airflow] Worker logs can't be viewed/reached from the UI #28865

Closed: Nickmman closed this issue 3 days ago

Nickmman commented 1 month ago

Name and Version

bitnami/airflow 18.3.8

What architecture are you using?

amd64

What steps will reproduce the bug?

  1. Deploy Airflow using the Helm chart
  2. Connect a git repository to load DAGs from
  3. Execute a DAG
  4. When the run fails, the task log will say: "Found logs served from host http://airflow-worker-0.airflow-worker-hl.airflow.svc.cluster.local:8793/log/dag_id=sftp_polling_dag/run_id=scheduled__2023-01-01T00:00:00+00:00/task_id=get_sftp_details/attempt=1.log" (see the note after these steps)
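
Note (editorial addition, not from the original report): the webserver builds that URL from the hostname the worker records for each task instance, which on Kubernetes resolves to the worker's headless-service FQDN and is therefore only reachable from inside the cluster. One knob that influences what the worker reports is Airflow's [core] hostname_callable setting; a sketch of overriding it through the chart's worker.extraEnvVars might look like the following (AIRFLOW__CORE__HOSTNAME_CALLABLE and airflow.utils.net.get_host_ip_address are standard Airflow names, but this workaround was not verified in this thread):

worker:
  extraEnvVars:
    # Assumption, untested in this thread: override [core] hostname_callable
    # so the worker registers its pod IP instead of its headless-service FQDN
    - name: AIRFLOW__CORE__HOSTNAMEE_CALLABLE
      value: "airflow.utils.net.get_host_ip_address"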

Are you using any custom parameters or values?

global:
  storageClass: ceph-block
auth:
  username: admin
  existingSecret: airflow-auth-bitnami
ingress:
  enabled: true
  hostname: airflow.redacted.com
  ingressClassName: nginx-internal
  annotations:
    cert-manager.io/cluster-issuer: dns-issuer
    kubernetes.io/tls-acme: "true"
    nginx.ingress.kubernetes.io/backend-protocol: HTTP
    nginx.ingress.kubernetes.io/ssl-passthrough: "false"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
  tls: true
git:
  dags:
    enabled: true
    repositories:
      - repository: 'git@redacted'
        name: git
        branch: dev
  clone:
    extraEnvVars:
      - name: GIT_SSH_COMMAND
        value: "ssh -i /opt/bitnami/.ssh/id_rsa -o StrictHostKeyChecking=no"
    extraVolumeMounts:
      - name: git-ssh-key
        mountPath: /opt/bitnami/.ssh
  sync:
    extraEnvVars:
      - name: GIT_SSH_COMMAND
        value: "ssh -i /opt/bitnami/.ssh/id_rsa -o StrictHostKeyChecking=no"
    extraVolumeMounts:
      - name: git-ssh-key
        mountPath: /opt/bitnami/.ssh
web:
  extraVolumes:
    - name: git-ssh-key
      secret:
        secretName: svc-iac-ssh-key
        defaultMode: 256
worker:
  extraVolumes:
    - name: git-ssh-key
      secret:
        secretName: svc-iac-ssh-key
        defaultMode: 256
postgresql:
  auth:
    existingSecret: airflow-auth-bitnami
redis:
  auth:
    existingSecret: airflow-auth-bitnami
scheduler:
  extraVolumes:
    - name: git-ssh-key
      secret:
        secretName: svc-iac-ssh-key
        defaultMode: 256
  resources:
    requests:
      memory: "512Mi"
      cpu: "1"
    limits:
      memory: "768Mi"
      cpu: "1.5"

What is the expected behavior?

The URL presented for the failed task's logs should point to an address that is reachable through the web pod/ingress.

What do you see instead?

The URL provided points to an internal, cluster-local service:

airflow-worker-0.airflow-worker-hl.airflow.svc.cluster.local
*** Found logs served from host http://airflow-worker-0.airflow-worker-hl.airflow.svc.cluster.local:8793/log/dag_id=sftp_polling_dag/run_id=scheduled__2023-01-01T00:00:00+00:00/task_id=get_sftp_details/attempt=1.log
[2024-08-12, 21:02:49 UTC] {local_task_job_runner.py:120} ▼ Pre task execution logs
[2024-08-12, 21:02:49 UTC] {taskinstance.py:2076} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: sftp_polling_dag.get_sftp_details scheduled__2023-01-01T00:00:00+00:00 [queued]>
[2024-08-12, 21:02:49 UTC] {taskinstance.py:2076} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: sftp_polling_dag.get_sftp_details scheduled__2023-01-01T00:00:00+00:00 [queued]>
[2024-08-12, 21:02:49 UTC] {taskinstance.py:2306} INFO - Starting attempt 1 of 2
[2024-08-12, 21:02:50 UTC] {taskinstance.py:2330} INFO - Executing <Task(PythonOperator): get_sftp_details> on 2023-01-01 00:00:00+00:00
[2024-08-12, 21:02:50 UTC] {standard_task_runner.py:63} INFO - Started process 517133 to run task
[2024-08-12, 21:02:50 UTC] {standard_task_runner.py:90} INFO - Running: ['airflow', 'tasks', 'run', 'sftp_polling_dag', 'get_sftp_details', 'scheduled__2023-01-01T00:00:00+00:00', '--job-id', '3', '--raw', '--subdir', 'DAGS_FOLDER/git_git/sftp_poll_dag.py', '--cfg-path', '/tmp/tmpk_pojbj2']
[2024-08-12, 21:02:50 UTC] {standard_task_runner.py:91} INFO - Job 3: Subtask get_sftp_details

Additional information

The logs shown above are not the logs that the 1.log link contains. It is the 1.log link itself that should be fixed so that it is reachable from within the UI/ingress.

[screenshot: 2024-08-13 162157]

andresbono commented 3 weeks ago

Hi, thank you for creating this issue.

Can you access those logs if you manually replace the host in the URL with the accessible endpoint?

Can you also try setting AIRFLOW_BASE_URL in the worker statefulset (via worker.extraEnvVars)? I'm not sure that env var will be picked up by the airflow-worker component, but it is worth trying.
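
For reference, that suggestion would look roughly like this in the chart values (a sketch only: the hostname reuses the redacted ingress hostname from the report, and whether the worker actually honors AIRFLOW_BASE_URL is exactly what is being tested here):

worker:
  extraEnvVars:
    # Sketch: point the worker at the externally reachable UI URL; the
    # hostname is the redacted example from this issue, not a real endpoint
    - name: AIRFLOW_BASE_URL
      value: "https://airflow.redacted.com"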

Nickmman commented 3 weeks ago

Hi @andresbono,

I get a 404 when trying to update the URL with the UI endpoint.

As for the env var, I now get a 404 in the logs (from within the UI) when accessing a failed task:

*** Could not read served logs: Client error '404 NOT FOUND' for url 'http://airflow-worker-0.airflow-worker-hl.airflow.svc.cluster.local:8793/log/dag_id=sftp_polling_dag/run_id=manual__2024-08-21T18:11:35.564274+00:00/task_id=get_sftp_details/attempt=1.log'

I set that variable to the same value as in the web container, which is http://airflow.valid.dns:8080 (as an example).

I'm testing something else out, so I will report back with my findings next week.

andresbono commented 3 weeks ago

Thank you, keep us posted.

github-actions[bot] commented 5 days ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

Nickmman commented 3 days ago

Hi @andresbono, completely forgot about this.

The logs are now visible from the UI; the catch is that the run needs to have finished completely (with no retries still running) in order for the logs to become visible ("Post task execution logs"):


andresbono commented 3 days ago

Thank you very much for sharing your findings, @Nickmman! ❤️ We will proceed to close the issue.