LamaAni / KubernetesJobOperator

An Airflow operator that executes a task in a Kubernetes cluster, given a Kubernetes YAML configuration or an image reference.

Duplicate pod logs #24

Closed: odesenfans closed this issue 3 years ago

odesenfans commented 3 years ago

Hello,

I'm running into an issue where pod logs are written twice to the Airflow logs. I run a simple "hello world" command and get the expected output twice in the Airflow logs. The pod logs show only one line as expected. Here is my config and output:

Spec file

apiVersion: batch/v1
kind: Job
spec:
  template:
    spec:
      imagePullSecrets:
        - name: my-private-registry
      restartPolicy: Never
      containers:
        - name: hello-python
          image: python:3.8.6-slim
          command: ["python", "-c", "print('Hello Python!')"]
          imagePullPolicy: Always

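For context, a minimal DAG that runs this spec might look like the sketch below. This follows the README-style usage of the operator; the import path, the body_filepath argument, and the file name hello-python-job.yaml are assumptions here and may differ across versions.

# Minimal sketch, assuming README-style usage of KubernetesJobOperator;
# import path and argument names may differ between versions.
from airflow import DAG
from airflow.utils.dates import days_ago
from airflow_kubernetes_job_operator.kubernetes_job_operator import KubernetesJobOperator

with DAG("job-tester", start_date=days_ago(1), schedule_interval=None) as dag:
    KubernetesJobOperator(
        task_id="from-image",
        body_filepath="./hello-python-job.yaml",  # the spec file above
    )
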
kubectl logs from-image-aaggosqx-9jtkf

Hello Python

Airflow logs

*** Reading local file: /opt/airflow/logs/job-tester/from-image/2020-11-10T10:14:29.942399+00:00/1.log
[2020-11-10 10:14:43,708] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: job-tester.from-image 2020-11-10T10:14:29.942399+00:00 [queued]>
[2020-11-10 10:14:43,795] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: job-tester.from-image 2020-11-10T10:14:29.942399+00:00 [queued]>
[2020-11-10 10:14:43,797] {taskinstance.py:879} INFO - 
--------------------------------------------------------------------------------
[2020-11-10 10:14:43,799] {taskinstance.py:880} INFO - Starting attempt 1 of 1
[2020-11-10 10:14:43,802] {taskinstance.py:881} INFO - 
--------------------------------------------------------------------------------
[2020-11-10 10:14:43,870] {taskinstance.py:900} INFO - Executing <Task(KubernetesJobOperator): from-image> on 2020-11-10T10:14:29.942399+00:00
[2020-11-10 10:14:43,880] {standard_task_runner.py:53} INFO - Started process 376 to run task
[2020-11-10 10:14:44,168] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: job-tester.from-image 2020-11-10T10:14:29.942399+00:00 [running]> 3248fd2ede4d
[2020-11-10 10:14:44,622] {job_runner.py:255} INFO - {job-runner}: Executing context: my-cluster
[2020-11-10 10:14:44,623] {job_runner.py:255} INFO - {job-runner}: Executing cluster: my-cluster
[2020-11-10 10:14:44,626] {job_runner.py:255} INFO - {job-runner}: Started watcher for kinds: pod, service, job, deployment
[2020-11-10 10:14:44,628] {job_runner.py:255} INFO - {job-runner}: Watching namespaces: my-namespace
[2020-11-10 10:14:44,638] {job_runner.py:255} INFO - {job-runner}: Waiting for my-namespace/from-image-aaggosqx to finish...
[2020-11-10 10:14:44,922] {operations.py:62} INFO - [my-namespace/jobs/from-image-aaggosqx] created
[2020-11-10 10:14:44,940] {watchers.py:276} INFO - [my-namespace/pods/from-image-aaggosqx-9jtkf] Pending
[2020-11-10 10:14:45,085] {watchers.py:276} INFO - [my-namespace/jobs/from-image-aaggosqx] Pending
[2020-11-10 10:14:45,088] {watchers.py:276} INFO - [my-namespace/jobs/from-image-aaggosqx] Running
[2020-11-10 10:14:49,576] {watchers.py:276} INFO - [my-namespace/pods/from-image-aaggosqx-9jtkf] Succeeded
[2020-11-10 10:14:49,600] {watchers.py:276} INFO - [my-namespace/jobs/from-image-aaggosqx] Succeeded
[2020-11-10 10:14:49,605] {job_runner.py:255} INFO - {job-runner}: Job Succeeded
[2020-11-10 10:14:49,606] {job_runner.py:255} INFO - {job-runner}: Deleting resources due to policy: IfSucceeded
[2020-11-10 10:14:49,608] {job_runner.py:255} INFO - {job-runner}: Deleting job..
[2020-11-10 10:14:49,609] {job_runner.py:255} INFO - {job-runner}: Deleting objects: my-namespace/jobs/from-image-aaggosqx
[2020-11-10 10:14:49,770] {queries.py:57} INFO - [my-namespace/pods/from-image-aaggosqx-9jtkf]: Hello Python
[2020-11-10 10:14:49,856] {queries.py:57} INFO - [my-namespace/pods/from-image-aaggosqx-9jtkf]: Hello Python
[2020-11-10 10:14:49,858] {operations.py:62} INFO - [my-namespace/jobs/from-image-aaggosqx] deleted
[2020-11-10 10:14:49,861] {job_runner.py:255} INFO - {job-runner}: Job deleted
[2020-11-10 10:14:49,863] {job_runner.py:255} INFO - {job-runner}: Client stopped, execution completed.
[2020-11-10 10:14:49,903] {taskinstance.py:1065} INFO - Marking task as SUCCESS.dag_id=job-tester, task_id=from-image, execution_date=20201110T101429, start_date=20201110T101443, end_date=20201110T101449
[2020-11-10 10:14:53,589] {logging_mixin.py:112} INFO - [2020-11-10 10:14:53,588] {local_task_job.py:103} INFO - Task exited with return code 0

This feels like an issue where two watchers are attached instead of one, but I'm not sure. Any ideas?
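
To illustrate the hypothesis, here is a small sketch (not the operator's actual code): if the same log handler is bound twice to one pod's log stream, every line is emitted twice, which would match the duplicated Airflow output above.

# Illustrative only; not the operator's implementation.
class LogWatcher:
    def __init__(self):
        self.handlers = []

    def bind(self, handler):
        # No guard against binding the same handler twice.
        self.handlers.append(handler)

    def emit(self, line):
        for handler in self.handlers:
            handler(line)

watcher = LogWatcher()
watcher.bind(print)
watcher.bind(print)            # accidental second bind
watcher.emit("Hello Python")   # printed twice, as in the Airflow log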

LamaAni commented 3 years ago

Hi, I'll take a look at that. It might take a week. If you find it first, please submit a PR.

LamaAni commented 3 years ago

Reproduced.

LamaAni commented 3 years ago

Please take a look at the PR here and tell me if I missed anything. Otherwise I'll merge it: https://github.com/LamaAni/KubernetesJobOperator/pull/25

The critical line is here: https://github.com/LamaAni/KubernetesJobOperator/pull/25/files#diff-1f41904b11937c04eeab725d2bee7936ded239daf44d68dbf67d514ea5ec0a05
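
In spirit, the fix amounts to something like the following guard. This is a hypothetical sketch, not the actual PR diff: track which pods already have a log reader attached so a second watcher never starts streaming the same pod's logs.

# Hypothetical illustration of the kind of fix implied; not the PR's code.
_attached_pods = set()

def attach_log_reader(pod_name, start_reader):
    """Start streaming logs for pod_name unless a reader already exists."""
    if pod_name in _attached_pods:
        return False  # a reader is already streaming this pod's logs
    _attached_pods.add(pod_name)
    start_reader(pod_name)
    return True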

odesenfans commented 3 years ago

Tested; the issue is fixed in 1.0.12!