apache/airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Tasks get killed externally #33155

Open · potiuk opened this issue 1 year ago

potiuk commented 1 year ago

Discussed in https://github.com/apache/airflow/discussions/33113

Originally posted by **karakanb** on August 4, 2023:

Our SnowflakeOperator tasks occasionally get killed by SIGKILL for reasons we have not identified, and after they get killed their retry starts as expected. However, these retries are later killed as well, with the following log:

```
[2023-08-04, 05:02:35 +03] {local_task_job_runner.py:291} WARNING - State of this instance has been externally set to up_for_retry. Terminating instance.
[2023-08-04, 05:02:35 +03] {process_utils.py:131} INFO - Sending Signals.SIGTERM to group 8843. PIDs of all processes in the group: [8843]
[2023-08-04, 05:02:35 +03] {process_utils.py:86} INFO - Sending the signal Signals.SIGTERM to group 8843
[2023-08-04, 05:02:35 +03] {taskinstance.py:1517} ERROR - Received SIGTERM. Terminating subprocesses.
```

This happens quite regularly; it happened in v2.6.1 and is still happening in v2.6.3. I tried to dig deeper into the logs across all Airflow components and noticed these odd entries:

```
[2023-08-04T02:02:38.505+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state success for task instance TaskInstanceKey(dag_id='team2-dag1', task_id='cl-events', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=2, map_index=-1)
[2023-08-04T02:02:38.518+0000] {scheduler_job_runner.py:713} INFO - TaskInstance Finished: dag_id=team2-dag1, task_id=cl-events, run_id=scheduled__2023-08-03T02:00:00+00:00, map_index=-1, run_start_date=2023-08-04 02:02:20.250391+00:00, run_end_date=2023-08-04 02:02:36.543903+00:00, run_duration=16.293512, state=failed, executor_state=success, try_number=2, max_tries=2, job_id=35737, pool=default_pool, queue=default, priority_weight=40, operator=SnowflakeOperator, queued_dttm=2023-08-04 02:02:12.556393+00:00, queued_by_job_id=35370, pid=8843
```

One log line says "received executor event with state success" while the other reports the state as "failed". There are no logs or references anywhere to the task being marked `up_for_retry`.
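For context on the first snippet above: the SIGTERM is issued by the worker-side local task job, whose heartbeat check compares the task instance state recorded in the metadata database against the still-running process. A heavily simplified sketch of that mechanism (illustrative only; the real logic lives in `local_task_job_runner.py`, and the function below is hypothetical):

```python
# Simplified sketch of the worker-side heartbeat check that produces
# "State of this instance has been externally set to ... Terminating instance."
# Illustrative only -- not the actual Airflow implementation.
import os
import signal


def heartbeat_callback(ti, process_group_id: int) -> None:
    ti.refresh_from_db()  # pick up state changes made by other components
    if ti.state != "running":
        # Some other component (scheduler, API, another job) changed the
        # state in the DB, so the still-running local process is torn down.
        print(
            f"State of this instance has been externally set to {ti.state}. "
            "Terminating instance."
        )
        # The real code escalates SIGTERM -> SIGKILL after a grace period.
        os.killpg(process_group_id, signal.SIGTERM)
```

So the open question is what set the state to `up_for_retry` in the database while the retry was still running.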
<details>
<summary>All the logs from all components (click to expand)</summary>

The following are all the anonymized logs across all Airflow components during that time period:

```
[2023-08-04T02:02:18.403+0000] {scheduler_job_runner.py:411} INFO - 2 tasks up for execution:
[2023-08-04T02:02:18.403+0000] {scheduler_job_runner.py:476} INFO - DAG team1-dag1 has 10/12 running and queued tasks
[2023-08-04T02:02:18.404+0000] {scheduler_job_runner.py:476} INFO - DAG team1-dag1 has 11/12 running and queued tasks
[2023-08-04T02:02:18.404+0000] {scheduler_job_runner.py:587} INFO - Setting the following tasks to queued state:
[2023-08-04T02:02:18.408+0000] {scheduler_job_runner.py:625} INFO - Sending TaskInstanceKey(dag_id='team1-dag1', task_id='task-mapping-xyz', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1) to executor with priority 17 and queue default
[2023-08-04T02:02:18.409+0000] {base_executor.py:147} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'team1-dag1', 'task-mapping-xyz', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04T02:02:18.410+0000] {scheduler_job_runner.py:625} INFO - Sending TaskInstanceKey(dag_id='team1-dag1', task_id='task-economy', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1) to executor with priority 5 and queue default
[2023-08-04T02:02:18.410+0000] {base_executor.py:147} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'team1-dag1', 'task-economy', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04T02:02:18.474+0000] {connection.py:268} WARNING - Secure redis scheme specified (rediss) with no ssl options, defaulting to insecure SSL behaviour.
[2023-08-04T02:02:18.482+0000] {connection.py:268} WARNING - Secure redis scheme specified (rediss) with no ssl options, defaulting to insecure SSL behaviour.
[2023-08-04T02:02:18.481+0000] {connection.py:268} WARNING - Secure redis scheme specified (rediss) with no ssl options, defaulting to insecure SSL behaviour.
[2023-08-04T02:02:18.484+0000] {connection.py:268} WARNING - Secure redis scheme specified (rediss) with no ssl options, defaulting to insecure SSL behaviour.
[2023-08-04 02:02:18,544: INFO/MainProcess] Task airflow.executors.celery_executor.execute_command[60af54af-d9b0-4274-95ac-d83176cdec45] received
[2023-08-04 02:02:18,546: INFO/MainProcess] Task airflow.executors.celery_executor.execute_command[6d0d2a82-cf6c-414a-b587-ab17d4821b44] received
[2023-08-04 02:02:18,555: INFO/ForkPoolWorker-4] [60af54af-d9b0-4274-95ac-d83176cdec45] Executing command in Celery: ['airflow', 'tasks', 'run', 'team1-dag1', 'task-mapping-xyz', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04 02:02:18,564: INFO/ForkPoolWorker-8] [6d0d2a82-cf6c-414a-b587-ab17d4821b44] Executing command in Celery: ['airflow', 'tasks', 'run', 'team1-dag1', 'task-economy', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04 02:02:18,824: INFO/ForkPoolWorker-3] Running on host airflow-worker-6575d6f488-2k88x
[2023-08-04T02:02:18.835+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state queued for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='task-mapping-xyz', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:18.836+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state queued for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='task-economy', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:18.857+0000] {scheduler_job_runner.py:703} INFO - Setting external_id for to 6d0d2a82-cf6c-414a-b587-ab17d4821b44
[2023-08-04T02:02:18.858+0000] {scheduler_job_runner.py:703} INFO - Setting external_id for to 60af54af-d9b0-4274-95ac-d83176cdec45
[2023-08-04 02:02:19,890: INFO/ForkPoolWorker-3] Running on host airflow-worker-6575d6f488-bwr8k
[2023-08-04 02:02:20,666: INFO/ForkPoolWorker-2] Task airflow.executors.celery_executor.execute_command[9a104eeb-5785-4e52-83dc-2cd5782bb36e] succeeded in 20.86061717895791s: None
[2023-08-04 02:02:20,732: INFO/ForkPoolWorker-4] Running on host airflow-worker-6575d6f488-2k88x
[2023-08-04 02:02:21,534: INFO/ForkPoolWorker-8] Task airflow.executors.celery_executor.execute_command[21fe395a-8ad0-4d74-9b64-6a4c9d0cb546] succeeded in 22.492166528012604s: None
[2023-08-04T02:02:21.904+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state success for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='task-main-purchases', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:21.904+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state success for task instance TaskInstanceKey(dag_id='team1-dag3', task_id='tasks-main-users', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:21.923+0000] {scheduler_job_runner.py:713} INFO - TaskInstance Finished: dag_id=team1-dag1, task_id=task-main-purchases, run_id=scheduled__2023-08-03T02:00:00+00:00, map_index=-1, run_start_date=2023-08-04 02:02:13.952356+00:00, run_end_date=2023-08-04 02:02:21.159794+00:00, run_duration=7.207438, state=success, executor_state=success, try_number=1, max_tries=2, job_id=35730, pool=default_pool, queue=default, priority_weight=7, operator=SafeBigQueryExecuteQueryOperator, queued_dttm=2023-08-04 02:01:58.924308+00:00, queued_by_job_id=35370, pid=8811
[2023-08-04T02:02:21.923+0000] {scheduler_job_runner.py:713} INFO - TaskInstance Finished: dag_id=team1-dag3, task_id=tasks-main-users, run_id=scheduled__2023-08-03T02:00:00+00:00, map_index=-1, run_start_date=2023-08-04 02:02:14.836900+00:00, run_end_date=2023-08-04 02:02:20.218256+00:00, run_duration=5.381356, state=success, executor_state=success, try_number=1, max_tries=2, job_id=35734, pool=default_pool, queue=default, priority_weight=3, operator=SafeBigQueryExecuteQueryOperator, queued_dttm=2023-08-04 02:01:59.791256+00:00, queued_by_job_id=35370, pid=8779
[2023-08-04T02:02:22.338+0000] {scheduler_job_runner.py:411} INFO - 2 tasks up for execution:
[2023-08-04T02:02:22.339+0000] {scheduler_job_runner.py:476} INFO - DAG team1-dag1 has 9/12 running and queued tasks
[2023-08-04T02:02:22.339+0000] {scheduler_job_runner.py:476} INFO - DAG team1-dag1 has 10/12 running and queued tasks
[2023-08-04T02:02:22.339+0000] {scheduler_job_runner.py:587} INFO - Setting the following tasks to queued state:
[2023-08-04T02:02:22.345+0000] {scheduler_job_runner.py:625} INFO - Sending TaskInstanceKey(dag_id='team1-dag1', task_id='task-main-cup', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1) to executor with priority 5 and queue default
[2023-08-04T02:02:22.345+0000] {base_executor.py:147} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'team1-dag1', 'task-main-cup', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04T02:02:22.346+0000] {scheduler_job_runner.py:625} INFO - Sending TaskInstanceKey(dag_id='team1-dag1', task_id='main-cas-task', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1) to executor with priority 5 and queue default
[2023-08-04T02:02:22.346+0000] {base_executor.py:147} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'team1-dag1', 'main-cas-task', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04 02:02:22,362: INFO/ForkPoolWorker-1] Task airflow.executors.celery_executor.execute_command[5693d97d-ca45-438c-847e-dd9d5a8729d4] succeeded in 15.857558179879561s: None
[2023-08-04T02:02:22.402+0000] {connection.py:268} WARNING - Secure redis scheme specified (rediss) with no ssl options, defaulting to insecure SSL behaviour.
[2023-08-04T02:02:22.405+0000] {connection.py:268} WARNING - Secure redis scheme specified (rediss) with no ssl options, defaulting to insecure SSL behaviour.
[2023-08-04T02:02:22.404+0000] {connection.py:268} WARNING - Secure redis scheme specified (rediss) with no ssl options, defaulting to insecure SSL behaviour.
[2023-08-04T02:02:22.407+0000] {connection.py:268} WARNING - Secure redis scheme specified (rediss) with no ssl options, defaulting to insecure SSL behaviour.
[2023-08-04 02:02:22,426: INFO/ForkPoolWorker-2] Task airflow.executors.celery_executor.execute_command[1b5f31c3-d2dd-472c-ab2b-2d825c206378] succeeded in 23.37721593794413s: None
[2023-08-04 02:02:22,478: INFO/MainProcess] Task airflow.executors.celery_executor.execute_command[b0e81a3c-422c-4ffe-8cb0-b54df5c268ae] received
[2023-08-04 02:02:22,528: INFO/MainProcess] Task airflow.executors.celery_executor.execute_command[8f42359c-4cde-478a-8d4c-977811514dc0] received
[2023-08-04 02:02:22,547: INFO/ForkPoolWorker-1] [8f42359c-4cde-478a-8d4c-977811514dc0] Executing command in Celery: ['airflow', 'tasks', 'run', 'team1-dag1', 'task-main-cup', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04 02:02:22,613: INFO/ForkPoolWorker-5] [b0e81a3c-422c-4ffe-8cb0-b54df5c268ae] Executing command in Celery: ['airflow', 'tasks', 'run', 'team1-dag1', 'main-cas-task', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04T02:02:22.805+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state queued for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='task-main-cup', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:22.805+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state queued for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='main-cas-task', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:22.805+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state success for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='main-idfa-special', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:22.806+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state success for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='task-popup-cas', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:22.822+0000] {scheduler_job_runner.py:703} INFO - Setting external_id for to b0e81a3c-422c-4ffe-8cb0-b54df5c268ae
[2023-08-04T02:02:22.822+0000] {scheduler_job_runner.py:713} INFO - TaskInstance Finished: dag_id=team1-dag1, task_id=task-popup-cas, run_id=scheduled__2023-08-03T02:00:00+00:00, map_index=-1, run_start_date=2023-08-04 02:02:14.416592+00:00, run_end_date=2023-08-04 02:02:21.934650+00:00, run_duration=7.518058, state=success, executor_state=success, try_number=1, max_tries=2, job_id=35732, pool=default_pool, queue=default, priority_weight=6, operator=SafeBigQueryExecuteQueryOperator, queued_dttm=2023-08-04 02:02:06.479577+00:00, queued_by_job_id=35370, pid=8839
[2023-08-04T02:02:22.823+0000] {scheduler_job_runner.py:703} INFO - Setting external_id for to 8f42359c-4cde-478a-8d4c-977811514dc0
[2023-08-04T02:02:22.823+0000] {scheduler_job_runner.py:713} INFO - TaskInstance Finished: dag_id=team1-dag1, task_id=main-idfa-special, run_id=scheduled__2023-08-03T02:00:00+00:00, map_index=-1, run_start_date=2023-08-04 02:02:14.083753+00:00, run_end_date=2023-08-04 02:02:22.009704+00:00, run_duration=7.925951, state=success, executor_state=success, try_number=1, max_tries=2, job_id=35731, pool=default_pool, queue=default, priority_weight=6, operator=SafeBigQueryExecuteQueryOperator, queued_dttm=2023-08-04 02:01:58.924308+00:00, queued_by_job_id=35370, pid=8805
[2023-08-04 02:02:22,931: INFO/ForkPoolWorker-1] Task airflow.executors.celery_executor.execute_command[f58b9b21-4835-4078-8c7e-818257d15c85] succeeded in 23.897192854899913s: None
[2023-08-04T02:02:23.145+0000] {scheduler_job_runner.py:411} INFO - 2 tasks up for execution:
[2023-08-04T02:02:23.145+0000] {scheduler_job_runner.py:476} INFO - DAG team1-dag1 has 10/12 running and queued tasks
[2023-08-04T02:02:23.146+0000] {scheduler_job_runner.py:476} INFO - DAG team1-dag1 has 11/12 running and queued tasks
[2023-08-04T02:02:23.146+0000] {scheduler_job_runner.py:587} INFO - Setting the following tasks to queued state:
[2023-08-04T02:02:23.151+0000] {scheduler_job_runner.py:625} INFO - Sending TaskInstanceKey(dag_id='team1-dag1', task_id='idfa-lev-start', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1) to executor with priority 5 and queue default
[2023-08-04T02:02:23.151+0000] {base_executor.py:147} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'team1-dag1', 'idfa-lev-start', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04T02:02:23.152+0000] {scheduler_job_runner.py:625} INFO - Sending TaskInstanceKey(dag_id='team1-dag1', task_id='main-sec-track', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1) to executor with priority 4 and queue default
[2023-08-04T02:02:23.152+0000] {base_executor.py:147} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'team1-dag1', 'main-sec-track', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04T02:02:23.196+0000] {connection.py:268} WARNING - Secure redis scheme specified (rediss) with no ssl options, defaulting to insecure SSL behaviour.
[2023-08-04T02:02:23.199+0000] {connection.py:268} WARNING - Secure redis scheme specified (rediss) with no ssl options, defaulting to insecure SSL behaviour.
[2023-08-04T02:02:23.201+0000] {connection.py:268} WARNING - Secure redis scheme specified (rediss) with no ssl options, defaulting to insecure SSL behaviour.
[2023-08-04T02:02:23.204+0000] {connection.py:268} WARNING - Secure redis scheme specified (rediss) with no ssl options, defaulting to insecure SSL behaviour.
[2023-08-04 02:02:23,272: INFO/MainProcess] Task airflow.executors.celery_executor.execute_command[c6766690-9ab2-4fdd-93d0-11fa92f1e4e2] received
[2023-08-04 02:02:23,285: INFO/MainProcess] Task airflow.executors.celery_executor.execute_command[b0f2ad65-855c-4c29-acfa-6d22cc4b5311] received
[2023-08-04 02:02:23,298: INFO/ForkPoolWorker-4] [b0f2ad65-855c-4c29-acfa-6d22cc4b5311] Executing command in Celery: ['airflow', 'tasks', 'run', 'team1-dag1', 'main-sec-track', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04 02:02:23,415: INFO/ForkPoolWorker-6] [c6766690-9ab2-4fdd-93d0-11fa92f1e4e2] Executing command in Celery: ['airflow', 'tasks', 'run', 'team1-dag1', 'idfa-lev-start', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04T02:02:23.477+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state queued for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='idfa-lev-start', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:23.477+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state queued for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='main-sec-track', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:23.478+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state success for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='idfa_model-level_end.main.idfa_model.level_end', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:23.492+0000] {scheduler_job_runner.py:703} INFO - Setting external_id for to b0f2ad65-855c-4c29-acfa-6d22cc4b5311
[2023-08-04T02:02:23.493+0000] {scheduler_job_runner.py:713} INFO - TaskInstance Finished: dag_id=team1-dag1, task_id=idfa_model-level_end.main.idfa_model.level_end, run_id=scheduled__2023-08-03T02:00:00+00:00, map_index=-1, run_start_date=2023-08-04 02:02:14.388300+00:00, run_end_date=2023-08-04 02:02:22.553085+00:00, run_duration=8.164785, state=success, executor_state=success, try_number=1, max_tries=2, job_id=35733, pool=default_pool, queue=default, priority_weight=10, operator=SafeBigQueryExecuteQueryOperator, queued_dttm=2023-08-04 02:01:58.924308+00:00, queued_by_job_id=35370, pid=8778
[2023-08-04T02:02:23.493+0000] {scheduler_job_runner.py:703} INFO - Setting external_id for to c6766690-9ab2-4fdd-93d0-11fa92f1e4e2
[2023-08-04T02:02:23.830+0000] {scheduler_job_runner.py:411} INFO - 1 tasks up for execution:
[2023-08-04T02:02:23.831+0000] {scheduler_job_runner.py:476} INFO - DAG team1-dag1 has 12/12 running and queued tasks
[2023-08-04T02:02:23.832+0000] {scheduler_job_runner.py:483} INFO - Not executing since the number of tasks running or queued from DAG team1-dag1 is >= to the DAG's max_active_tasks limit of 12
[2023-08-04T02:02:23.833+0000] {scheduler_job_runner.py:573} INFO - Found no task instances to queue on query iteration 1 but there could be more candidate task instances to check.
[2023-08-04T02:02:23.854+0000] {scheduler_job_runner.py:411} INFO - 1 tasks up for execution:
[2023-08-04T02:02:23.855+0000] {scheduler_job_runner.py:476} INFO - DAG team1-dag2 has 0/12 running and queued tasks
[2023-08-04T02:02:23.855+0000] {scheduler_job_runner.py:587} INFO - Setting the following tasks to queued state:
[2023-08-04T02:02:23.861+0000] {scheduler_job_runner.py:625} INFO - Sending TaskInstanceKey(dag_id='team1-dag2', task_id='task1', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1) to executor with priority 3 and queue default
[2023-08-04T02:02:23.861+0000] {base_executor.py:147} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'team1-dag2', 'task1', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04 02:02:23,873: INFO/MainProcess] Task airflow.executors.celery_executor.execute_command[057609bf-3d87-47e6-912a-589599c62dc4] received
[2023-08-04 02:02:23,904: INFO/ForkPoolWorker-1] [057609bf-3d87-47e6-912a-589599c62dc4] Executing command in Celery: ['airflow', 'tasks', 'run', 'team1-dag2', 'task1', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04T02:02:24.019+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state queued for task instance TaskInstanceKey(dag_id='team1-dag2', task_id='task1', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:24.032+0000] {scheduler_job_runner.py:703} INFO - Setting external_id for to 057609bf-3d87-47e6-912a-589599c62dc4
[2023-08-04 02:02:24,343: INFO/ForkPoolWorker-8] Running on host airflow-worker-6575d6f488-g5vjh
[2023-08-04 02:02:24,670: INFO/ForkPoolWorker-2] Task airflow.executors.celery_executor.execute_command[ff8c5f9f-0879-4715-a6b8-f77a611a545b] succeeded in 15.172818467020988s: None
[2023-08-04T02:02:25.069+0000] {triggerer_job_runner.py:483} INFO - 0 triggers currently running
[2023-08-04 02:02:25,493: INFO/ForkPoolWorker-3] Task airflow.executors.celery_executor.execute_command[2e1f4026-04ca-4aa9-baae-2a0de2c378bf] succeeded in 18.478193846996874s: None
[2023-08-04 02:02:25,765: INFO/ForkPoolWorker-5] Running on host airflow-worker-6575d6f488-2k88x
[2023-08-04T02:02:25.778+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state success for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='task-rating', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:25.778+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state success for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='idfa_model-ad_tracking.main.idfa_model.ad_tracking', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:25.792+0000] {scheduler_job_runner.py:713} INFO - TaskInstance Finished: dag_id=team1-dag1, task_id=task-rating, run_id=scheduled__2023-08-03T02:00:00+00:00, map_index=-1, run_start_date=2023-08-04 02:02:19.064954+00:00, run_end_date=2023-08-04 02:02:25.145527+00:00, run_duration=6.080573, state=success, executor_state=success, try_number=1, max_tries=2, job_id=35736, pool=default_pool, queue=default, priority_weight=5, operator=SafeBigQueryExecuteQueryOperator, queued_dttm=2023-08-04 02:02:06.999791+00:00, queued_by_job_id=35370, pid=8806
[2023-08-04T02:02:25.793+0000] {scheduler_job_runner.py:713} INFO - TaskInstance Finished: dag_id=team1-dag1, task_id=idfa_model-ad_tracking.main.idfa_model.ad_tracking, run_id=scheduled__2023-08-03T02:00:00+00:00, map_index=-1, run_start_date=2023-08-04 02:02:17.020508+00:00, run_end_date=2023-08-04 02:02:24.306405+00:00, run_duration=7.285897, state=success, executor_state=success, try_number=1, max_tries=2, job_id=35735, pool=default_pool, queue=default, priority_weight=5, operator=SafeBigQueryExecuteQueryOperator, queued_dttm=2023-08-04 02:02:09.479569+00:00, queued_by_job_id=35370, pid=8840
[2023-08-04T02:02:27.210+0000] {scheduler_job_runner.py:411} INFO - 2 tasks up for execution:
[2023-08-04T02:02:27.211+0000] {scheduler_job_runner.py:476} INFO - DAG team1-dag1 has 10/12 running and queued tasks
[2023-08-04T02:02:27.211+0000] {scheduler_job_runner.py:476} INFO - DAG team1-dag1 has 11/12 running and queued tasks
[2023-08-04T02:02:27.212+0000] {scheduler_job_runner.py:587} INFO - Setting the following tasks to queued state:
[2023-08-04T02:02:27.216+0000] {scheduler_job_runner.py:625} INFO - Sending TaskInstanceKey(dag_id='team1-dag1', task_id='main-eng-time', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1) to executor with priority 4 and queue default
[2023-08-04T02:02:27.217+0000] {base_executor.py:147} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'team1-dag1', 'main-eng-time', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04T02:02:27.217+0000] {scheduler_job_runner.py:625} INFO - Sending TaskInstanceKey(dag_id='team1-dag1', task_id='main-gift', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1) to executor with priority 4 and queue default
[2023-08-04T02:02:27.217+0000] {base_executor.py:147} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'team1-dag1', 'main-gift', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04T02:02:27.268+0000] {connection.py:268} WARNING - Secure redis scheme specified (rediss) with no ssl options, defaulting to insecure SSL behaviour.
[2023-08-04T02:02:27.271+0000] {connection.py:268} WARNING - Secure redis scheme specified (rediss) with no ssl options, defaulting to insecure SSL behaviour.
[2023-08-04T02:02:27.279+0000] {connection.py:268} WARNING - Secure redis scheme specified (rediss) with no ssl options, defaulting to insecure SSL behaviour.
[2023-08-04T02:02:27.283+0000] {connection.py:268} WARNING - Secure redis scheme specified (rediss) with no ssl options, defaulting to insecure SSL behaviour.
[2023-08-04 02:02:27,324: INFO/MainProcess] Task airflow.executors.celery_executor.execute_command[53776bd1-8696-4191-b9fe-3ef941dbb786] received
[2023-08-04 02:02:27,337: INFO/MainProcess] Task airflow.executors.celery_executor.execute_command[d2f5f9b6-3690-418b-b3e4-dac9fe23d77f] received
[2023-08-04 02:02:27,349: INFO/ForkPoolWorker-2] [d2f5f9b6-3690-418b-b3e4-dac9fe23d77f] Executing command in Celery: ['airflow', 'tasks', 'run', 'team1-dag1', 'main-gift', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04 02:02:27,351: INFO/ForkPoolWorker-2] [53776bd1-8696-4191-b9fe-3ef941dbb786] Executing command in Celery: ['airflow', 'tasks', 'run', 'team1-dag1', 'main-eng-time', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04 02:02:27,353: INFO/ForkPoolWorker-2] Running on host airflow-worker-6575d6f488-g5vjh
[2023-08-04 02:02:27,470: INFO/ForkPoolWorker-8] Running on host airflow-worker-6575d6f488-bwr8k
[2023-08-04T02:02:27.578+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state queued for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='main-eng-time', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:27.578+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state queued for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='main-gift', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:27.604+0000] {scheduler_job_runner.py:703} INFO - Setting external_id for to d2f5f9b6-3690-418b-b3e4-dac9fe23d77f
[2023-08-04T02:02:27.604+0000] {scheduler_job_runner.py:703} INFO - Setting external_id for to 53776bd1-8696-4191-b9fe-3ef941dbb786
[2023-08-04 02:02:28,059: INFO/ForkPoolWorker-1] Task airflow.executors.celery_executor.execute_command[bff5c1a3-8c75-48b1-a53e-7ed39cdf8064] succeeded in 76.61487208399922s: None
[2023-08-04 02:02:28,501: INFO/ForkPoolWorker-4] Task airflow.executors.celery_executor.execute_command[9f6ef560-6cd1-4531-854e-0fadfec3140e] succeeded in 19.514282511081547s: None
[2023-08-04T02:02:28.896+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state success for task instance TaskInstanceKey(dag_id='team2-dag2', task_id='seat_jam-bi-main_install_date_based_performance_metrics.main.seat_jam.bi.main_install_date_based_performance_metrics', run_id='scheduled__2023-08-03T01:00:00+00:00', try_number=2, map_index=-1)
[2023-08-04T02:02:28.897+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state success for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='idfa_model-sticker_earned.main.idfa_model.sticker_earned', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:28.920+0000] {scheduler_job_runner.py:713} INFO - TaskInstance Finished: dag_id=team2-dag2, task_id=seat_jam-bi-main_install_date_based_performance_metrics.main.seat_jam.bi.main_install_date_based_performance_metrics, run_id=scheduled__2023-08-03T01:00:00+00:00, map_index=-1, run_start_date=2023-08-04 02:01:22.917457+00:00, run_end_date=None, run_duration=59.872644, state=running, executor_state=success, try_number=2, max_tries=2, job_id=35697, pool=default_pool, queue=default, priority_weight=3, operator=SnowflakeOperator, queued_dttm=2023-08-04 02:01:11.427581+00:00, queued_by_job_id=35370, pid=8852
[2023-08-04T02:02:28.921+0000] {scheduler_job_runner.py:713} INFO - TaskInstance Finished: dag_id=team1-dag1, task_id=idfa_model-sticker_earned.main.idfa_model.sticker_earned, run_id=scheduled__2023-08-03T02:00:00+00:00, map_index=-1, run_start_date=2023-08-04 02:02:20.981821+00:00, run_end_date=2023-08-04 02:02:28.186293+00:00, run_duration=7.204472, state=success, executor_state=success, try_number=1, max_tries=2, job_id=35738, pool=default_pool, queue=default, priority_weight=6, operator=SafeBigQueryExecuteQueryOperator, queued_dttm=2023-08-04 02:02:08.973938+00:00, queued_by_job_id=35370, pid=8816
[2023-08-04 02:02:29,278: INFO/ForkPoolWorker-4] Running on host airflow-worker-6575d6f488-g5vjh
[2023-08-04 02:02:29,433: INFO/ForkPoolWorker-8] Task airflow.executors.celery_executor.execute_command[c834a862-64e8-4e42-b67c-666420f71724] succeeded in 15.717544401995838s: None
[2023-08-04 02:02:30,105: INFO/ForkPoolWorker-3] Running on host airflow-worker-6575d6f488-qbjcn
[2023-08-04T02:02:30.324+0000] {scheduler_job_runner.py:411} INFO - 2 tasks up for execution:
[2023-08-04T02:02:30.325+0000] {scheduler_job_runner.py:476} INFO - DAG team1-dag1 has 11/12 running and queued tasks
[2023-08-04T02:02:30.325+0000] {scheduler_job_runner.py:476} INFO - DAG team1-dag2 has 1/12 running and queued tasks
[2023-08-04T02:02:30.325+0000] {scheduler_job_runner.py:587} INFO - Setting the following tasks to queued state:
[2023-08-04T02:02:30.339+0000] {scheduler_job_runner.py:625} INFO - Sending TaskInstanceKey(dag_id='team1-dag1', task_id='main-rate-us', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1) to executor with priority 4 and queue default
[2023-08-04T02:02:30.340+0000] {base_executor.py:147} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'team1-dag1', 'main-rate-us', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04T02:02:30.340+0000] {scheduler_job_runner.py:625} INFO - Sending TaskInstanceKey(dag_id='team1-dag2', task_id='task-xyz', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1) to executor with priority 3 and queue default
[2023-08-04T02:02:30.341+0000] {base_executor.py:147} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'team1-dag2', 'task-xyz', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04T02:02:30.396+0000] {connection.py:268} WARNING - Secure redis scheme specified (rediss) with no ssl options, defaulting to insecure SSL behaviour.
[2023-08-04T02:02:30.399+0000] {connection.py:268} WARNING - Secure redis scheme specified (rediss) with no ssl options, defaulting to insecure SSL behaviour.
[2023-08-04T02:02:30.403+0000] {connection.py:268} WARNING - Secure redis scheme specified (rediss) with no ssl options, defaulting to insecure SSL behaviour.
[2023-08-04T02:02:30.406+0000] {connection.py:268} WARNING - Secure redis scheme specified (rediss) with no ssl options, defaulting to insecure SSL behaviour.
[2023-08-04 02:02:30,469: INFO/MainProcess] Task airflow.executors.celery_executor.execute_command[55a393a0-72ff-4adc-bee7-471591aa6981] received
[2023-08-04 02:02:30,470: INFO/MainProcess] Task airflow.executors.celery_executor.execute_command[6224a09d-6706-4e7d-9ea0-7a056ac84069] received
[2023-08-04 02:02:30,479: INFO/ForkPoolWorker-8] [6224a09d-6706-4e7d-9ea0-7a056ac84069] Executing command in Celery: ['airflow', 'tasks', 'run', 'team1-dag2', 'task-xyz', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04 02:02:30,479: INFO/ForkPoolWorker-1] [55a393a0-72ff-4adc-bee7-471591aa6981] Executing command in Celery: ['airflow', 'tasks', 'run', 'team1-dag1', 'main-rate-us', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04T02:02:30.766+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state queued for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='main-rate-us', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:30.766+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state queued for task instance TaskInstanceKey(dag_id='team1-dag2', task_id='task-xyz', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:30.767+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state success for task instance TaskInstanceKey(dag_id='team1-dag3', task_id='task2', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:30.789+0000] {scheduler_job_runner.py:713} INFO - TaskInstance Finished: dag_id=team1-dag3, task_id=task2, run_id=scheduled__2023-08-03T02:00:00+00:00, map_index=-1, run_start_date=2023-08-04 02:02:24.550286+00:00, run_end_date=2023-08-04 02:02:29.222683+00:00, run_duration=4.672397, state=success, executor_state=success, try_number=1, max_tries=2, job_id=35739, pool=default_pool, queue=default, priority_weight=3, operator=SafeBigQueryExecuteQueryOperator, queued_dttm=2023-08-04 02:02:13.683794+00:00, queued_by_job_id=35370, pid=8891
[2023-08-04T02:02:30.790+0000] {scheduler_job_runner.py:703} INFO - Setting external_id for to 6224a09d-6706-4e7d-9ea0-7a056ac84069
[2023-08-04T02:02:30.790+0000] {scheduler_job_runner.py:703} INFO - Setting external_id for to 55a393a0-72ff-4adc-bee7-471591aa6981
[2023-08-04 02:02:31,052: INFO/ForkPoolWorker-1] Running on host airflow-worker-6575d6f488-bwr8k
[2023-08-04T02:02:31.175+0000] {scheduler_job_runner.py:411} INFO - 1 tasks up for execution:
[2023-08-04T02:02:31.175+0000] {scheduler_job_runner.py:476} INFO - DAG team1-dag2 has 2/12 running and queued tasks
[2023-08-04T02:02:31.176+0000] {scheduler_job_runner.py:587} INFO - Setting the following tasks to queued state:
[2023-08-04T02:02:31.184+0000] {scheduler_job_runner.py:625} INFO - Sending TaskInstanceKey(dag_id='team1-dag2', task_id='task-123123', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1) to executor with priority 3 and queue default
[2023-08-04T02:02:31.185+0000] {base_executor.py:147} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'team1-dag2', 'task-123123', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04 02:02:31,193: INFO/MainProcess] Task airflow.executors.celery_executor.execute_command[9fa1f97d-f64b-41d2-bcd7-4d1742d8c402] received
[2023-08-04 02:02:31,203: INFO/ForkPoolWorker-8] [9fa1f97d-f64b-41d2-bcd7-4d1742d8c402] Executing command in Celery: ['airflow', 'tasks', 'run', 'team1-dag2', 'task-123123', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04T02:02:31.377+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state queued for task instance TaskInstanceKey(dag_id='team1-dag2', task_id='task-123123', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:31.387+0000] {scheduler_job_runner.py:703} INFO - Setting external_id for to 9fa1f97d-f64b-41d2-bcd7-4d1742d8c402
[2023-08-04 02:02:31,941: INFO/ForkPoolWorker-4] Running on host airflow-worker-6575d6f488-bwr8k
[2023-08-04 02:02:32,473: INFO/ForkPoolWorker-5] Task airflow.executors.celery_executor.execute_command[28f424f0-bd9c-4aa4-8cf9-14021291fe82] succeeded in 19.384600382996723s: None
[2023-08-04T02:02:33.096+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state success for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='main-abte', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:33.118+0000] {scheduler_job_runner.py:713} INFO - TaskInstance Finished: dag_id=team1-dag1, task_id=main-abte, run_id=scheduled__2023-08-03T02:00:00+00:00, map_index=-1, run_start_date=2023-08-04 02:02:25.966527+00:00, run_end_date=2023-08-04 02:02:32.163021+00:00, run_duration=6.196494, state=success, executor_state=success, try_number=1, max_tries=2, job_id=35740, pool=default_pool, queue=default, priority_weight=7, operator=SafeBigQueryExecuteQueryOperator, queued_dttm=2023-08-04 02:02:13.077299+00:00, queued_by_job_id=35370, pid=8817
[2023-08-04T02:02:33.130+0000] {scheduler_job_runner.py:1688} WARNING - Failing (1) jobs without heartbeat after 2023-08-04 01:57:33.120520+00:00
[2023-08-04T02:02:33.130+0000] {scheduler_job_runner.py:1698} ERROR - Detected zombie job: {'full_filepath': '/opt/airflow/dags/repo/mydag.py', 'processor_subdir': '/opt/airflow/dags/repo/', 'msg': "{'DAG Id': 'team2-dag2', 'Task Id': 'seat_jam-bi-main_install_date_based_performance_metrics.main.seat_jam.bi.main_install_date_based_performance_metrics', 'Run Id': 'scheduled__2023-08-03T01:00:00+00:00', 'Hostname': 'airflow-worker-6575d6f488-g5vjh', 'External Executor Id': 'bff5c1a3-8c75-48b1-a53e-7ed39cdf8064'}", 'simple_task_instance': , 'is_failure_callback': True}
[2023-08-04 02:02:33,382: INFO/ForkPoolWorker-5] Running on host airflow-worker-6575d6f488-g5vjh
[2023-08-04T02:02:34.307+0000] {scheduler_job_runner.py:411} INFO - 1 tasks up for execution:
[2023-08-04T02:02:34.307+0000] {scheduler_job_runner.py:476} INFO - DAG team1-dag2 has 3/12 running and queued tasks
[2023-08-04T02:02:34.308+0000] {scheduler_job_runner.py:587} INFO - Setting the following tasks to queued state:
[2023-08-04T02:02:34.313+0000] {scheduler_job_runner.py:625} INFO - Sending TaskInstanceKey(dag_id='team1-dag2', task_id='python-task', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1) to executor with priority 3 and queue default
[2023-08-04T02:02:34.313+0000] {base_executor.py:147} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'team1-dag2', 'python-task', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04 02:02:34,322: INFO/MainProcess] Task airflow.executors.celery_executor.execute_command[f854d40b-5969-4db7-ba15-d55e61b28a89] received
[2023-08-04 02:02:34,335: INFO/ForkPoolWorker-8] [f854d40b-5969-4db7-ba15-d55e61b28a89] Executing command in Celery: ['airflow', 'tasks', 'run', 'team1-dag2', 'python-task', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04T02:02:34.458+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state queued for task instance TaskInstanceKey(dag_id='team1-dag2', task_id='python-task', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:34.470+0000] {scheduler_job_runner.py:703} INFO - Setting external_id for to f854d40b-5969-4db7-ba15-d55e61b28a89
[2023-08-04 02:02:34,682: INFO/ForkPoolWorker-8] Task airflow.executors.celery_executor.execute_command[6d0d2a82-cf6c-414a-b587-ab17d4821b44] succeeded in 16.129581549903378s: None
[2023-08-04 02:02:34,777: INFO/ForkPoolWorker-6] Running on host airflow-worker-6575d6f488-g5vjh
[2023-08-04T02:02:34.797+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state success for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='task-economy', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:34.827+0000] {scheduler_job_runner.py:713} INFO - TaskInstance Finished: dag_id=team1-dag1, task_id=task-economy, run_id=scheduled__2023-08-03T02:00:00+00:00, map_index=-1, run_start_date=2023-08-04 02:02:27.797280+00:00, run_end_date=2023-08-04 02:02:34.016161+00:00, run_duration=6.218881, state=success, executor_state=success, try_number=1, max_tries=2, job_id=35742, pool=default_pool, queue=default, priority_weight=5, operator=SafeBigQueryExecuteQueryOperator, queued_dttm=2023-08-04 02:02:18.405670+00:00, queued_by_job_id=35370, pid=8852
[2023-08-04 02:02:35,018: INFO/ForkPoolWorker-2] Task airflow.executors.celery_executor.execute_command[29841fa6-bb44-467c-98cd-f297ff7a6b4d] succeeded in 19.205872046994045s: None
[2023-08-04T02:02:36.122+0000] {scheduler_job_runner.py:411} INFO - 1 tasks up for execution:
[2023-08-04T02:02:36.122+0000] {scheduler_job_runner.py:476} INFO - DAG team1-dag2 has 4/12 running and queued tasks
[2023-08-04T02:02:36.123+0000] {scheduler_job_runner.py:587} INFO - Setting the following tasks to queued state:
[2023-08-04T02:02:36.127+0000] {scheduler_job_runner.py:625} INFO - Sending TaskInstanceKey(dag_id='team1-dag2', task_id='task-adid-mapping-uid', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1) to executor with priority 3 and queue default
[2023-08-04T02:02:36.127+0000] {base_executor.py:147} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'team1-dag2', 'task-adid-mapping-uid', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04 02:02:36,136: INFO/MainProcess] Task airflow.executors.celery_executor.execute_command[de42987d-980c-4233-af72-59df32247d2f] received
[2023-08-04 02:02:36,166: INFO/ForkPoolWorker-1] [de42987d-980c-4233-af72-59df32247d2f] Executing command in Celery: ['airflow', 'tasks', 'run', 'team1-dag2', 'task-adid-mapping-uid', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04T02:02:36.259+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state queued for task instance TaskInstanceKey(dag_id='team1-dag2', task_id='task-adid-mapping-uid', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:36.259+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state success for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='idfa-medal', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:36.275+0000] {scheduler_job_runner.py:703} INFO - Setting external_id for to de42987d-980c-4233-af72-59df32247d2f
[2023-08-04T02:02:36.275+0000] {scheduler_job_runner.py:713} INFO - TaskInstance Finished: dag_id=team1-dag1, task_id=idfa-medal, run_id=scheduled__2023-08-03T02:00:00+00:00, map_index=-1, run_start_date=2023-08-04 02:02:27.604100+00:00, run_end_date=2023-08-04 02:02:34.756468+00:00, run_duration=7.152368, state=success, executor_state=success, try_number=1, max_tries=2, job_id=35741, pool=default_pool, queue=default, priority_weight=5, operator=SafeBigQueryExecuteQueryOperator, queued_dttm=2023-08-04 02:02:15.794386+00:00, queued_by_job_id=35370, pid=8892
[2023-08-04 02:02:36,533: INFO/ForkPoolWorker-1] Running on host airflow-worker-6575d6f488-qbjcn
[2023-08-04T02:02:36.607+0000] {scheduler_job_runner.py:411} INFO - 1 tasks up for execution:
[2023-08-04T02:02:36.607+0000] {scheduler_job_runner.py:476} INFO - DAG team1-dag1 has 9/12 running and queued tasks
[2023-08-04T02:02:36.607+0000] {scheduler_job_runner.py:587} INFO - Setting the following tasks to queued state:
[2023-08-04T02:02:36.611+0000] {scheduler_job_runner.py:625} INFO - Sending TaskInstanceKey(dag_id='team1-dag1', task_id='task-mnbv', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1) to executor with priority 3 and queue default
[2023-08-04T02:02:36.612+0000] {base_executor.py:147} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'team1-dag1', 'task-mnbv', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04 02:02:36,621: INFO/MainProcess] Task airflow.executors.celery_executor.execute_command[9c4506ac-e48c-4e62-a06a-6f8fd6f8dd53] received
[2023-08-04 02:02:36,637: INFO/ForkPoolWorker-2] [9c4506ac-e48c-4e62-a06a-6f8fd6f8dd53] Executing command in Celery: ['airflow', 'tasks', 'run', 'team1-dag1', 'task-mnbv', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04T02:02:36.753+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state queued for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='task-mnbv', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:36.765+0000] {scheduler_job_runner.py:703} INFO - Setting external_id for to 9c4506ac-e48c-4e62-a06a-6f8fd6f8dd53
[2023-08-04 02:02:37,214: INFO/ForkPoolWorker-2] Running on host airflow-worker-6575d6f488-bwr8k
[2023-08-04 02:02:37,294: INFO/ForkPoolWorker-3] Task airflow.executors.celery_executor.execute_command[681c8f80-273a-4810-be1c-17832c97a256] succeeded in 24.719424202106893s: None
[2023-08-04T02:02:38.505+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state success for task instance TaskInstanceKey(dag_id='team2-dag1', task_id='cl-events', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=2, map_index=-1)
[2023-08-04T02:02:38.518+0000] {scheduler_job_runner.py:713} INFO - TaskInstance Finished: dag_id=team2-dag1, task_id=cl-events, run_id=scheduled__2023-08-03T02:00:00+00:00, map_index=-1, run_start_date=2023-08-04 02:02:20.250391+00:00, run_end_date=2023-08-04 02:02:36.543903+00:00, run_duration=16.293512, state=failed, executor_state=success, try_number=2, max_tries=2, job_id=35737, pool=default_pool, queue=default, priority_weight=40, operator=SnowflakeOperator, queued_dttm=2023-08-04 02:02:12.556393+00:00, queued_by_job_id=35370, pid=8843
[2023-08-04 02:02:38,897: INFO/ForkPoolWorker-4] Task airflow.executors.celery_executor.execute_command[b0f2ad65-855c-4c29-acfa-6d22cc4b5311] succeeded in 15.60874836705625s: None
[2023-08-04 02:02:38,997: INFO/ForkPoolWorker-4] Task airflow.executors.celery_executor.execute_command[60af54af-d9b0-4274-95ac-d83176cdec45] succeeded in 20.451114857103676s: None
[2023-08-04T02:02:39.725+0000] {scheduler_job_runner.py:411} INFO - 1 tasks up for execution:
[2023-08-04T02:02:39.726+0000] {scheduler_job_runner.py:476} INFO - DAG team1-dag1 has 7/12 running and queued tasks
[2023-08-04T02:02:39.726+0000] {scheduler_job_runner.py:587} INFO - Setting the following tasks to queued state:
[2023-08-04T02:02:39.731+0000] {scheduler_job_runner.py:625} INFO - Sending TaskInstanceKey(dag_id='team1-dag1', task_id='task-xyz', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1) to executor with priority 3 and queue default
[2023-08-04T02:02:39.731+0000] {base_executor.py:147} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'team1-dag1', 'task-xyz', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04 02:02:39,725: INFO/ForkPoolWorker-1] Task airflow.executors.celery_executor.execute_command[8f42359c-4cde-478a-8d4c-977811514dc0] succeeded in 17.195349980844185s: None
[2023-08-04 02:02:39,739: INFO/MainProcess] Task airflow.executors.celery_executor.execute_command[83d9a92a-6b60-4876-80e0-016981438778] received
[2023-08-04 02:02:39,773: INFO/ForkPoolWorker-1] [83d9a92a-6b60-4876-80e0-016981438778] Executing command in Celery: ['airflow', 'tasks', 'run', 'team1-dag1', 'task-xyz', 'scheduled__2023-08-03T02:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/mydag.py']
[2023-08-04T02:02:39.861+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state queued for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='task-xyz', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:39.861+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state success for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='task-mapping-xyz', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:39.861+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state success for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='task-main-cup', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:39.862+0000] {scheduler_job_runner.py:677} INFO - Received executor event with state success for task instance TaskInstanceKey(dag_id='team1-dag1', task_id='main-sec-track', run_id='scheduled__2023-08-03T02:00:00+00:00', try_number=1, map_index=-1)
[2023-08-04T02:02:39.876+0000] {scheduler_job_runner.py:703} INFO - Setting external_id for to 83d9a92a-6b60-4876-80e0-016981438778
[2023-08-04T02:02:39.876+0000] {scheduler_job_runner.py:713} INFO - TaskInstance Finished: dag_id=team1-dag1, task_id=main-sec-track, run_id=scheduled__2023-08-03T02:00:00+00:00, map_index=-1, run_start_date=2023-08-04 02:02:32.334127+00:00, run_end_date=2023-08-04 02:02:38.447800+00:00, run_duration=6.113673, state=success, executor_state=success, try_number=1, max_tries=2, job_id=35746, pool=default_pool, queue=default, priority_weight=4, operator=SafeBigQueryExecuteQueryOperator, queued_dttm=2023-08-04 02:02:23.147706+00:00, queued_by_job_id=35370, pid=8855
[2023-08-04T02:02:39.877+0000] {scheduler_job_runner.py:713} INFO - TaskInstance Finished: dag_id=team1-dag1, task_id=task-main-cup, run_id=scheduled__2023-08-03T02:00:00+00:00, map_index=-1, run_start_date=2023-08-04 02:02:31.428039+00:00, run_end_date=2023-08-04 02:02:39.374084+00:00, run_duration=7.946045, state=success, executor_state=success, try_number=1, max_tries=2, job_id=35745, pool=default_pool, queue=default, priority_weight=5, operator=SafeBigQueryExecuteQueryOperator, queued_dttm=2023-08-04 02:02:22.340814+00:00, queued_by_job_id=35370, pid=8854
[2023-08-04T02:02:39.878+0000] {scheduler_job_runner.py:713} INFO - TaskInstance Finished: dag_id=team1-dag1, task_id=task-mapping-xyz, run_id=scheduled__2023-08-03T02:00:00+00:00, map_index=-1, run_start_date=2023-08-04 02:02:29.499923+00:00, run_end_date=2023-08-04 02:02:38.705643+00:00, run_duration=9.20572, state=success, executor_state=success, try_number=1, max_tries=2, job_id=35743, pool=default_pool, queue=default, priority_weight=17, operator=SafeBigQueryExecuteQueryOperator, queued_dttm=2023-08-04 02:02:18.405670+00:00, queued_by_job_id=35370, pid=8893
```

</details>
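Worth noting in the dump above: the zombie detection at `02:02:33` ("Failing (1) jobs without heartbeat after 2023-08-04 01:57:33") fails any running job whose last heartbeat is older than the zombie threshold (5 minutes by default via `scheduler_zombie_task_threshold`), and here it fires for the `team2-dag2` SnowflakeOperator task. A minimal sketch of that check over a simplified job record (illustrative, not the actual scheduler code):

```python
# Minimal sketch of scheduler-side zombie detection over a simplified job
# record; the real check lives in scheduler_job_runner.py and enqueues a
# failure callback for every zombie it finds.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

ZOMBIE_THRESHOLD = timedelta(minutes=5)  # default scheduler_zombie_task_threshold


@dataclass
class Job:
    task_id: str
    latest_heartbeat: datetime


def find_zombies(running_jobs: list[Job]) -> list[Job]:
    """Return running jobs whose heartbeat is older than the threshold."""
    limit = datetime.now(timezone.utc) - ZOMBIE_THRESHOLD
    zombies = [job for job in running_jobs if job.latest_heartbeat < limit]
    for job in zombies:
        print(f"Detected zombie job for task {job.task_id}; failing it")
    return zombies
```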
### Airflow `log` table

#### First example: `cl-events` task

In order to get a better understanding, I checked the `log` table in the database for this specific instance and noticed an interesting timeline:

| dttm | task_id | event | execution_date |
|---|---|---|---|
| 2023-08-04 06:15:28.132585 +00:00 | cl-events | success | 2023-08-03 02:00:00.000000 +00:00 |
| 2023-08-04 06:14:21.615933 +00:00 | cl-events | running | 2023-08-03 02:00:00.000000 +00:00 |
| 2023-08-04 02:02:26.216243 +00:00 | cl-events | failed | 2023-08-03 02:00:00.000000 +00:00 |
| 2023-08-04 02:02:20.279347 +00:00 | cl-events | running | 2023-08-03 02:00:00.000000 +00:00 |
| 2023-08-04 02:01:41.164003 +00:00 | cl-events | failed | 2023-08-03 02:00:00.000000 +00:00 |
| 2023-08-04 02:01:11.900716 +00:00 | cl-events | running | 2023-08-03 02:00:00.000000 +00:00 |

Some important remarks here:

- The first retry is marked as failed exactly 30 seconds after it started; however, in the try logs this happens at `02:01:28` instead, so there is some delay between the task failing and the log landing in this table.
- The second retry is marked as `failed` here at `02:02:26`; however, at that point the task was already running, and it received the SIGTERM at `02:02:35`.

#### Second example: `task2`

The same issue occurred on August 3rd as well, but for another task; here are the `log` table entries for that task, let's call it `task2`:

| dttm | task_id | event | execution_date |
|---|---|---|---|
| 2023-08-03 02:01:52.012336 +00:00 | task2 | failed | 2023-08-02 01:00:00.000000 +00:00 |
| 2023-08-03 02:01:43.497245 +00:00 | task2 | running | 2023-08-02 01:00:00.000000 +00:00 |
| 2023-08-03 02:01:04.982955 +00:00 | task2 | failed | 2023-08-02 01:00:00.000000 +00:00 |
| 2023-08-03 01:57:51.946012 +00:00 | task2 | running | 2023-08-02 01:00:00.000000 +00:00 |

- Contrary to the first example, there is nothing unusual about the timing between the start and the end of the first failure: 193 seconds.
- The second attempt starts at `02:01:43` and is marked as failed at `02:01:52`, only 9 seconds later, which is odd; and according to the task logs it received the SIGTERM at `02:01:59`, 7 seconds after the task was marked as failed.
- As in the first example, this task's first retry also failed with a SIGKILL.

#### Third example: `task3`

| dttm | task_id | event | execution_date |
|---|---|---|---|
| 2023-08-04 07:07:26.686553 +00:00 | task3 | success | 2023-08-02 02:00:00.000000 +00:00 |
| 2023-08-04 07:07:16.212075 +00:00 | task3 | running | 2023-08-02 02:00:00.000000 +00:00 |
| 2023-08-03 02:01:52.089021 +00:00 | task3 | failed | 2023-08-02 02:00:00.000000 +00:00 |
| 2023-08-03 02:01:48.154045 +00:00 | task3 | running | 2023-08-02 02:00:00.000000 +00:00 |
| 2023-08-03 02:01:05.033779 +00:00 | task3 | failed | 2023-08-02 02:00:00.000000 +00:00 |
| 2023-08-03 02:00:39.493838 +00:00 | task3 | running | 2023-08-02 02:00:00.000000 +00:00 |

- For this task, the first retry failed with a SIGKILL as well.
- The second attempt was marked as failed only 4 seconds after it started.
- According to the task logs, the task got the SIGTERM at `02:01:58`, so 6 seconds after it was marked as failed.

To be honest, I fail to see a pattern here yet.

## What should happen instead?

The retried task should finish properly instead of receiving a SIGTERM.
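The timelines above come straight from the metadata database. For anyone who wants to pull the same view from their own deployment, a small sketch using Airflow's ORM (assumes Airflow 2.6 and access to the metadata DB; the dag/task ids are taken from the first example):

```python
# Sketch: rebuild the timeline above from the metadata DB's `log` table
# via Airflow's ORM (assumes Airflow 2.6 and access to the metadata DB).
from airflow.models.log import Log
from airflow.utils.session import create_session

with create_session() as session:
    rows = (
        session.query(Log.dttm, Log.task_id, Log.event, Log.execution_date)
        .filter(Log.dag_id == "team2-dag1", Log.task_id == "cl-events")
        .filter(Log.event.in_(["running", "failed", "success", "up_for_retry"]))
        .order_by(Log.dttm.desc())
        .all()
    )

for dttm, task_id, event, execution_date in rows:
    print(f"{dttm} | {task_id} | {event} | {execution_date}")
```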
## Versions

Airflow: v2.6.3; the issue happened in v2.6.1 as well (I skipped v2.6.2).

Providers:

```
apache-airflow-providers-amazon==8.1.0
apache-airflow-providers-celery==3.2.1
apache-airflow-providers-cncf-kubernetes==7.4.0
apache-airflow-providers-common-sql==1.6.1
apache-airflow-providers-discord==3.2.0
apache-airflow-providers-docker==3.7.1
apache-airflow-providers-elasticsearch==4.5.1
apache-airflow-providers-ftp==3.4.2
apache-airflow-providers-google==10.5.0
apache-airflow-providers-grpc==3.2.1
apache-airflow-providers-hashicorp==3.4.1
apache-airflow-providers-http==4.4.2
apache-airflow-providers-imap==3.2.2
apache-airflow-providers-microsoft-azure==6.1.2
apache-airflow-providers-mysql==5.1.1
apache-airflow-providers-odbc==4.0.0
apache-airflow-providers-postgres==5.5.1
apache-airflow-providers-redis==3.2.1
apache-airflow-providers-sendgrid==3.2.1
apache-airflow-providers-sftp==4.3.1
apache-airflow-providers-slack==7.3.2
apache-airflow-providers-snowflake==4.4.0
apache-airflow-providers-sqlite==3.4.2
apache-airflow-providers-ssh==3.7.1
apache-airflow-providers-tableau==4.2.0
```

I am running the cluster on DigitalOcean managed Kubernetes with the official Helm chart v1.9.0.

----

I am happy to share more details, and if anyone has any ideas I can try to reproduce this with them. I simply have no clue what might be happening; any help is appreciated.
sunank200 commented 1 year ago

@potiuk let me try to reproduce this locally.

Taragolis commented 1 year ago

Just my assumption, based on past experience (and maybe it is no longer relevant): snowflake-connector-python can use quite a lot of memory, so if many tasks run simultaneously, an OOM kill could occur if the worker doesn't have enough memory.
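A cheap way to test this theory is to log the task process's memory usage around the Snowflake call and correlate spikes with the SIGKILLed attempts. A tiny diagnostic sketch (assumes `psutil` is installed on the Celery workers; `log_rss` is a hypothetical helper, not part of Airflow):

```python
# Tiny diagnostic sketch for the OOM theory: log the task process's RSS so
# memory spikes can be correlated with the SIGKILLed attempts.
# Assumes psutil is installed on the Celery workers; log_rss is a
# hypothetical helper, not part of Airflow.
import logging

import psutil

log = logging.getLogger(__name__)


def log_rss(label: str) -> None:
    rss_mb = psutil.Process().memory_info().rss / (1024 * 1024)
    log.info("RSS %s: %.1f MiB", label, rss_mb)


# e.g. call log_rss("before query") / log_rss("after query") around the
# SnowflakeOperator's SQL execution, or from pre_execute/post_execute hooks.
```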