apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Stats of a dynamically expanded task disappear after an automatic retry of a failed task #44245

Open shahar1 opened 1 day ago

shahar1 commented 1 day ago

Apache Airflow version

2.10.2

If "Other Airflow 2 version" selected, which one?

No response

What happened?

At least in the classic UI, when a task is expanded dynamically, an automatic retry of any failed mapped task instance makes the stats of the other mapped instances disappear.

What you think should happen instead?

No response

How to reproduce

import random
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def random_fail_task(task_id):
    if random.random() < 0.5: 
        raise Exception(f"Task {task_id} failed")
    print(f"Task {task_id} succeeded")

def all_success(task_id):
    print(f"Task {task_id} succeeded")

with DAG(
    dag_id='dynamic_task_expansion',
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:

    tasks = PythonOperator.partial(
        task_id='task',
        # Uncomment the line below for the second coherence check
        # retries=0,
        python_callable=random_fail_task,  # <- Change to all_success for the first coherence check
    ).expand(op_args=[[i] for i in range(100)])

Coherence checks

  1. When replacing random_fail_task with all_success, we get the exact number of tasks, all successful (100): image

  2. When we apply retries=0 with random_fail_task, the total number of tasks (successful + failed) is once again 100: image

Reproduction

  1. When running the random_fail_task task with retries>0, some of the mapped tasks fail and are up for retry: image

  2. As soon as the failed tasks are triggered for a re-run, some of the other tasks disappear from the UI: image

  3. And finally we're left with fewer tasks than we started with: image

When looking at the "Mapped Tasks" tab, all of the tasks are still there.

Operating System

Linux

Versions of Apache Airflow Providers

No response

Deployment

Google Cloud Composer

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

Code of Conduct

ashb commented 20 hours ago

First idea: look at the network inspector and see if the issue is in the API response or just in the presentation of the data
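
One way to check this outside the browser is to query the REST API directly and tally the states yourself. Below is a minimal sketch, not a definitive diagnostic: it assumes the stable REST API's listMapped endpoint is reachable and that basic auth is enabled (on Cloud Composer the authentication will likely differ). The helper names fetch_mapped_instances and count_states, as well as the base URL and credentials, are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request
from collections import Counter


def count_states(payload):
    """Tally task-instance states from a listMapped-style JSON payload."""
    # A state of None (not yet scheduled) is counted under the key "none".
    return Counter(
        ti.get("state") or "none" for ti in payload.get("task_instances", [])
    )


def fetch_mapped_instances(base_url, dag_id, run_id, task_id, user, password, limit=100):
    """Fetch the mapped task instances of one task in one DAG run.

    Assumes basic auth; adjust the Authorization header for other backends.
    """
    # Run IDs usually contain ':' and '+', so every path segment is quoted.
    path = "/api/v1/dags/{}/dagRuns/{}/taskInstances/{}/listMapped".format(
        urllib.parse.quote(dag_id, safe=""),
        urllib.parse.quote(run_id, safe=""),
        urllib.parse.quote(task_id, safe=""),
    )
    url = f"{base_url}{path}?limit={limit}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example usage (placeholder values, not taken from the report):
# payload = fetch_mapped_instances(
#     "http://localhost:8080", "dynamic_task_expansion", "<run_id>", "task",
#     "<user>", "<password>",
# )
# print(payload.get("total_entries"), dict(count_states(payload)))
```

If the tallied states still add up to 100 while the grid view shows fewer, that would point towards the presentation layer (or the summary endpoint the grid uses) rather than the stored task instances.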