apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Duplicate entries in API response when TaskInstanceHistory and TaskInstance have same maximum try number #41765

Open tirkarthi opened 1 month ago

tirkarthi commented 1 month ago

Apache Airflow version

main (development)

If "Other Airflow 2 version" selected, which one?

No response

What happened?

While trying out a task with a high number of retries, I noticed that the API sometimes returns duplicate entries for a task's tries; the duplicates eventually go away on their own. The query below combines the TaskInstanceHistory entries with the TaskInstance entry. There can be a case where the maximum try_number among the TaskInstanceHistory entries equals the TaskInstance's try_number, which leads to a duplicate entry for the latest try.

https://github.com/apache/airflow/blob/79db243d03cc4406290597ad400ab0f514975c79/airflow/api_connexion/endpoints/task_instance_endpoint.py#L863-L872
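To make the failure mode concrete, here is a minimal plain-Python sketch. It does not reproduce the SQLAlchemy query at the linked lines. It assumes each try can be modeled as a hypothetical `TryRecord` with a source (`"TIH"` or `"TI"`) and a `try_number`, and that the response is built by appending the TaskInstance row to the history rows. The helper names `naive_combine` and `duplicated_try_numbers` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TryRecord:
    """Hypothetical stand-in for one try in the API response."""

    source: str  # "TIH" for a TaskInstanceHistory row, "TI" for the TaskInstance row
    try_number: int


def naive_combine(history: list[TryRecord], current: TryRecord) -> list[TryRecord]:
    # Append the live TaskInstance row to the archived history rows
    # without checking whether its try_number is already present.
    return sorted(history + [current], key=lambda r: r.try_number)


def duplicated_try_numbers(records: list[TryRecord]) -> set[int]:
    # Collect every try_number that appears more than once.
    seen: set[int] = set()
    dupes: set[int] = set()
    for record in records:
        if record.try_number in seen:
            dupes.add(record.try_number)
        seen.add(record.try_number)
    return dupes


# Case described in the issue: the latest history row and the TaskInstance
# row carry the same try_number (3 here, chosen for illustration).
history = [TryRecord("TIH", 1), TryRecord("TIH", 2), TryRecord("TIH", 3)]
current = TryRecord("TI", 3)
combined = naive_combine(history, current)
print([(r.source, r.try_number) for r in combined])
# [('TIH', 1), ('TIH', 2), ('TIH', 3), ('TI', 3)]
print(duplicated_try_numbers(combined))
# {3}
```

When the history stops at try 2 and the TaskInstance is at try 3, the same function produces no duplicates, which matches the observation that the problem shows up only occasionally.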

What you think should happen instead?

No response

How to reproduce

  1. Set up a DAG with a high number of retries.
  2. Notice that API calls occasionally return a duplicate entry for the last try number.

[Screenshot: API response with a duplicated entry for the last try number]

Operating System

Ubuntu

Versions of Apache Airflow Providers

No response

Deployment

Virtualenv installation

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

Code of Conduct

tirkarthi commented 1 month ago

cc: @ephraimbuddy @bbovenzi

bbovenzi commented 3 weeks ago

Ahh, this makes sense. I think that when the try_number is the same, we should send only the TI entry and ignore the TIH entry.
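A minimal sketch of the rule suggested in this comment, again in plain Python rather than as a change to the actual endpoint. It assumes the same hypothetical `TryRecord` model (a source of `"TIH"` or `"TI"` plus a `try_number`); `combine_preferring_ti` is an illustrative name, not an Airflow function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TryRecord:
    """Hypothetical stand-in for one try in the API response."""

    source: str  # "TIH" for a TaskInstanceHistory row, "TI" for the TaskInstance row
    try_number: int


def combine_preferring_ti(history: list[TryRecord], current: TryRecord) -> list[TryRecord]:
    # Drop any history row that shares the TaskInstance's try_number, so the
    # TI entry is the only one reported for that try.
    kept = [r for r in history if r.try_number != current.try_number]
    return sorted(kept + [current], key=lambda r: r.try_number)


# Overlapping case: TIH already holds try 3 while the TI is also at try 3.
overlap = combine_preferring_ti(
    [TryRecord("TIH", 1), TryRecord("TIH", 2), TryRecord("TIH", 3)],
    TryRecord("TI", 3),
)
print([(r.source, r.try_number) for r in overlap])
# [('TIH', 1), ('TIH', 2), ('TI', 3)]

# Normal case: no overlap, so nothing is dropped.
normal = combine_preferring_ti([TryRecord("TIH", 1), TryRecord("TIH", 2)], TryRecord("TI", 3))
print([(r.source, r.try_number) for r in normal])
# [('TIH', 1), ('TIH', 2), ('TI', 3)]
```

Preferring the TI row means the response always reflects the live state of the latest try. In the real endpoint the same rule would more likely be expressed in the query itself, for example by excluding TaskInstanceHistory rows whose try_number equals the TaskInstance's try_number before combining them; this sketch does not show how the linked code builds that query.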