apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

The graph should show the median task duration #39937

Open nivdror opened 5 months ago

nivdror commented 5 months ago

Description

Hi, when hovering over a task in the graph UI, I would like to be shown the median duration of that task. I saw that as of Airflow 2.9 this data is displayed on a different tab.

Use case/motivation

I'm trying to give my data engineers better visibility into their flows, so they can extract an ETA for those flows.

Related issues

No response

bbovenzi commented 3 months ago

We show this in the Task Duration tab, and we just expanded that tab to show durations across all runs/tasks.

What's the use case here? To see how far off the task duration is from the median? That could be interesting to show.

nivdror commented 3 months ago

Hi @bbovenzi, thanks for replying.

The use case here is simple. Think of a case where your ETL is delayed for some reason and you get an SLA alert. You will need to provide an estimate of how long the tasks that haven't run yet will take before you are ready to publish your data.

This feature would give the on-call DE that capability without the extra hassle of viewing the previous runs.

I hope that's clearer now.
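To make the ETA idea above concrete, here is a minimal sketch in plain Python. It is not Airflow code and uses no Airflow API: the `history`, `upstream`, and `finished` inputs, the task names, and both function names are hypothetical. It assumes the remaining time can be approximated by the longest chain of median durations through tasks that have not finished yet, and it ignores time already spent in running tasks as well as queueing delays.

```python
from statistics import median


def median_durations(history):
    """Map each task id to the median of its past durations (seconds)."""
    return {task_id: median(d) for task_id, d in history.items() if d}


def estimate_remaining_seconds(upstream, medians, finished):
    """Longest chain of median durations through unfinished tasks.

    upstream maps a task id to the list of its upstream task ids.
    Tasks in `finished` contribute zero time.
    """
    memo = {}

    def finish_time(task_id):
        if task_id in memo:
            return memo[task_id]
        # A task can start only after its slowest upstream task is done.
        start = max((finish_time(u) for u in upstream.get(task_id, [])), default=0.0)
        own = 0.0 if task_id in finished else medians[task_id]
        memo[task_id] = start + own
        return memo[task_id]

    return max((finish_time(t) for t in medians), default=0.0)


# Hypothetical past durations (seconds) and dependencies for a small ETL.
history = {
    "extract": [100, 120, 110],
    "transform": [300, 280, 320],
    "load": [60, 50, 70],
    "report": [30, 40, 20],
}
upstream = {"transform": ["extract"], "load": ["transform"], "report": ["extract"]}

medians = median_durations(history)
print(estimate_remaining_seconds(upstream, medians, finished={"extract"}))  # prints 360.0
```

With `extract` already done, the estimate is the `transform` plus `load` chain (300 + 60 seconds), since `report` runs in parallel and is shorter.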

bbovenzi commented 3 months ago

Yes, that makes sense. I was thinking that when no run is selected, the Gantt chart could render a summary. It would look like a box-and-whisker chart showing the median and min/max durations. We could then also use that data in the Dag Run or Task Instance Details views to show how the individual run/ti stacks up against the median.
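As a rough illustration of the summary described above, the sketch below computes the min, median, and max of a task's past durations and how far a single run is from the median. It is plain Python under stated assumptions, not the Airflow UI implementation: the function names and the sample durations are hypothetical.

```python
from statistics import median


def duration_summary(durations):
    """Return the min, median, and max of a list of durations (seconds)."""
    if not durations:
        raise ValueError("need at least one duration")
    return {
        "min": min(durations),
        "median": median(durations),
        "max": max(durations),
    }


def deviation_from_median(duration, summary):
    """Relative difference from the median; 0.25 means 25% slower."""
    if summary["median"] == 0:
        raise ValueError("median duration is zero")
    return (duration - summary["median"]) / summary["median"]


# Hypothetical durations (seconds) of one task across five past runs.
past_runs = [280, 300, 320, 310, 290]
summary = duration_summary(past_runs)
print(summary)
print(deviation_from_median(375, summary))  # prints 0.25
```

The three summary values are what a min/median/max box would need per task, and the relative deviation is one possible way to show how an individual run or task instance compares with the median.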