epoch8 / airflow-exporter

Airflow plugin to export dag and task based metrics to Prometheus.

Dag and task metrics should be initialized to zero at startup #68

Open prabhuakshai92 opened 4 years ago

prabhuakshai92 commented 4 years ago

Airflow metrics are not reset after a restart; however, they are also not initialized at startup. This leads to unexpected PromQL results when querying over missing data.

For example, the metric for a task with state 'failed' is first set to '1' when the task fails for the first time; before that failure, no data exists for the task with state 'failed'. A PromQL query that uses the 'increase' function to check whether the task executed at least once over a time period, based on the increase in either the 'success' or 'failed' state count, responds as if neither state changed during that period. This is because 'increase' works only from the samples that exist in the range and extrapolates from them; it does not treat the missing data before the first sample as zero.
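To make the effect concrete, here is a minimal Python sketch of a simplified 'increase'. It is an illustration under stated assumptions, not how Prometheus implements the function: it takes the difference between the last and first samples in the window and ignores extrapolation and counter-reset handling. The name simple_increase and the sample data are hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def simple_increase(samples: List[Sample], start: float, end: float) -> Optional[float]:
    """Simplified stand-in for PromQL increase(): last minus first sample in the window.

    Returns None when fewer than two samples fall inside the window, mirroring
    the fact that a range with a single sample yields no result.
    Assumes samples are sorted by timestamp.
    """
    window = [value for ts, value in samples if start <= ts <= end]
    if len(window) < 2:
        return None
    return window[-1] - window[0]


# Series that only appears once the first failure happens (no zero initialization).
uninitialized = [(120.0, 1.0), (180.0, 1.0), (240.0, 1.0)]

# Same task, but the series was exported as 0 from startup.
initialized = [(0.0, 0.0), (60.0, 0.0), (120.0, 1.0), (180.0, 1.0), (240.0, 1.0)]

print(simple_increase(uninitialized, 0.0, 240.0))  # 0.0: the first failure is invisible
print(simple_increase(initialized, 0.0, 240.0))    # 1.0: the failure is counted
```

In the uninitialized case every sample in the window already has the value 1, so the computed increase is zero even though a failure occurred.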

The Prometheus documentation discusses this issue.

A potential fix for this issue is to initialize all DAG and task metrics to zero at startup.
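A rough sketch of what zero initialization could look like, in plain Python. This is not the exporter's actual code: the function zero_filled_counts, the STATES tuple, and the (dag_id, task_id, status) key layout are all hypothetical, chosen only to show the idea of emitting a value for every known DAG/task/state combination and defaulting to zero when nothing has been observed.

```python
from itertools import product
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str, str]  # (dag_id, task_id, status), a hypothetical layout

# Hypothetical list of states to pre-populate; a real exporter would choose its own.
STATES = ("success", "failed", "running")


def zero_filled_counts(
    known_tasks: Iterable[Tuple[str, str]],
    observed: Dict[Key, int],
    states: Iterable[str] = STATES,
) -> Dict[Key, int]:
    """Return a count for every (dag, task, state), defaulting to 0 when unobserved."""
    counts = {
        (dag_id, task_id, state): 0
        for (dag_id, task_id), state in product(known_tasks, states)
    }
    # Observed counts overwrite the zero defaults.
    counts.update(observed)
    return counts


# Example: two tasks are known, but only one state has ever been recorded.
known = [("etl", "extract"), ("etl", "load")]
observed = {("etl", "extract", "success"): 3}
counts = zero_filled_counts(known, observed)

for (dag_id, task_id, state), value in sorted(counts.items()):
    print(dag_id, task_id, state, value)
```

Each entry in the result would then be exported as one sample of the task status metric, so a series for 'failed' exists with value 0 before the first failure ever happens.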

WakeupTsai commented 4 years ago

A workaround here:

sum(increase(airflow_task_status{status="failed"}[10m])) without (pod,instance) > 0 or max without(pod, instance) (airflow_task_status{status="failed"} != 0 unless airflow_task_status{status="failed"} offset 10m)

reference: https://github.com/prometheus/prometheus/issues/1673

jasonstitt commented 3 years ago

A caveat with the workaround is that the exporter reports a total count of past failures. When you first start the exporter (or after a sufficiently long interruption in metrics), everything that failed in the past shows up as a new failure. So, zero initialization would be superior.