apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
Statsd exporter mappings missing #40027

Open NBardelot opened 1 month ago

NBardelot commented 1 month ago

Apache Airflow version

2.9.1

What happened?

Some metrics that use tags (mainly file_path, dag_id, and task_id) are not correctly mapped in the Helm chart (see chart/files/statsd-mappings.yml). This is probably linked to a change in Airflow v2.6 that stopped creating a new metric for each DAG, task, or file and started attaching tags to common metrics instead.

I noticed that airflow_dag_processing_last_duration had no labels in my Prometheus and found that it was not mapped. For now I've added this as a workaround:

statsd:
  enabled: true
  ...
  # workaround:
  extraMappings:
    - match: airflow.dag_processing.last_duration.*
      name: "airflow_dag_processing_last_duration"
      labels:
        dag_file: "$1"

What you think should happen instead?

Every metric that is emitted with tags should be mapped in chart/files/statsd-mappings.yml so that the statsd-exporter applies the labels.

As of Airflow 2.9.1, these are the calls to the Stats class that I think use tags but are missing a mapping:

| Metric name | Unmapped labels |
| --- | --- |
| dag_processing.processes | dag_file: "$1" |
| dag_processing.last_duration | dag_file: "$1" |
| dag_processing.processor_timeouts | dag_file: "$1" |
| sla_missed | dag_id: "$1", task_id: "$2" |
| sla_email_notification_failure | dag_id: "$1", task_id: "$2" |
| dag_file_refresh_error | dag_file: "$1" |
| pool.queued_slots | pool: "$1" |
| pool.running_slots | pool: "$1" |
| pool.deferred_slots | pool: "$1" |
| zombies_killed | dag_id: "$1", task_id: "$2" |
| dag.callback_exceptions | dag_id: "$1" |
| task_restored_to_dag | dag_id: "$1", task_id: "$2" |
| task_removed_from_dag | dag_id: "$1", task_id: "$2" |
| task_instance_created | dag_id: "$1", task_id: "$2" |

Note: this list is the result of a quick grep, so it may be incomplete, and I may have misunderstood how some of these metrics behave. Whoever provides a fix should verify it rather than take it as absolute truth.
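
For illustration, the pool rows of the list could be written as extra mappings in the same shape as the workaround above. This is only a sketch: it assumes these metrics reach the exporter with the pool name as the last dot-separated segment, and the Prometheus names and the `pool` label are taken from the list, not checked against the chart's existing mappings.

```yaml
statsd:
  enabled: true
  extraMappings:
    # Assumed input name: airflow.pool.queued_slots.<pool name>
    - match: airflow.pool.queued_slots.*
      name: "airflow_pool_queued_slots"
      labels:
        pool: "$1"
    # Assumed input name: airflow.pool.running_slots.<pool name>
    - match: airflow.pool.running_slots.*
      name: "airflow_pool_running_slots"
      labels:
        pool: "$1"
```

Rows with two labels would need a match with two wildcards and a "$2" reference, provided the emitted metric name really carries both segments.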

Operating System

Kubernetes

Versions of Apache Airflow Providers

The 'statsd' requirements are installed using the official Apache constraints for Python 3.10 and Airflow 2.9.1.

Deployment

Official Apache Airflow Helm Chart

Deployment details

We do not set .Values.statsd.overrideMappings (see chart/templates/configmaps/statsd-configmap.yaml); we use the standard out-of-the-box mappings.

jedcunningham commented 1 month ago

If someone picks this up, keep in mind that we can't simply add the mappings naively (see this warning).

Maybe we can have a way for users to opt-in to a set of mappings that are kept up to date? But the way this was originally built severely limits what we can do without introducing breaking changes :(