apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
37.15k stars 14.32k forks source link

Statsd metric name components are unsanitized #29268

Closed waldoppper closed 8 months ago

waldoppper commented 1 year ago

Apache Airflow version

Other Airflow 2 version (please specify below)

What happened

[2.2.5] I recently enabled statsd metric emission on my team's deploy and began writing views on the result.

I've found that:

  1. some dag authors had included a period in their task_id name - like task_id = "load.all"
  2. this string is not sanitized of statsd-sensitive characters (like '.' and '|') prior to the building of the Metric name

Result: metric names with too many .-delimiters and grafana views which don't render these task metrics.

What you think should happen instead

Any instance of Statsd.foo(f'{dag_id}...') or Statsd.foo(f'{task_id}...') should probably first sanitize those strings of statsd-sensitive characters.

So instead of

Current: Statsd.timer("dag.{self.task.dag_id}.{self.task.task_id}.duration") -> "dag.DAG.WITH.PERIODS.IN_NAME.TASK.WITH.PERIODS.IN.NAME.duration"

Recommended: Stats.timer(f"dag.{statsd_sanitize(self.task.dag_id)}.{statsd_sanitize(self.task.task_id)}.duration") -> "dag.DAG_WITH_PERIODS_IN_NAME.TASK_WITH_PERIODS_IN_NAME.duration"

How to reproduce

  1. create a dag
  2. write a task in that dag with task_id which contains a period. Like: "foo.bar"
  3. enable statsd metrics collection
  4. run your dag
  5. struggle to deal with poorly-formatted metric names like "dag.DAG.WITH.PERIODS.IN.NAME.TASK.WITH.PERIODS.IN.NAME.duration"

Operating System

debian

Versions of Apache Airflow Providers

apache-airflow-providers-apache-hive==2.3.1
apache-airflow-providers-apache-spark==2.1.2
apache-airflow-providers-celery==2.1.3
apache-airflow-providers-cncf-kubernetes==3.0.0
apache-airflow-providers-docker==2.5.2
apache-airflow-providers-elasticsearch==2.2.0
apache-airflow-providers-ftp==2.1.2
apache-airflow-providers-google==6.7.0
apache-airflow-providers-grpc==2.0.4
apache-airflow-providers-hashicorp==2.1.4
apache-airflow-providers-http==2.1.2
apache-airflow-providers-imap==2.2.3
apache-airflow-providers-microsoft-azure==3.7.2
apache-airflow-providers-mysql==2.2.3
apache-airflow-providers-odbc==2.0.4
apache-airflow-providers-postgres==4.1.0
apache-airflow-providers-redis==2.0.4
apache-airflow-providers-sendgrid==2.0.4
apache-airflow-providers-sftp==2.5.2
apache-airflow-providers-slack==4.2.3
apache-airflow-providers-sqlite==2.1.3
apache-airflow-providers-ssh==2.4.3

Deployment

Official Apache Airflow Helm Chart

Deployment details

Deployed with v8.6.1 of https://github.com/airflow-helm/charts/tree/main/charts/airflow

We wired up a pod to convert statsd->prom using https://github.com/prometheus/statsd_exporter, and use Matcher rules by glob/regex to convert statsd-style metrics into prom-style.

Anything else

No response

Are you willing to submit PR?

Code of Conduct

boring-cyborg[bot] commented 1 year ago

Thanks for opening your first issue here! Be sure to follow the issue template!

github-actions[bot] commented 9 months ago

This issue has been automatically marked as stale because it has been open for 365 days without any activity. There has been several Airflow releases since last activity on this issue. Kindly asking to recheck the report against latest Airflow version and let us know if the issue is reproducible. The issue will be closed in next 30 days if no further activity occurs from the issue author.

github-actions[bot] commented 8 months ago

This issue has been closed because it has not received response from the issue author.