apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Metrics missing while using statsd #37420

Closed htpawel closed 8 months ago

htpawel commented 9 months ago

Apache Airflow version

Other Airflow 2 version (please specify below)

If "Other Airflow 2 version" selected, which one?

2.7.1

What happened?

I have Airflow with statsd enabled (Airflow > statsd-exporter > Prometheus) and I believe Airflow is not sending all of the metrics listed in the apache-airflow/2.7.1/metrics documentation. I've added dag.&lt;dag_id&gt;.&lt;task_id&gt;.scheduled_duration and dag.&lt;dag_id&gt;.&lt;task_id&gt;.queued_duration to my statsd mappings file:

mappings:
  - match: "*.dag.*.*.scheduled_duration"
    match_metric_type: observer
    name: "af_agg_dag_task_scheduled_duration"
    labels:
      airflow_id: "$1"
      dag_id: "$2"
      task_id: "$3"
  - match: "*.dag.*.*.queued_duration"
    match_metric_type: observer
    name: "af_agg_dag_task_queued_duration"
    labels:
      airflow_id: "$1"
      dag_id: "$2"
      task_id: "$3"
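
For reference, assuming the default statsd_prefix of airflow (and with the dag/task names below being examples only), the scheduled_duration timer reaches the exporter as a line roughly like this, and the three "*" wildcards in the match pattern above populate $1, $2 and $3:

  airflow.dag.example_dag.example_task.scheduled_duration:1234|ms
  # $1 = "airflow"       -> airflow_id label
  # $2 = "example_dag"   -> dag_id label
  # $3 = "example_task"  -> task_id label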

I could not find these metrics in Prometheus, so I checked /metrics on statsd-exporter and they were not there either. Later I found that more metrics are missing, e.g. dag.&lt;dag_id&gt;.&lt;task_id&gt;.duration (I don't know if it is a coincidence, but metrics with one or two labels, including airflow_id, work fine, while metrics with more labels do not). Even after removing the mapping entirely, those metrics do not show up under their default names. There are also no log entries related to those metrics in statsd-exporter (with log level = debug).
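
A quick way to separate the mapping config from Airflow itself is to hand-craft a matching metric, send it straight to the exporter's statsd UDP port, and grep the exporter's /metrics page. Hostnames, ports and the test_dag/test_task names below are placeholders for whatever the deployment actually uses:

  # send one fake scheduled_duration timer to statsd-exporter (UDP)
  echo -n "airflow.dag.test_dag.test_task.scheduled_duration:1000|ms" | nc -u -w1 localhost 9125

  # check whether it was mapped and exposed
  curl -s http://localhost:9102/metrics | grep af_agg_dag_task_scheduled_duration

If the hand-crafted metric shows up but the real ones never do, the problem is upstream of the exporter.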

What you think should happen instead?

Metrics should be available in statsd-exporter like all the others.

How to reproduce

Enable statsd metrics in Airflow 2.7.1, connect Airflow to statsd-exporter (0.26.0), and check /metrics.
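
A minimal sketch of such a setup, assuming default ports, the airflow statsd prefix, and a mapping file named statsd_mapping.yml (all values illustrative):

  # airflow.cfg
  [metrics]
  statsd_on = True
  statsd_host = localhost      # where statsd-exporter listens
  statsd_port = 9125
  statsd_prefix = airflow

  # run statsd-exporter 0.26.0 with the mapping file shown above
  statsd_exporter --statsd.listen-udp=":9125" \
                  --web.listen-address=":9102" \
                  --statsd.mapping-config=statsd_mapping.yml
  # then: curl http://localhost:9102/metrics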

Operating System

Ubuntu 22.04.3 LTS

Versions of Apache Airflow Providers

Default installation from pypi - https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html

Deployment

Virtualenv installation

Deployment details

Default installation from pypi - https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html

Anything else?

No response

Are you willing to submit PR?

Code of Conduct

boring-cyborg[bot] commented 9 months ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

pvaling commented 9 months ago

I see similar behaviour on some 2.4.3 instances with 20+ days of uptime. After a scheduler restart it seems OK (not sure yet how long that will last). I'm focused on .dagrun.duration.failed, and it is sometimes missing.

The scheduler code seems to be working fine.

First I checked that the scheduler processes tolerate statsd network outages - that is OK: the metrics do not disappear after a 5-minute network failure.
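
One way to confirm whether the scheduler is still emitting the metric after long uptime (rather than the exporter dropping it) is to watch the raw statsd UDP traffic on the scheduler host; the port and interface below are placeholders for whatever the deployment uses:

  tcpdump -A -l -n -i any udp port 8125 | grep dagrun.duration.failed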

jscheffl commented 8 months ago

@AutomationDev85 is this the same problem you also reported to me personally?

htpawel commented 8 months ago

Never mind, it was our fault (facepalm). We ran statsd-exporter as a container in the same pod as the scheduler and sent metrics to localhost, so we only got metrics from the scheduler. The exporter must be exposed as a Service that is reachable from the executors as well.
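
For anyone hitting the same thing, a sketch of the fix on Kubernetes: expose statsd-exporter behind a Service and point statsd_host in airflow.cfg at that Service from both the scheduler and the workers. The name, label selector and ports below are illustrative and assume the exporter listens on its defaults:

  apiVersion: v1
  kind: Service
  metadata:
    name: statsd-exporter            # hypothetical name
  spec:
    selector:
      app: statsd-exporter           # must match the exporter pod's labels
    ports:
      - name: statsd-ingest
        protocol: UDP
        port: 9125
        targetPort: 9125
      - name: prom-metrics
        protocol: TCP
        port: 9102
        targetPort: 9102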