Closed josevnz closed 3 years ago
The StatsD mappings are correct; the mistake was mine all along. In airflow.cfg I forgot to change the publish port:
# StatsD (https://github.com/etsy/statsd) integration settings.
# Enables sending metrics to StatsD.
statsd_on = True
statsd_host = raspberrypi
statsd_port = 9125
statsd_prefix = airflow
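A quick way to confirm that the host and port in this config actually reach the exporter is to send one hand-built StatsD datagram over UDP. This is only a minimal sketch: the helper names `statsd_line` and `send_test_metric` are made up for illustration, and the host `raspberrypi` and port `9125` are simply the values from the config above.

```python
import socket


def statsd_line(prefix: str, name: str, value, metric_type: str) -> bytes:
    """Build one StatsD datagram such as b'airflow.dagbag_size:32|g'."""
    return f"{prefix}.{name}:{value}|{metric_type}".encode("ascii")


def send_test_metric(host: str, port: int, payload: bytes) -> int:
    """Send a single UDP datagram and return the number of bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


# Example call (values taken from the airflow.cfg fragment above):
# send_test_metric("raspberrypi", 9125, statsd_line("airflow", "dagbag_size", 32, "g"))
```

UDP gives no delivery confirmation, so the only proof of arrival is an "Incoming line" entry in the exporter's debug log.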
After you restart Airflow, if you run the container with the --log.level=debug flag you will see all the stats arriving in real time:
/usr/bin/docker run --rm --interactive --tty --publish 9102:9102 --publish 9125:9125 --publish 9125:9125/udp --volume $HOME/etc/statsd_exporter_mapping.yaml:/etc/statsd_exporter_mapping.yaml:ro prom/statsd-exporter --statsd.mapping-config=/etc/statsd_exporter_mapping.yaml --web.enable-lifecycle --log.level=debug
docker logs --follow statsd-exporter
...
level=debug ts=2021-05-02T14:44:22.882Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.dagbag_size:32|g
level=debug ts=2021-05-02T14:44:22.882Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.dag_processing.import_errors:0|g
level=debug ts=2021-05-02T14:44:23.036Z caller=exporter.go:121 msg="counter must be non-negative value" metric=af_agg_dag_processing_processes event_value=-1
level=debug ts=2021-05-02T14:44:23.036Z caller=exporter.go:121 msg="counter must be non-negative value" metric=af_agg_dag_processing_processes event_value=-1
level=debug ts=2021-05-02T14:44:23.040Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.dag_processing.processes:-1|c
level=debug ts=2021-05-02T14:44:23.041Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.dag_processing.total_parse_time:0.1581331390189007|g
level=debug ts=2021-05-02T14:44:23.041Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.dagbag_size:32|g
level=debug ts=2021-05-02T14:44:23.042Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.dag_processing.import_errors:0|g
level=debug ts=2021-05-02T14:44:23.152Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.scheduler.critical_section_duration:14.026885|ms
level=debug ts=2021-05-02T14:44:23.160Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.executor.open_slots:16|g
level=debug ts=2021-05-02T14:44:23.160Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.executor.queued_tasks:0|g
level=debug ts=2021-05-02T14:44:23.161Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.executor.running_tasks:0|g
level=debug ts=2021-05-02T14:44:23.236Z caller=exporter.go:121 msg="counter must be non-negative value" metric=af_agg_dag_processing_processes event_value=-1
level=debug ts=2021-05-02T14:44:23.900Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.dag_processing.total_parse_time:0.8583512239856645|g
level=debug ts=2021-05-02T14:44:23.900Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.dagbag_size:32|g
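The debug output above mixes accepted events with "counter must be non-negative value" rejections, which appear to come from the `-1` counter event for `dag_processing.processes`. The sketch below parses the "Incoming line" entries and flags negative counter events. The regular expression is an assumption based only on the log lines shown here, and the names `StatsdEvent`, `parse_incoming` and `is_negative_counter` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class StatsdEvent(NamedTuple):
    name: str
    value: float
    kind: str  # StatsD type suffix: "g" gauge, "c" counter, "ms" timer


# Matches the tail of a debug entry, e.g. line=airflow.dagbag_size:32|g
_INCOMING = re.compile(
    r'msg="Incoming line".*?line=(?P<name>[^:\s]+):(?P<value>[-+]?[0-9.]+)\|(?P<kind>[a-z]+)'
)


def parse_incoming(log_line: str) -> Optional[StatsdEvent]:
    """Return the StatsD event carried by an 'Incoming line' entry, else None."""
    m = _INCOMING.search(log_line)
    if m is None:
        return None
    return StatsdEvent(m["name"], float(m["value"]), m["kind"])


def is_negative_counter(event: StatsdEvent) -> bool:
    """True for counter events with a negative value, like the -1 seen above."""
    return event.kind == "c" and event.value < 0
```

Feeding the log through `parse_incoming` and filtering with `is_negative_counter` should show which Airflow metrics trigger the rejection message.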
@josevnz Just curious: did you get all the metrics or just a few? I can see around 41 warnings with the message "backtracking required because of match", and they are getting silently suppressed. This seems related to https://github.com/prometheus/statsd_exporter#ordering-glob-rules . I'm using Airflow 1.10.14 and statsd_exporter 0.20.2.
@rozhok If possible, can you share which versions of Airflow and statsd_exporter you used? Thanks.
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.ti_failures\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.ti_successes\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.zombies_killed\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.scheduler_heartbeat\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.sla_email_notification_failure\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dagbag_size\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dag_processing.processes\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.scheduler.critical_section_busy\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dag.callback_exceptions\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.celery.task_timeout_error\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dag_processing.import_errors\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dag_processing.total_parse_time\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dag_processing.processor_timeouts\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.executor.open_slots\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.executor.queued_tasks\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.executor.running_tasks\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.smart_sensor_operator.poked_tasks\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.smart_sensor_operator.poked_success\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.smart_sensor_operator.poked_exception\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.smart_sensor_operator.exception_failures\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.smart_sensor_operator.infra_failures\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.scheduler.critical_section_duration\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.scheduler.tasks.killed_externally\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.scheduler.tasks.running\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.scheduler.tasks.starving\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.scheduler.orphaned_tasks.cleared\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.scheduler.orphaned_tasks.adopted\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dag_processing.last_runtime.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.pool.open_slots.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.pool.queued_slots.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.pool.running_slots.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.pool.starving_tasks.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dagrun.dependency-check.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dag_processing.last_duration.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dagrun.schedule_delay.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dagrun.*.first_task_scheduling_delay\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.ti.start.*.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dag_processing.last_run.seconds_ago.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dag.*.*.duration\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dagrun.duration.success.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dagrun.duration.failed.*\", matching performance may be degraded" source="fsm.go:313"
@kbohra I've used statsd-exporter 0.18.0 and airflow 1.10.10
@rozhok Thank you for the details. I have downgraded statsd-exporter to 0.18.0 but still get warnings on 41 metrics. The other charts are working fine, though. I will debug more and try changing the dashboard JSON to avoid backtracking.
$ grep backtracking nohup.out |wc -l
41
$ grep backtracking nohup.out |head
time="2021-06-07T17:32:26Z" level=warning msg="backtracking required because of match \"*.scheduler.tasks.killed_externally\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-07T17:32:26Z" level=warning msg="backtracking required because of match \"*.scheduler.tasks.running\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-07T17:32:26Z" level=warning msg="backtracking required because of match \"*.scheduler.tasks.starving\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-07T17:32:26Z" level=warning msg="backtracking required because of match \"*.scheduler.orphaned_tasks.cleared\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-07T17:32:26Z" level=warning msg="backtracking required because of match \"*.scheduler.orphaned_tasks.adopted\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-07T17:32:26Z" level=warning msg="backtracking required because of match \"*.dag_processing.last_runtime.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-07T17:32:26Z" level=warning msg="backtracking required because of match \"*.pool.open_slots.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-07T17:32:26Z" level=warning msg="backtracking required because of match \"*.pool.queued_slots.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-07T17:32:26Z" level=warning msg="backtracking required because of match \"*.pool.running_slots.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-07T17:32:26Z" level=warning msg="backtracking required because of match \"*.pool.starving_tasks.*\", matching performance may be degraded" source="fsm.go:313"
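The grep count above can be taken one step further by extracting the glob from each warning, which makes it easier to compare the list against the mapping file. This is a rough sketch; the two regular expressions are assumptions derived from the two warning formats that appear in this thread (the older `time=... msg="... match \"<glob>\""` form and the newer `match=<glob>` form), and `backtracking_globs` is a hypothetical helper name.

```python
import re
from typing import List

# Older format: msg="backtracking required because of match \"*.foo\", ..."
_OLD = re.compile(r'backtracking required because of match \\"(?P<glob>[^"\\]+)\\"')
# Newer format: msg="backtracking required because of match. ..." match=*.foo
_NEW = re.compile(
    r'backtracking required because of match\. Performance may be degraded" match=(?P<glob>\S+)'
)


def backtracking_globs(log_text: str) -> List[str]:
    """Return the glob named in every backtracking warning, in log order."""
    globs = []
    for line in log_text.splitlines():
        m = _OLD.search(line) or _NEW.search(line)
        if m is not None:
            globs.append(m["glob"])
    return globs


# Example usage, assuming the log was saved to nohup.out as in the grep above:
# with open("nohup.out", encoding="utf-8") as fh:
#     print(len(backtracking_globs(fh.read())))
```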
I'm not sure it's possible to create a universal mapping suitable for all cases. Airflow sends metrics using the prefix configured in the statsd_prefix option, like: airflow_dag_sample_dag_dummy_task_duration. You can rewrite your mappings to take this prefix into account.
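To see how the prefix interacts with a mapping glob, the sketch below approximates glob matching by turning each `*` component into a capture group that matches any run of non-dot characters. This is a simplification written for illustration, not the exporter's own matcher; it only handles `*` as a whole dot-separated component. The dotted metric name `airflow.dag.sample_dag.dummy_task.duration` is an assumed pre-export form of the example name above, and `match_glob` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple


def glob_to_regex(glob: str) -> "re.Pattern[str]":
    """Translate a dotted glob into an anchored regex (whole-component '*' only)."""
    parts = [r"([^.]*)" if part == "*" else re.escape(part) for part in glob.split(".")]
    return re.compile(r"^" + r"\.".join(parts) + r"$")


def match_glob(glob: str, metric: str) -> Optional[Tuple[str, ...]]:
    """Return the captured wildcard values, or None when the glob does not match."""
    m = glob_to_regex(glob).match(metric)
    return None if m is None else m.groups()


metric = "airflow.dag.sample_dag.dummy_task.duration"
print(match_glob("*.dag.*.*.duration", metric))        # prefix captured as first group
print(match_glob("airflow.dag.*.*.duration", metric))  # prefix pinned to a literal
```

Pinning the prefix to a literal changes which capture group holds the DAG and task names, so any labels in the mapping that refer to numbered captures would need to be renumbered accordingly.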
I am getting the same warning and facing the same issue: I am not getting any of the metrics mentioned in the statsd.yaml mappings.
StatsD logs -
[+] Running 1/0
 ✔ Container monitoring-dashboard-statsd-exporter-1 Created 0.0s
Attaching to statsd-exporter-1
statsd-exporter-1 | level=info ts=2024-04-08T11:56:09.511Z caller=main.go:321 msg="Starting StatsD -> Prometheus Exporter" version="(version=0.21.0, branch=HEAD, revision=ef6627b9f05350d54cd3bfea5afe36617d7eb5a4)"
statsd-exporter-1 | level=info ts=2024-04-08T11:56:09.513Z caller=main.go:322 msg="Build context" context="(go=go1.16.5, user=root@8ace135a0329, date=20210610-07:24:59)"
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.ti.start.*.*
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dag_processing.last_run.seconds_ago.*
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dag.*.*.duration
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dagrun.duration.success.*
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dagrun.duration.failed.*
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.ti_failures
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.ti_successes
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.zombies_killed
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.scheduler_heartbeat
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.sla_email_notification_failure
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dagbag_size
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dag_processing.processes
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.scheduler.critical_section_busy
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dag.callback_exceptions
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.celery.task_timeout_error
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dag_processing.import_errors
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dag_processing.total_parse_time
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dag_processing.processor_timeouts
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.executor.open_slots
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.executor.queued_tasks
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.executor.running_tasks
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.smart_sensor_operator.poked_tasks
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.smart_sensor_operator.poked_success
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.smart_sensor_operator.poked_exception
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.smart_sensor_operator.exception_failures
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.smart_sensor_operator.infra_failures
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.scheduler.critical_section_duration
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.scheduler.tasks.killed_externally
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.scheduler.tasks.running
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.scheduler.tasks.starving
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.scheduler.orphaned_tasks.cleared
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.scheduler.orphaned_tasks.adopted
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dag_processing.last_runtime.*
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.pool.open_slots.*
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.pool.queued_slots.*
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.pool.running_slots.*
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.pool.starving_tasks.*
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dagrun.dependency-check.*
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dag_processing.last_duration.*
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dagrun.schedule_delay.*
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dagrun.*.first_task_scheduling_delay
statsd-exporter-1 | level=info ts=2024-04-08T11:56:09.515Z caller=main.go:361 msg="Accepting StatsD Traffic" udp=:9125 tcp=:9125 unixgram=
statsd-exporter-1 | level=info ts=2024-04-08T11:56:09.515Z caller=main.go:362 msg="Accepting Prometheus Requests" addr=:9102
You can see the code here: https://github.com/shreyash184/Monitoring-Dashboard
Can someone please help me figure out what I am missing?
Hello,
I'm trying to use your mappings file for the Airflow exporter, but I'm not sure if it is mapping the stats from Airflow. I'm using the Docker container for this and I do see the following being published to the statsd_exporter:
But there is no trace of any of the stats mentioned on the Airflow website.
I ran the exporter with a config check and I do see warnings, but I don't think these are fatal:
I'm running the container like this:
Please let me know if I'm not doing something correctly or if you need more details. I'm running Airflow 2.0.1
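One way to check whether the mappings are taking effect is to read the exporter's Prometheus endpoint and list the metric names it exposes. The sketch below is hedged accordingly: the URL assumes the exporter is reachable on localhost at the published port 9102 with the default `/metrics` path, the `af_agg_` prefix is taken from the metric name that appears in the debug log earlier in this thread, and `metric_names` and `fetch_metrics` are hypothetical helper names.

```python
from typing import List
from urllib.request import urlopen


def metric_names(exposition: str, prefix: str = "") -> List[str]:
    """Return the sorted, unique metric names in Prometheus text output."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith(prefix):
            names.add(name)
    return sorted(names)


def fetch_metrics(url: str = "http://localhost:9102/metrics", timeout: float = 5.0) -> str:
    """Download the exposition text from the exporter (URL is an assumption)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example usage:
# print(metric_names(fetch_metrics(), prefix="af_agg_"))
```

If the list is empty for the mapped prefix while unmapped `airflow_...` names are present, the events are arriving but no mapping rule matches them.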