databand-ai / airflow-dashboards

Grafana dashboards and StatsD exporter config for Airflow monitoring
Apache License 2.0

Mappings rules (statsd.conf) may be broken? #1

Closed josevnz closed 3 years ago

josevnz commented 3 years ago

Hello,

I'm trying to use your mappings file for the Airflow exporter, but I'm not sure whether it is mapping the stats from Airflow. I'm using the Docker container for this, and I see the following being exposed by the statsd_exporter:

# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.000178466
go_gc_duration_seconds{quantile="0.25"} 0.000522757
go_gc_duration_seconds{quantile="0.5"} 0.000609936
go_gc_duration_seconds{quantile="0.75"} 0.000969539
go_gc_duration_seconds{quantile="1"} 0.001874583
go_gc_duration_seconds_sum 0.034418747
go_gc_duration_seconds_count 51
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 9
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.15.8"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 3.08776e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 8.1952032e+07
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.452221e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 193168
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 1.826423451571187e-05
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 4.773616e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 3.08776e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 6.221824e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 4.333568e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 2713
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 6.2005248e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 6.6551808e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.6194519407053459e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 195881
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 6944
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 75480
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 81920
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 6.042192e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.098835e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 557056
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 557056
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 7.453184e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 7
# HELP pg_exporter_last_scrape_duration_seconds Duration of the last scrape of metrics from PostgresSQL.
# TYPE pg_exporter_last_scrape_duration_seconds gauge
pg_exporter_last_scrape_duration_seconds 1.001691309
# HELP pg_exporter_last_scrape_error Whether the last scrape of metrics from PostgreSQL resulted in an error (1 for error, 0 for success).
# TYPE pg_exporter_last_scrape_error gauge
pg_exporter_last_scrape_error 0
# HELP pg_exporter_scrapes_total Total number of times PostgresSQL was scraped for metrics.
# TYPE pg_exporter_scrapes_total counter
pg_exporter_scrapes_total 210
# HELP pg_up Whether the last scrape of metrics from PostgreSQL was able to connect to the server (1 for yes, 0 for no).
# TYPE pg_up gauge
pg_up 0
# HELP postgres_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which postgres_exporter was built.
# TYPE postgres_exporter_build_info gauge
postgres_exporter_build_info{branch="",goversion="go1.15.8",revision="",version="0.0.1"} 1
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 4.29
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 10
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.8161664e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.61944551384e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 7.31172864e+08
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes -1
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 209
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0

But there is no trace of any of the stats mentioned on the Airflow website.
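
One quick way to see which exporter answered a scrape is to list the metric names it returned. The sketch below is a minimal, hypothetical helper (the names `metric_names` and `names_with_prefix` are illustrative and not part of any exporter). It reads Prometheus text-format output and collects the sample names. In the output above every name starts with `go_`, `pg_`, `postgres_exporter_`, `process_` or `promhttp_`, which suggests the scrape may have reached a PostgreSQL exporter rather than statsd_exporter. With these mappings, Airflow metrics would be expected under names such as `af_agg_dag_processing_processes`, the name that shows up in the debug log further down this thread.

```python
import re

# Matches the metric name at the start of a sample line in the
# Prometheus text exposition format (comment lines start with '#').
_SAMPLE_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def metric_names(exposition_text):
    """Return the sorted, de-duplicated metric names found in a scrape."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_NAME.match(line)
        if match:
            names.add(match.group(1))
    return sorted(names)


def names_with_prefix(exposition_text, prefix):
    """Return only the metric names that start with the given prefix."""
    return [n for n in metric_names(exposition_text) if n.startswith(prefix)]


if __name__ == "__main__":
    # A few lines copied from the scrape output above.
    sample = (
        "# TYPE pg_up gauge\n"
        "pg_up 0\n"
        "go_goroutines 9\n"
        'promhttp_metric_handler_requests_total{code="200"} 209\n'
    )
    print(metric_names(sample))
    print(names_with_prefix(sample, "af_agg_"))
```

An empty result for the `af_agg_` prefix means either that no Airflow metric has arrived yet or that the scrape went to a different endpoint.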

I ran the exporter with a config check and I do see warnings, but I don't think these are fatal:

# Get the mappings file
josevnz@raspberrypi:~$ /usr/bin/mkdir -p -v $HOME/etc/
josevnz@raspberrypi:~$ curl --location --output $HOME/etc/statsd_exporter_mapping.yaml https://raw.githubusercontent.com/databand-ai/airflow-dashboards/main/statsd/statsd.conf

# Check the configuration

josevnz@raspberrypi:~$ /usr/bin/docker run --rm --interactive --tty --publish 9102:9102 --publish 9125:9125 --publish 9125:9125/udp --volume $HOME/etc/statsd_exporter_mapping.yaml:/etc/statsd_exporter_mapping.yaml:ro  prom/statsd-exporter --statsd.mapping-config=/etc/statsd_exporter_mapping.yaml --web.enable-lifecycle --check-config
level=info ts=2021-04-26T15:49:21.932Z caller=main.go:321 msg="Starting StatsD -> Prometheus Exporter" version="(version=0.20.1, branch=HEAD, revision=2b5239a67f716418a9dbdf70ca7bf2513fc9f7cc)"
level=info ts=2021-04-26T15:49:21.932Z caller=main.go:322 msg="Build context" context="(go=go1.16.2, user=root@f0e567f47a2a, date=20210326-17:33:12)"
WARN[0000] backtracking required because of match "*.ti_failures", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.ti_successes", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.zombies_killed", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.scheduler_heartbeat", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.sla_email_notification_failure", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.dagbag_size", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.dag_processing.processes", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.scheduler.critical_section_busy", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.dag.callback_exceptions", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.celery.task_timeout_error", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.dag_processing.import_errors", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.dag_processing.total_parse_time", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.dag_processing.processor_timeouts", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.executor.open_slots", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.executor.queued_tasks", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.executor.running_tasks", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.smart_sensor_operator.poked_tasks", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.smart_sensor_operator.poked_success", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.smart_sensor_operator.poked_exception", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.smart_sensor_operator.exception_failures", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.smart_sensor_operator.infra_failures", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.scheduler.critical_section_duration", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.scheduler.tasks.killed_externally", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.scheduler.tasks.running", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.scheduler.tasks.starving", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.scheduler.orphaned_tasks.cleared", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.scheduler.orphaned_tasks.adopted", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.dag_processing.last_runtime.*", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.pool.open_slots.*", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.pool.queued_slots.*", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.pool.running_slots.*", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.pool.starving_tasks.*", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.dagrun.dependency-check.*", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.dag_processing.last_duration.*", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.dagrun.schedule_delay.*", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.dagrun.*.first_task_scheduling_delay", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.ti.start.*.*", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.dag_processing.last_run.seconds_ago.*", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.dag.*.*.duration", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.dagrun.duration.success.*", matching performance may be degraded  source="fsm.go:313"
WARN[0000] backtracking required because of match "*.dagrun.duration.failed.*", matching performance may be degraded  source="fsm.go:313"
level=info ts=2021-04-26T15:49:21.943Z caller=main.go:357 msg="Configuration check successful, exiting"

I'm running the container like this:

/usr/bin/docker run --name statsd-exporter --detach --publish 9102:9102 --publish 9125:9125 --publish 9125:9125/udp --volume $HOME/etc/statsd_exporter_mapping.yaml:/etc/statsd_exporter_mapping.yaml:ro  prom/statsd-exporter  --no-statsd.parse-dogstatsd-tags --no-statsd.parse-librato-tags --no-statsd.parse-signalfx-tags --statsd.mapping-config=/etc/statsd_exporter_mapping.yaml --web.enable-lifecycle

Please let me know if I'm doing something incorrectly or if you need more details. I'm running Airflow 2.0.1.

josevnz commented 3 years ago

The StatsD mappings are correct; it was my mistake all along. In airflow.cfg I forgot to change the publish port:

# StatsD (https://github.com/etsy/statsd) integration settings.
# Enables sending metrics to StatsD.
statsd_on = True
statsd_host = raspberrypi
statsd_port = 9125
statsd_prefix = airflow
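
To confirm that the host and port above are reachable before restarting Airflow, a hand-made StatsD packet can be sent to the exporter. This is a minimal sketch, assuming the plain `name:value|type` line format over UDP; the helper names and the metric name `airflow.connectivity_test` are made up for the check and are not anything Airflow itself sends.

```python
import socket


def format_statsd_line(name, value, metric_type):
    """Build a plain StatsD line such as 'airflow.dagbag_size:32|g'."""
    return f"{name}:{value}|{metric_type}"


def send_statsd(line, host, port):
    """Send one StatsD line over UDP.

    UDP gives no delivery confirmation, so success here only means the
    datagram was handed to the network stack.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

For example, `send_statsd(format_statsd_line("airflow.connectivity_test", 1, "g"), "raspberrypi", 9125)` uses the statsd_host and statsd_port values from the config. With the exporter running at debug log level, the packet should appear as an "Incoming line" entry if it arrived.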

After you restart Airflow, if you run the container with the --log.level=debug flag, you will see all the stats coming in real time:

/usr/bin/docker run --rm --interactive --tty --publish 9102:9102 --publish 9125:9125 --publish 9125:9125/udp --volume $HOME/etc/statsd_exporter_mapping.yaml:/etc/statsd_exporter_mapping.yaml:ro  prom/statsd-exporter --statsd.mapping-config=/etc/statsd_exporter_mapping.yaml --web.enable-lifecycle --log.level=debug
docker logs --follow statsd-exporter
...

level=debug ts=2021-05-02T14:44:22.882Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.dagbag_size:32|g
level=debug ts=2021-05-02T14:44:22.882Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.dag_processing.import_errors:0|g
level=debug ts=2021-05-02T14:44:23.036Z caller=exporter.go:121 msg="counter must be non-negative value" metric=af_agg_dag_processing_processes event_value=-1
level=debug ts=2021-05-02T14:44:23.036Z caller=exporter.go:121 msg="counter must be non-negative value" metric=af_agg_dag_processing_processes event_value=-1
level=debug ts=2021-05-02T14:44:23.040Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.dag_processing.processes:-1|c
level=debug ts=2021-05-02T14:44:23.041Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.dag_processing.total_parse_time:0.1581331390189007|g
level=debug ts=2021-05-02T14:44:23.041Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.dagbag_size:32|g
level=debug ts=2021-05-02T14:44:23.042Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.dag_processing.import_errors:0|g
level=debug ts=2021-05-02T14:44:23.152Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.scheduler.critical_section_duration:14.026885|ms
level=debug ts=2021-05-02T14:44:23.160Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.executor.open_slots:16|g
level=debug ts=2021-05-02T14:44:23.160Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.executor.queued_tasks:0|g
level=debug ts=2021-05-02T14:44:23.161Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.executor.running_tasks:0|g
level=debug ts=2021-05-02T14:44:23.236Z caller=exporter.go:121 msg="counter must be non-negative value" metric=af_agg_dag_processing_processes event_value=-1
level=debug ts=2021-05-02T14:44:23.900Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.dag_processing.total_parse_time:0.8583512239856645|g
level=debug ts=2021-05-02T14:44:23.900Z caller=listener.go:73 msg="Incoming line" proto=udp line=airflow.dagbag_size:32|g
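
The "counter must be non-negative value" messages line up with the `airflow.dag_processing.processes:-1|c` line in the same log: a decrement arrives as a StatsD counter event, and the exporter reports that it cannot apply a negative value to a counter. Below is a minimal sketch of that check, assuming the simple `name:value|type` form seen in the log (no sample rates or tags). The function names are illustrative and this is not the exporter's own code.

```python
from typing import NamedTuple


class StatsdEvent(NamedTuple):
    name: str
    value: float
    metric_type: str


def parse_statsd_line(line):
    """Parse 'name:value|type' into a StatsdEvent.

    Only the plain form seen in the log is handled; sample rates
    ('|@0.1') and tag extensions are out of scope for this sketch.
    """
    name, _, rest = line.partition(":")
    value_text, _, metric_type = rest.partition("|")
    if not name or not value_text or not metric_type:
        raise ValueError(f"not a plain StatsD line: {line!r}")
    return StatsdEvent(name, float(value_text), metric_type)


def is_negative_counter(event):
    """True for counter events ('c') that carry a negative value."""
    return event.metric_type == "c" and event.value < 0


if __name__ == "__main__":
    # Lines copied from the debug log above.
    for raw in (
        "airflow.dagbag_size:32|g",
        "airflow.dag_processing.processes:-1|c",
        "airflow.scheduler.critical_section_duration:14.026885|ms",
    ):
        event = parse_statsd_line(raw)
        print(event.name, event.metric_type, is_negative_counter(event))
```

Only the `|c` line with `-1` is flagged; the gauge and timer lines pass through.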

kbohra commented 3 years ago

@josevnz Just curious, but did you get all the metrics or just a few? I can see around 41 warnings with the message "backtracking required because of match", and they are getting silently suppressed. This seems related to https://github.com/prometheus/statsd_exporter#ordering-glob-rules. I'm using Airflow 1.10.14 and statsd_exporter 0.20.2.

@rozhok If possible, can you share which versions of Airflow and statsd_exporter you used? Thanks.

time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.ti_failures\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.ti_successes\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.zombies_killed\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.scheduler_heartbeat\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.sla_email_notification_failure\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dagbag_size\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dag_processing.processes\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.scheduler.critical_section_busy\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dag.callback_exceptions\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.celery.task_timeout_error\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dag_processing.import_errors\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dag_processing.total_parse_time\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dag_processing.processor_timeouts\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.executor.open_slots\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.executor.queued_tasks\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.executor.running_tasks\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.smart_sensor_operator.poked_tasks\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.smart_sensor_operator.poked_success\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.smart_sensor_operator.poked_exception\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.smart_sensor_operator.exception_failures\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.smart_sensor_operator.infra_failures\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.scheduler.critical_section_duration\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.scheduler.tasks.killed_externally\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.scheduler.tasks.running\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.scheduler.tasks.starving\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.scheduler.orphaned_tasks.cleared\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.scheduler.orphaned_tasks.adopted\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dag_processing.last_runtime.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.pool.open_slots.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.pool.queued_slots.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.pool.running_slots.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.pool.starving_tasks.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dagrun.dependency-check.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dag_processing.last_duration.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dagrun.schedule_delay.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dagrun.*.first_task_scheduling_delay\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.ti.start.*.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dag_processing.last_run.seconds_ago.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dag.*.*.duration\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dagrun.duration.success.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-06T16:17:00Z" level=warning msg="backtracking required because of match \"*.dagrun.duration.failed.*\", matching performance may be degraded" source="fsm.go:313"

rozhok commented 3 years ago

@kbohra I've used statsd-exporter 0.18.0 and airflow 1.10.10

kbohra commented 3 years ago

@rozhok Thank you for the details. I have downgraded statsd-exporter to 0.18.0 but still get warnings on 41 metrics. The other charts are working fine, though. I will debug more and try changing the dashboard JSON to avoid backtracking.

$ grep backtracking nohup.out |wc -l
41
$ grep backtracking nohup.out |head
time="2021-06-07T17:32:26Z" level=warning msg="backtracking required because of match \"*.scheduler.tasks.killed_externally\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-07T17:32:26Z" level=warning msg="backtracking required because of match \"*.scheduler.tasks.running\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-07T17:32:26Z" level=warning msg="backtracking required because of match \"*.scheduler.tasks.starving\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-07T17:32:26Z" level=warning msg="backtracking required because of match \"*.scheduler.orphaned_tasks.cleared\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-07T17:32:26Z" level=warning msg="backtracking required because of match \"*.scheduler.orphaned_tasks.adopted\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-07T17:32:26Z" level=warning msg="backtracking required because of match \"*.dag_processing.last_runtime.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-07T17:32:26Z" level=warning msg="backtracking required because of match \"*.pool.open_slots.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-07T17:32:26Z" level=warning msg="backtracking required because of match \"*.pool.queued_slots.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-07T17:32:26Z" level=warning msg="backtracking required because of match \"*.pool.running_slots.*\", matching performance may be degraded" source="fsm.go:313"
time="2021-06-07T17:32:26Z" level=warning msg="backtracking required because of match \"*.pool.starving_tasks.*\", matching performance may be degraded" source="fsm.go:313"

rozhok commented 3 years ago

I'm not sure it's possible to create a universal mapping suitable for all cases. Airflow sends metrics using the prefix configured in the statsd_prefix option, for example: airflow_dag_sample_dag_dummy_task_duration. You can rewrite your mappings to take this prefix into account.
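
To make the prefix point concrete: the glob rules listed in the warnings above all start with `*.`, so the first dot-separated component of the metric name (the statsd_prefix) is what the leading wildcard has to match. The sketch below assumes that `*` stands for exactly one dot-separated component. It is a simplified Python illustration rather than the exporter's implementation, and `glob_to_regex`, `match_glob` and `default_name` are hypothetical helpers. It also assumes that an unmapped metric falls back to a name with non-alphanumeric characters replaced by underscores, which would explain a name like airflow_dag_sample_dag_dummy_task_duration.

```python
import re


def glob_to_regex(glob):
    """Translate a dotted glob into a compiled regex.

    Assumption: '*' matches exactly one dot-separated component.
    """
    parts = []
    for component in glob.split("."):
        parts.append(r"([^.]+)" if component == "*" else re.escape(component))
    return re.compile(r"\.".join(parts) + r"\Z")


def match_glob(glob, metric_name):
    """Return the wildcard captures as a tuple, or None if there is no match."""
    match = glob_to_regex(glob).match(metric_name)
    return match.groups() if match else None


def default_name(metric_name):
    """Assumed fallback for unmapped metrics: non-alphanumerics become '_'."""
    return re.sub(r"[^a-zA-Z0-9_]", "_", metric_name)


if __name__ == "__main__":
    name = "airflow.dag.sample_dag.dummy_task.duration"
    print(match_glob("*.dag.*.*.duration", name))
    print(match_glob("*.dagbag_size", "dagbag_size"))
    print(default_name(name))
```

Under these assumptions a name sent without any prefix, such as `dagbag_size`, would not match `*.dagbag_size`, so the rules need adjusting if the prefix is empty or itself contains dots.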

shreyash184 commented 7 months ago

I am getting the same warning too, and I am facing the same issue: I am not getting any of the metrics mentioned in the statsd.yaml mappings.

shreyash184 commented 7 months ago

StatsD logs -

[+] Running 1/0
 ✔ Container monitoring-dashboard-statsd-exporter-1  Created  0.0s
Attaching to statsd-exporter-1
statsd-exporter-1 | level=info ts=2024-04-08T11:56:09.511Z caller=main.go:321 msg="Starting StatsD -> Prometheus Exporter" version="(version=0.21.0, branch=HEAD, revision=ef6627b9f05350d54cd3bfea5afe36617d7eb5a4)"
statsd-exporter-1 | level=info ts=2024-04-08T11:56:09.513Z caller=main.go:322 msg="Build context" context="(go=go1.16.5, user=root@8ace135a0329, date=20210610-07:24:59)"
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.ti.start.*.*
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dag_processing.last_run.seconds_ago.*
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dag.*.*.duration
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dagrun.duration.success.*
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dagrun.duration.failed.*
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.ti_failures
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.ti_successes
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.zombies_killed
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.scheduler_heartbeat
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.sla_email_notification_failure
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dagbag_size
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dag_processing.processes
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.scheduler.critical_section_busy
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dag.callback_exceptions
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.celery.task_timeout_error
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dag_processing.import_errors
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dag_processing.total_parse_time
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dag_processing.processor_timeouts
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.executor.open_slots
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.executor.queued_tasks
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.executor.running_tasks
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.smart_sensor_operator.poked_tasks
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.smart_sensor_operator.poked_success
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.smart_sensor_operator.poked_exception
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.smart_sensor_operator.exception_failures
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.smart_sensor_operator.infra_failures
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.scheduler.critical_section_duration
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.scheduler.tasks.killed_externally
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.scheduler.tasks.running
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.scheduler.tasks.starving
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.scheduler.orphaned_tasks.cleared
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.scheduler.orphaned_tasks.adopted
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dag_processing.last_runtime.*
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.pool.open_slots.*
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.pool.queued_slots.*
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.pool.running_slots.*
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.pool.starving_tasks.*
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=*.dagrun.dependency-check.*
statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=.dag_processing.last_duration. statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=.dagrun.schedule_delay. statsd-exporter-1 | level=warn ts=2024-04-08T11:56:09.515Z caller=fsm.go:313 msg="backtracking required because of match. Performance may be degraded" match=.dagrun..first_task_scheduling_delay statsd-exporter-1 | level=info ts=2024-04-08T11:56:09.515Z caller=main.go:361 msg="Accepting StatsD Traffic" udp=:9125 tcp=:9125 unixgram= statsd-exporter-1 | level=info ts=2024-04-08T11:56:09.515Z caller=main.go:362 msg="Accepting Prometheus Requests" addr=:9102

You can see the code here: https://github.com/shreyash184/Monitoring-Dashboard

Can someone please help me figure out whether I am missing anything?