getsentry / airflow-metrics

Metrics for airflow
Apache License 2.0

No airflow.dag metrics are being logged #42

Open Taxuspt opened 5 years ago

Taxuspt commented 5 years ago

I have set up airflow-metrics according to the documentation, but I'm not receiving all the metrics in Datadog.

For reference, this is the list of metrics that Datadog is receiving (screenshot omitted).

I have the following block in my airflow.cfg:

[airflow_metrics]
airflow_metrics_enabled = True
airflow_metrics_tasks_enabled = True
airflow_metrics_bq_enabled = True
airflow_metrics_gcs_to_bq_enabled = True
airflow_metrics_requests_enabled = True
airflow_metrics_thread_enabled = True
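
For context, airflow-metrics presumably gates each metric group on these flags through Airflow's standard configuration API. A minimal sketch of that pattern (the helper name here is hypothetical):

from airflow.configuration import conf

# Hypothetical helper: check whether a metric group is switched on in the
# [airflow_metrics] section of airflow.cfg.
def metrics_group_enabled(flag):
    return conf.getboolean("airflow_metrics", flag)

metrics_group_enabled("airflow_metrics_tasks_enabled")  # True given the block above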

I tested it with both Local and Celery executors.

Here is a portion of the scheduler log from when I manually trigger a DAG:

[2019-08-09 11:27:28,020] {cli.py:517} INFO - Running <TaskInstance: vacuum_db.bash_task 2019-08-09T09:27:23.561398+00:00 [queued]> on host xxxx
[2019-08-09 11:27:32,728] {patch_requests.py:33} WARNING - Found blacklisted domain: api.datadoghq.com
[2019-08-09 11:27:32,893] {api_client.py:139} INFO - 202 POST https://api.datadoghq.com/api/v1/series (167.9699ms)
[2019-08-09 11:27:33,151] {datadog_logger.py:41} INFO - datadog gauge: task.state 1 1 False {'state': 'running'}
[2019-08-09 11:27:33,151] {datadog_logger.py:41} INFO - datadog gauge: task.state 2438 1 False {'state': 'success'}
[2019-08-09 11:27:33,874] {datadog_logger.py:31} INFO - datadog incr: scheduler_heartbeat 1 1 None
[2019-08-09 11:27:38,309] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): xxx.s3.amazonaws.com
[2019-08-09 11:27:38,512] {api_client.py:139} INFO - 202 POST https://api.datadoghq.com/api/v1/series (897.4719ms)
[2019-08-09 11:27:39,168] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): xxx.s3.amazonaws.com
[2019-08-09 11:27:39,919] {datadog_logger.py:31} INFO - datadog incr: scheduler_heartbeat 1 1 None
[2019-08-09 11:27:42,898] {patch_requests.py:33} WARNING - Found blacklisted domain: api.datadoghq.com
[2019-08-09 11:27:43,057] {api_client.py:139} INFO - 202 POST https://api.datadoghq.com/api/v1/series (162.9119ms)
[2019-08-09 11:27:43,162] {datadog_logger.py:41} INFO - datadog gauge: task.state 2439 1 False {'state': 'success'}
[2019-08-09 11:27:45,932] {datadog_logger.py:31} INFO - datadog incr: scheduler_heartbeat 1 1 None
[2019-08-09 11:27:48,519] {patch_requests.py:33} WARNING - Found blacklisted domain: api.datadoghq.com
[2019-08-09 11:27:48,783] {api_client.py:139} INFO - 202 POST https://api.datadoghq.com/api/v1/series (267.9617ms)
[2019-08-09 11:27:51,988] {datadog_logger.py:31} INFO - datadog incr: scheduler_heartbeat 1 1 None
[2019-08-09 11:27:53,067] {patch_requests.py:33} WARNING - Found blacklisted domain: api.datadoghq.com
[2019-08-09 11:27:53,173] {datadog_logger.py:41} INFO - datadog gauge: task.state 2439 1 False {'state': 'success'}
[2019-08-09 11:27:53,221] {api_client.py:139} INFO - 202 POST https://api.datadoghq.com/api/v1/series (158.3161ms)
[2019-08-09 11:27:53,950] {jobs.py:1468} INFO - Executor reports execution of vacuum_db.bash_task execution_date=2019-08-09 09:27:23.561398+00:00 exited with status success for try_number 1
[2019-08-09 11:27:57,979] {datadog_logger.py:31} INFO - datadog incr: scheduler_heartbeat 1 1 None
[2019-08-09 11:28:03,184] {datadog_logger.py:41} INFO - datadog gauge: task.state 2439 1 False {'state': 'success'}
[2019-08-09 11:28:03,307] {patch_requests.py:33} WARNING - Found blacklisted domain: api.datadoghq.com
[2019-08-09 11:28:03,438] {api_client.py:139} INFO - 202 POST https://api.datadoghq.com/api/v1/series (215.4486ms)
[2019-08-09 11:28:03,979] {datadog_logger.py:31} INFO - datadog incr: scheduler_heartbeat 1 1 None
[2019-08-09 11:28:09,988] {datadog_logger.py:31} INFO - datadog incr: scheduler_heartbeat 1 1 None
[2019-08-09 11:28:13,195] {datadog_logger.py:41} INFO - datadog gauge: task.state 2439 1 False {'state': 'success'}
[2019-08-09 11:28:13,445] {patch_requests.py:33} WARNING - Found blacklisted domain: api.datadoghq.com
[2019-08-09 11:28:13,577] {api_client.py:139} INFO - 202 POST https://api.datadoghq.com/api/v1/series (134.8519ms)
[2019-08-09 11:28:16,012] {datadog_logger.py:31} INFO - datadog incr: scheduler_heartbeat 1 1 None
[2019-08-09 11:28:22,032] {datadog_logger.py:31} INFO - datadog incr: scheduler_heartbeat 1 1 None
[2019-08-09 11:28:23,208] {datadog_logger.py:41} INFO - datadog gauge: task.state 2439 1 False {'state': 'success'}
[2019-08-09 11:28:23,582] {patch_requests.py:33} WARNING - Found blacklisted domain: api.datadoghq.com
[2019-08-09 11:28:23,847] {api_client.py:139} INFO - 202 POST https://api.datadoghq.com/api/v1/series (268.4329ms)
[2019-08-09 11:28:28,056] {datadog_logger.py:31} INFO - datadog incr: scheduler_heartbeat 1 1 None
[2019-08-09 11:28:33,221] {datadog_logger.py:41} INFO - datadog gauge: task.state 2439 1 False {'state': 'success'}
[2019-08-09 11:28:33,852] {patch_requests.py:33} WARNING - Found blacklisted domain: api.datadoghq.com
[2019-08-09 11:28:33,999] {api_client.py:139} INFO - 202 POST https://api.datadoghq.com/api/v1/series (151.0379ms)

Any tips? Thanks

Taxuspt commented 5 years ago

After some more debugging, it seems that the EventManager is never getting an after_update event from DagRun.
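
For anyone debugging the same thing, here is a minimal sketch for checking whether SQLAlchemy fires the event at all, independent of this project's EventManager; the listener is purely diagnostic and assumes it is registered in a process that performs ORM flushes on DagRun. One caveat that could explain the silence: bulk updates issued via Query.update() bypass mapper events such as after_update entirely.

import logging

from sqlalchemy import event
from airflow.models import DagRun

log = logging.getLogger(__name__)

@event.listens_for(DagRun, "after_update")
def _log_dagrun_update(mapper, connection, target):
    # Fires only when the ORM flushes an UPDATE for a DagRun instance;
    # query-level bulk updates never reach mapper-level event hooks.
    log.info("DagRun after_update: dag_id=%s state=%s", target.dag_id, target.state)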

hatched-DavidMichon commented 5 years ago

Were you able to fix it? I am facing the same issue.

Taxuspt commented 4 years ago

> Were you able to fix it? I am facing the same issue.

I spent quite some time looking for a fix, without success. I think the problem comes from Flask rather than from Airflow or airflow-metrics.

hatched-DavidMichon commented 4 years ago

@Taxuspt Actually, I moved back to the integrated Airflow metrics (version 1.10.5), which cover most if not all of the metrics of this project: https://airflow.apache.org/metrics.html
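
For anyone else making the switch: the integrated metrics are emitted over StatsD, so they need the StatsD block in airflow.cfg. A minimal sketch for 1.10.x, assuming a StatsD (or DogStatsD) listener on localhost:8125:

[scheduler]
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow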

DannyNemer commented 4 years ago

I have the same issue.

milanrm commented 4 years ago

I am also facing the same issue. Were you able to fix it?

jmorgan415 commented 4 years ago

@hatched-DavidMichon Hey, I've also run into the same issue described in this thread, so I tried the integrated Airflow metrics -> Datadog pipeline instead, but I'm noticing that many of the metrics are missing there as well (primarily DAG metrics). I have the Datadog agent deployed in my cluster via Helm and have configured my airflow.cfg per the docs. I noticed that if I set statsd_on = True, my webserver and scheduler start crashing. If I set it to False and point my StatsD host at the Datadog agent, I get a lot of metrics showing up in Datadog, but they disappear after a few minutes. I'm running Airflow on Kubernetes using the Kubernetes executor and the Kubernetes pod operator. I'm curious what your config looked like and whether you're using Datadog? Thanks!
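
A hedged sketch of the usual wiring for this setup, not a verified fix: point Airflow's StatsD host at the node-local DogStatsD port, assuming the Datadog Helm chart was installed with dogstatsd.useHostPort (and, for traffic from other pods, dogstatsd.nonLocalTraffic) enabled. In the Airflow pod spec:

env:
  - name: AIRFLOW__SCHEDULER__STATSD_HOST  # env override for [scheduler] statsd_host
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP           # node IP where DogStatsD listens on 8125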

remigabillet commented 4 years ago

This patch is awesome, and I now have Airflow running on Astronomer Cloud with Datadog instrumentation. It works great!

Unfortunately, I have the same issue here: the airflow.dag.duration metric isn't collected. I'll let you know if I find a workaround.
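
One possible stopgap, a hedged sketch rather than anything this project does internally: emit the duration yourself from a DAG-level success callback. This assumes the datadog Python package is configured in the Airflow environment and that the run's end_date is populated by the time the callback fires.

from datadog import statsd

def report_dag_duration(context):
    # Hypothetical callback: compute the run's wall-clock duration and emit
    # it as a gauge tagged with the dag_id.
    dag_run = context["dag_run"]
    duration = (dag_run.end_date - dag_run.start_date).total_seconds()
    statsd.gauge("airflow.dag.duration", duration, tags=["dag_id:" + dag_run.dag_id])

# Attached when defining the DAG, e.g. DAG(..., on_success_callback=report_dag_duration)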