DataDog / heroku-buildpack-datadog

Heroku Buildpack to run the Datadog Agent in a Dyno
https://www.datadoghq.com/
Apache License 2.0

APM/traces for job code run by worker dynos #225

Closed by carltondickson 3 years ago

carltondickson commented 3 years ago

Hi,

We're currently using DataDog for our web dyno, and it shows logs and links through to traces without any problem. We want to move some logic into jobs that will be processed by our worker (non-web) dynos, but we still want the insight we get from our traces. Can we get trace data when code is executed by our worker dynos?

We currently log messages in our jobs and can see them in Datadog fine, including which dyno handled them (for example queue.2, via a facet on syslog.procid), but I can't see anything in APM that shows the job being executed.

If I open https://datadoghq.eu/infrastructure?group=dynotype, I can see that our web dynos have the ntp, system and trace tags, but the queue dynos are missing the trace tag. Is this the issue?

arapulido commented 3 years ago

Hello! There shouldn't be a difference between those. Are you running a prerun.sh script that may be disabling the trace-agent for the queue dynos?

carltondickson commented 3 years ago

No, we don't have that file (datadog/prerun.sh) in the project, and I can't see us using DISABLE_DATADOG_AGENT anywhere in the project or in our env vars.

These are the env vars we use for DataDog:

DD_AGENT_MAJOR_VERSION:              7
DD_API_KEY:                          our-api-key
DD_APM_ENABLED:                      true
DD_DISABLE_HOST_METRICS:             false
DD_ENV:                              development
DD_LARAVEL_ANALYTICS_ENABLED:        true
DD_LOG_LEVEL:                        OFF
DD_SERVICE:                          our-app
DD_SERVICE_NAME:                     our-app
DD_SITE:                             datadoghq.eu
DD_TRACE_ANALYTICS_ENABLED:          true
DD_TRACE_ENABLED:                    true
DD_TRACE_LARAVEL_ANALYTICS_ENABLED:  true

We're also using these buildpacks:

heroku-buildpack-github-netrc
heroku/php
https://github.com/DataDog/heroku-buildpack-datadog.git#1.21
https://buildpack-registry.s3.amazonaws.com/buildpacks/heroku-community/apt.tgz
https://github.com/carltondickson/heroku-buildback-dd-tracer-php#1.0.1

I'm not really sure what else could be influencing how the agent runs on the queue dynos.

Are there any example Heroku/DataDog projects that I could reference or should I revisit the documentation from scratch to work through this one?

arapulido commented 3 years ago

Is the code for the queue dynos instrumented for traces?

Also, I would recommend exec'ing into one of the queue dynos (heroku ps:exec -a <your-app> -d <one-of-your-queue-dynos> bash) and then, once you are in, running:

export DD_API_KEY=<your-api-key>
agent-wrapper status

Then check the APM section to see whether it is running and finding/sending traces.
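To read only the APM part of the status output, the command can be piped through a small filter. The sketch below is a hypothetical helper (print_apm_section is not part of the buildpack or the Agent); it assumes, based on the status output pasted later in this thread, that each top-level section title such as "APM Agent" sits on its own unindented line directly below an unindented row of = characters.

```shell
# Hypothetical helper (not part of the buildpack or the Agent): print only the
# "APM Agent" section of `agent-wrapper status` output read on stdin.
# Assumption: each top-level section title is an unindented line that follows
# an unindented row of = characters; indented lines belong to the current section.
print_apm_section() {
  awk '
    /^=+$/ { prev_bar = 1; next }                          # top-level rule line: skip it
    prev_bar && /^[^ =]/ { in_apm = ($0 == "APM Agent") }  # title line: entering or leaving APM
    { prev_bar = 0 }
    in_apm { print }
  '
}
```

Inside the dyno, after exporting DD_API_KEY, it would be used as `agent-wrapper status | print_apm_section`.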

carltondickson commented 3 years ago

Hi @arapulido

Re "Is the code for the queue dynos instrumented for traces?": I'm not sure what you mean. I've followed the guide here: https://docs.datadoghq.com/logs/log_collection/php/?tab=phpmonolog#laravel

I'll revisit the APM guide, but I "believe" the AppServiceProvider code is executed when console commands are run.

We also tried adding the following, since it's a step we had missed for "Laravel Artisan" support, but no joy (https://docs.datadoghq.com/tracing/setup_overview/compatibility_requirements/php/#cli-library-compatibility):

DD_TRACE_CLI_ENABLED=true

The APM agent is running on the queue dyno, but nothing is received or written (output below). I've compared this with the web dyno and can see what I imagine the stats should look like when traces are being sent correctly.

=========
APM Agent
=========
  Status: Running
  Pid: 85
  Uptime: 1329 seconds
  Mem alloc: 15,695,736 bytes
  Hostname: xxxxxxxxxxxxxx
  Receiver: localhost:8126
  Endpoints:
    https://trace.agent.datadoghq.eu

  Receiver (previous minute)
  ==========================
    No traces received in the previous minute.
    Default priority sampling rate: 100.0%

  Writer (previous minute)
  ========================
    Traces: 0 payloads, 0 traces, 0 events, 0 bytes
    Stats: 0 payloads, 0 stats buckets, 0 bytes

arapulido commented 3 years ago

That output means that the APM agent is correctly running, but it is not finding any traces.

The guide you pointed to is for sending logs, not traces (the trace-related code there is only for correlating logs with traces). This is the guide for traces: https://docs.datadoghq.com/tracing/setup_overview/setup/php
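The check made by eye on the status output above (the APM agent is up, but the writer sent nothing) can also be scripted, for example to compare a web dyno with a queue dyno. The function below is a hypothetical sketch, not an Agent feature; it assumes the writer line keeps the shape shown in the pasted output, "Traces: N payloads, N traces, N events, N bytes", possibly with thousands separators as in the "Mem alloc" line.

```shell
# Hypothetical helper (not an Agent feature): read `agent-wrapper status` output
# on stdin and print how many traces the APM writer sent in the previous minute.
# Assumes a line shaped like "Traces: 0 payloads, 0 traces, 0 events, 0 bytes";
# prints nothing when no such line is present.
apm_traces_written() {
  awk '
    /^ *Traces: [0-9,]+ payloads, [0-9,]+ traces/ {
      n = $4             # fields: "Traces:" <payloads> "payloads," <traces> ...
      gsub(",", "", n)   # drop thousands separators such as 1,234
      print n
      exit
    }
  '
}
```

Run as `agent-wrapper status | apm_traces_written` inside a dyno, a result of 0 (as in the queue dyno output above) means the trace-agent is running but the application is not sending it any traces.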

carltondickson commented 3 years ago

I think DD_TRACE_CLI_ENABLED was actually what fixed it. We're using Horizon to process our queues and can now see the traces, but they are very long-running; that's a separate issue (I've seen a few threads in DDTrace for PHP suggesting we might have to close spans manually). Closing for now 👍🏽
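On Heroku, an environment variable such as DD_TRACE_CLI_ENABLED is set as an app config var, which applies to every process type of the app (web and queue dynos alike). A minimal sketch, with <your-app> as a placeholder in the same style as the ps:exec command earlier in the thread:

```shell
# Set the variable app-wide so CLI processes (Artisan, Horizon workers) are traced.
# Heroku restarts the app's dynos when a config var changes.
# <your-app> is a placeholder: replace it with the app name before running.
heroku config:set DD_TRACE_CLI_ENABLED=true --app <your-app>
```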