DataDog / dd-trace-php

Datadog PHP Clients
https://docs.datadoghq.com/tracing/setup/php
Other
496 stars 155 forks source link

Horizon instrumentation? #386

Closed maxcountryman closed 1 year ago

maxcountryman commented 5 years ago

I'm wondering if Laravel Horizon is meant to be automatically instrumented by this extension?

labbati commented 5 years ago

Hi! At the moment our tracer doesn't instrument Laravel Horizon. Instrumenting it can be seen as a two steps process.

1) Instrumenting CLI apps. Currently if we detect the cli SAPI we simply do not instrument. We have plans to make this behavior configurable, so DD_TRACER_CLI_ENABLED php artisan horizon would at least instrument a cli application.

Once we do 1), I bet that you still will be missing relevant info for Laravel Horizon that are very speicific to it. Then comes 2)...

2) Instrument Laravel Horizon processors to add relevant info to the spans and correctly mark success/failures + add relevant metadata.

While 1) is something that we have on our radar and I bet won't be long before we can start working on it, 2) is currently not on our radar and we may need some help from the community.

Would you (or anyone from the community that reads) be interested in 2) if we implement 1)?

damiencriado commented 5 years ago

Would be very interested in 2)

crabmusket commented 3 years ago

Just checking if there's been any advances in this direction. As for 1), I see in the docs that there is a DD_TRACE_CLI_ENABLED parameter. Does this have any effect? I'd start investigating 2 if so.

damiencriado commented 3 years ago

Just checking if there's been any advances in this direction. As for 1), I see in the docs that there is a DD_TRACE_CLI_ENABLED parameter. Does this have any effect? I'd start investigating 2 if so.

No effect on horizon. But it works with laravel scheduler.

adammparker commented 3 years ago

A couple of things we've tried recently with Horizon:

In AppServiceProvider@boot:

/** @psalm-suppress UndefinedFunction */
\DDTrace\trace_method(
    \Illuminate\Queue\Jobs\Job::class,
    'fire',
    /** @psalm-suppress UndefinedClass */
    function (\DDTrace\SpanData $spanData, $args, $retval) {
        $spanData->type = Type::CLI;
        $spanData->service = 'api-worker';
        /** @var \Illuminate\Queue\Jobs\Job::class $this */
        $spanData->resource = $this->resolveName();
        $spanData->meta = [
            'laravel.job_id' => $this->getJobId(),
        ];
    }
);

Then we added two event/listeners to EventServiceProvider

\Illuminate\Queue\Events\JobProcessed::class => [
    DatadogJobProcessed::class,
],

\Illuminate\Queue\Events\JobFailed::class => [
    DatadogJobFailed::class,
],

In both of the above listeners (DatadogJobProcessed and DatadogJobFailed) we are just calling:

\DDTrace\GlobalTracer::get()->flush();

Testing locally (without sending anything to the DD Agent) things seem mostly correct and seem like they should work, but on the Datadog dashboard once we push it up to a higher environment we aren't seeing anything (looking inside artisan horizon:work, etc.).

ENVs:

DD_TRACE_CLI_ENABLED=true
DD_TRACE_AUTO_FLUSH_ENABLED=true
DD_TRACE_GENERATE_ROOT_SPAN=false
DD_TRACE_LARAVEL_ENABLED=true

One other note, we also tried (without the AppServiceProvider bit and just manually creating a Span). There is a Illuminate\Queue\Events\JobProcessing class as well that you can fire a listener from. That seemed to almost work as well but still not seeing much of anything in Datadog dashboard. The 3rd listener would look like this:

\Illuminate\Queue\Events\JobProcessing::class => [
    DatadogJobProcessing::class,
],

And the DatadogJobProcessing:

public function handle(JobProcessing $event)
{
    $scope = \DDTrace\GlobalTracer::get()->startActiveSpan('handle');
    $span = $scope->getSpan();
    $span->setTag('laravel_job_id', $event->job->getJobId());
    $span->setTag(Tag::SERVICE_NAME, 'api-worker');
    $span->setTag(Tag::RESOURCE_NAME, $event->job->resolveName());
}

Any tips on what might be wrong about the above or other things we could try?

jzahedieh commented 2 years ago

@adammparker your AppServiceProvider / EventServiceProvider is pushing data up to our "higher" environments, the jobs are separated in APM > Traces

Thanks for your solution, if I can be on any assistance please let me know

vladan-me commented 2 years ago

I'm also interested in this feature. I tried everything from the thread to make Horizon jobs traced but ultimately, only CLI tracking works.

Here's an example of a CLI trace. Let's say you want to run an artisan command from the console, you'd do:

export DD_TRACE_CLI_ENABLED=1

followed by

php artisan ...

Then, on the APM trace page you'd search for:

resource_name:artisan*

it should show up after a while

fosron commented 2 years ago

Any update on possible instrumentation for Horizon? With Profiler & Code Hotspots working for PHP it would be a really big saver for our product.

bwoebi commented 2 years ago

Hey @fosron, a concrete instrumentation for Laravel horizon is planned at some point; the last release (0.78.0) just implemented tracing for pcntl_fork()'ed processes.

So by now at least, you should be able to manually instrument it.

vladan-me commented 1 year ago

@bwoebi Can you please provide an idea/code prototype on how we'd manually instrument it?

bwoebi commented 1 year ago

@vladan-me To be honest, that's half of the hard work we have to do to instrument it ourselves too - make ourselves familiar with the codebase, look into which function calls need to be instrumented etc.

By the point we do that investigation, we'll just provide the full blown instrumentation.

nklmilojevic commented 1 year ago

Other tools, like Sentry APM have automatic instrumentations on queues. We have also requested this feature for quite a while, but it seemingly isn't being implemented, and we will have to move from Datadog to some other tool.

vladan-me commented 1 year ago

@nklmilojevic Thanks for the tip, I did a quick test today and Sentry indeed works well. We'll likely move from Datadog APM as well due to the lack of attention toward this issue.

s1rc commented 1 year ago

So we did get this kind of working by creating our own queue trace service provider in Laravel. If you combine this with setting the follow env variables individual jobs do show up on our configured horizon service workers, under the Illuminate_Queue_Worker.process operation.

DD_TRACE_CLI_ENABLED=1
DD_TRACE_AUTO_FLUSH_ENABLED=1
DD_TRACE_GENERATE_ROOT_SPAN=0

There are some caveats with this implementation that you will need to have separate deployment environments for your horizon workers from any web/api. We run ours within a k8s cluster and it works ok for now, definitely adds a lot of needed observability into horizon jobs.

I would still advocate for a properly supported implementation from DataDog as with Sentry.

<?php

namespace App\Providers\APM;

use DDTrace\SpanData;
use function DDTrace\trace_method;
use Illuminate\Contracts\Queue\Job;
use Illuminate\Queue\Jobs\JobName;
use Illuminate\Queue\Worker;
use Illuminate\Support\ServiceProvider;

class QueueTraceServiceProvider extends ServiceProvider
{
    public function register(): void
    {
        if (! extension_loaded('ddtrace')) {
            return;
        }

        trace_method(
            Worker::class,
            'process',
            function (SpanData $span, array $args) {
                /**
                 * @var string $connectionName
                 * @var Job $job
                 */
                [$connectionName, $job] = $args;
                $payload = $job->payload();

                $span->meta = [
                    'attempts' => $job->attempts(),
                    'connection' => $connectionName,
                    'has_failed' => $job->hasFailed(),
                    'is_deleted' => $job->isDeleted(),
                    'job_uuid' => $job->uuid(),
                    'max_tries' => $job->maxTries(),
                    'queue' => $job->getQueue(),
                    'retry_until' => $job->retryUntil(),
                    'timeout' => $job->timeout(),
                ];

                $span->resource = JobName::resolve($job->getName(), $payload);
                $span->type = 'queue';
            }
        );
    }
}
pierotibou commented 1 year ago

Hello @nklmilojevic, @vladan-me, I'm sorry to hear that but I obviously understand your frustration. The team has went to great extent to migrate a good chunk of our extension from PHP to C, which took a good amount of our capacity unfortunately. Though, thanks to @bwoebi and @krakjoe, we're in a better place now and finally start to reinvest in our integrations.

This quarter we plan to to deliver a bunch of integrations including Laravel queues. @PROFeNoM is working on Laminas and RabbitMQ now, he will look into Laravel queues next. We'll see from there if we shall plan more work for Horizon more specifically.

@s1rc , thanks a lot for offering the custom instrumentation as a workaround.

fosron commented 1 year ago

@s1rc Thank you very much for the code! This works wonders!

I have a quick question for everyone else though - is it possible to make Profiling / Code Hotspots work with this? Is there anything specific we can add to that code to "connect" these pieces of information? That would help us immensely. Thanks!

morrisonlevi commented 1 year ago

@fosron Code Hotspots will automatically work with any span. It doesn't require anything special on the tracer's side to work. So if you have profiling working, and an integration (whether official or custom) then it should work. Edit: well, assuming its duration is long enough :)

fosron commented 1 year ago

@morrisonlevi ah, duration. That might be key here. Got it.

PROFeNoM commented 1 year ago

Hey there 👋

We're happy to announce that our next release (0.87.0) will provide an integration for Laravel queues. By adding support for Laravel queues, we're not only providing observability into Laravel's built-in queue system, but also into Laravel Horizon.

💭 As you may know, Laravel Horizon is a supervisor process for queue workers that is built on top of Laravel's queue system. It manages the worker processes that are using Laravel queues under the hood. Therefore, by integrating with Laravel queues, we'll also be able to provide visibility into Horizon's supervision.

🔍 With this integration, you'll have access to insights into your Laravel queue processes in a distributed context, including information about the jobs being processed, as well as any errors that may have occurred. This will allow you to easily monitor and debug your queue processes, and ensure that they're running smoothly.

We're excited to bring this integration to you and look forward to hearing your feedback. If you have any questions or concerns, please don't hesitate to reach out to us 😃

morrisonlevi commented 1 year ago

Just chiming in to add that v0.87.0 is out now!

There is also a new config mode in the datadog-setup.php script, which allows you to set service, env, version, etc, through the CLI. I think Laravel users might git a bit more out of this feature than some other users, so I'm sharing it with you:

# Install the tracer (no appsec, no profiling)
php datadog-setup.php --php-bin=php

# Get service and env from the .env file
ddservice=$(awk -F= '$1 ~ /^APP_NAME$/ {print $2}' < .env)
ddenv=$(awk -F= '$1 ~ /^APP_ENV$/ {print $2}' < .env)

# Grab version from git (or however you do it)
ddversion=$(git rev-parse HEAD)

# Sets service, env, and version through the installer.
# It uses INI syntax.
php datadog-setup.php config set --php-bin=php \
    -ddatadog.service="${ddservice?}" \
    -ddatadog.env="${ddenv?}" \
    -ddatadog.version="${ddversion?}"

Our online docs should be updated within the next week or so to reflect this.

Thanks for the patience on the Laravel queue worker support. We're excited to hear your feedback!

nikalemdzievski commented 1 year ago

Hey @morrisonlevi, @PROFeNoM thank you for this amazing update! Is this something that is soon expected to come to the ECS Datadog agent docker image (latest tag)? Or if you could point me out which of the RC tags contains this update so I can check it out? Super excited. Thanks again!

pierotibou commented 1 year ago

Hello @nikalemdzievski, the laravel integration is provided by the tracer, not the agent. So the component you should have to update is the tracer library, which is usually installed in the application docker file. You should use: v0.87.0. I hope that helps.

gnumoksha commented 1 year ago
"${ddversion?}"

@morrisonlevi what is ? used for?

bwoebi commented 1 year ago

@gnumoksha From man bash:

       When not performing substring expansion, using the forms documented below (e.g., :-), bash tests for a parameter that is unset or null.  Omitting the colon results in a test only for a parameter that is unset.

       ${parameter:?word}
              Display Error if Null or Unset.  If parameter is null or unset, the expansion of word (or a message to that effect if word is not present) is written to the standard error and the shell, if it is not interactive, exits.  Otherwise, the value of parameter is substituted.

So feel free to remove it.

runyan-co commented 1 year ago

Thank you for the update on this, @pierotibou!

We implement DataDog across our infrastructure over at ProcessMaker and we make significant use of Laravel Horizon. It's been a very thorny issue for us not being able to have visibility into our job queues, since that leaves out roughly 60% of our app's traces from being recorded.

I hope your team continues to work on this and expand its capabilities, Horizon is a very popular package to use for queues and I have no doubt plenty of other's will greatly benefit as well.

s1rc commented 1 year ago

@PROFeNoM @morrisonlevi is there updated documentation on how to configure for Laravel queues? I've reverted my changes https://github.com/DataDog/dd-trace-php/issues/386#issuecomment-1487303134 and set just DD_TRACE_CLI_ENABLED with the new 0.87.2 build. I am not seeing any queue traces within my defined worker service.

morrisonlevi commented 1 year ago

@s1rc Can you open a new issue and explain a bit more? PHP version, OS and OS version, how you are installing the tracer, the output of the ddtrace section of phpinfo(), and if you have the profiler installed, the output of the section datadog-profiling in phpinfo() as well?

PROFeNoM commented 1 year ago

Are you seeing the queue traces if you set back DD_TRACE_AUTO_FLUSH_ENABLED=1? If this is not the case, using DD_TRACE_DEBUG=1, is there anything that stands out?

s1rc commented 1 year ago

@PROFeNoM I set DD_TRACE_AUTO_FLUSH_ENABLED=1 and I'll see a few queue jobs show up in APM under operation laravel.queue.process but not many compared to the number of jobs being processed.

I can create a new issue as to not keep this closed thread going.

PROFeNoM commented 1 year ago

Yep, let's create an issue about this @s1rc to find the bottom of it. Thanks.