DataDog / dd-trace-js

JavaScript APM Tracer
https://docs.datadoghq.com/tracing/

Next.JS plugin integration #4003

Open Tarektouati opened 5 months ago

Tarektouati commented 5 months ago

Env :

OS : MAC/Linux
Datadog agent version: ?
dd-trace: v4.26.0
Node:  v20.10.0
React: v18.3.0-canary-d900fadbf-20230929
Next.JS : v14.0.4
Next’s build type: standalone

Hey 👋🏼 !

I’m working on a Next.JS app that uses the app directory, is built in standalone mode, and is packaged in a Docker image to be deployed on a K8s cluster. I’ve made multiple attempts to integrate dd-trace’s next plugin, but it doesn’t seem to be working:

// instrumentation.node.ts

import Tracer from "dd-trace";

const tracer = Tracer.init({
  logInjection: true,
  startupLogs: true,
});

tracer.use("next");

// instrumentation.ts

export async function register() {
  // NEXT_RUNTIME cannot be frozen
  if (process.env.NEXT_RUNTIME === "nodejs") {
    await import("./instrumentation.node");
  }
}

I do see some traces showing up in the DD APM UI, but their resource names are only methods such as GET or POST, with no path or route information.

When I dug further into these traces, it appeared that they were created by the http plugin rather than the next one.

We ended up patching the dd-trace dependency (the http plugin) to get something working:

diff --git a/packages/datadog-plugin-http/src/client.js b/packages/datadog-plugin-http/src/client.js
index 42833bb896f64e5cbf37840f4a4087a346715aa5..dc0c552c6dafa297c80ebd77179f1a21accf51a7 100644
--- a/packages/datadog-plugin-http/src/client.js
+++ b/packages/datadog-plugin-http/src/client.js
@@ -42,7 +42,7 @@ class HttpClientPlugin extends ClientPlugin {
         [COMPONENT]: this.constructor.id,
         'span.kind': 'client',
         'service.name': this.serviceName({ pluginConfig: this.config, sessionDetails: extractSessionDetails(options) }),
-        'resource.name': method,
+        'resource.name': `${method} ${uri}`,
         'span.type': 'http',
         'http.method': method,
         'http.url': uri,
diff --git a/packages/datadog-plugin-http/src/server.js b/packages/datadog-plugin-http/src/server.js
index dcf4614819efec27f59a979f360d44c98c0ca4f2..cbc380936e31e4961f7bbee70925245dffaec88d 100644
--- a/packages/datadog-plugin-http/src/server.js
+++ b/packages/datadog-plugin-http/src/server.js
@@ -33,7 +33,11 @@ class HttpServerPlugin extends ServerPlugin {
       res,
       this.operationName()
     )
+    const url = new URL(req.url)
+
     span.setTag(COMPONENT, this.constructor.id)
+    span.setTag('resource.name', `${req.method} ${url.pathname}`)
+

     this._parentStore = store
     this.enter(span, { ...store, req, res })
@@ -63,6 +67,9 @@ class HttpServerPlugin extends ServerPlugin {
       incomingHttpRequestEnd.publish({ req, res: context.res })
     }

+
+    web.setRoute(req, req.url)
+
     web.finishAll(context)
   }

Am I missing something in my configuration?
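
One detail in the server.js hunk above may be worth checking: on Node's http.IncomingMessage, req.url is normally only the path and query string (for example /items/1?x=2), and new URL() throws on a relative string unless it is given a base. The following is a minimal sketch, not dd-trace code; the helper name resourceNameFor and the placeholder base http://localhost are assumptions made for illustration.

```typescript
// Hypothetical helper (not part of dd-trace): builds "METHOD /path" from the
// method and raw URL of a request, tolerating origin-relative URLs.
function resourceNameFor(
  method: string | undefined,
  rawUrl: string | undefined
): string {
  // The base exists only so that relative inputs parse; it never appears in
  // the result because only the pathname is used.
  const url = new URL(rawUrl ?? "/", "http://localhost");
  return method ? `${method} ${url.pathname}` : url.pathname;
}

console.log(resourceNameFor("GET", "/items/1?x=2")); // GET /items/1
```

Dropping the query string keeps the resource name stable across requests that differ only in their parameters; it does not group /items/1 with /items/2, which still needs a separate step.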

Sh031224 commented 5 months ago

import { registerOTel } from '@vercel/otel';

export const register = async () => {
  if (process.env.NEXT_RUNTIME === 'nodejs') {
    const { TracerProvider } = (await import('dd-trace')).default.init({
      logInjection: true,
      startupLogs: true,
    });

    const provider = new TracerProvider();

    registerOTel();
    provider.register();
  }
};

Would you like to try this?

Lisenish commented 5 months ago

@Sh031224 That didn't work for me. I also tried calling registerOTel before the init, but that didn't help either. Did it work for you?

Sh031224 commented 5 months ago

@Lisenish The important thing is to forward the OTel data to Datadog through the provider.

If you use only Datadog, it seems you cannot make full use of the spans that Next.js provides.

Lisenish commented 4 months ago

@Sh031224 Oh, sorry for the late reply 🙇 Actually I was able to see it after my message here, so yeah it seems this approach works.

We still needed to group the resource.name on our own, though, since by default it doesn't group anything and just records each individual URL as a separate resource (so, for example, /items/1 and /items/2 are separate resources).

tracer.use('http', {
    hooks: {
      request(span, req) {
        if (span && req) {
          // Outgoing requests expose `path`; incoming requests expose `url`.
          const urlString = 'path' in req ? req.path : req.url;

          if (urlString) {
            // The base only lets relative URLs parse; it is not used in the tags.
            const url = new URL(urlString, 'http://localhost');
            const path = url.pathname + url.search;
            const resourceGroup = getPathGroup(url.pathname); // our custom function to generalize the URL
            const method = req.method;

            span.setTag('resource.name', method ? `${method} ${resourceGroup}` : resourceGroup);
            span.setTag('http.route', method ? `${method} ${path}` : path);
          }
        }
      },
    },
});

It also creates a lot of odd operations (in addition to web.request) named after each request's unique URL, e.g. an operation GET items_342223. We decided not to do anything about that for now.
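
The getPathGroup function used in the hook above is not shown in the thread. As a rough idea of what such a function could look like, here is a hypothetical sketch that replaces path segments resembling identifiers with a placeholder. The :id placeholder and the two patterns (all digits, UUID) are assumptions for illustration, not the actual implementation.

```typescript
// Hypothetical stand-in for the custom getPathGroup mentioned above.
// Segments that look like identifiers are replaced with ":id" so that
// /items/1 and /items/2 fall into the same group.
const NUMERIC_RE = /^\d+$/;
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function getPathGroup(pathname: string): string {
  return pathname
    .split("/")
    .map((segment) =>
      NUMERIC_RE.test(segment) || UUID_RE.test(segment) ? ":id" : segment
    )
    .join("/");
}

console.log(getPathGroup("/items/1")); // /items/:id
console.log(getPathGroup("/items/2")); // /items/:id
```

A pattern-based approach like this has to guess which segments are dynamic, so slugs such as /posts/my-first-post would still be recorded as separate resources unless more rules are added.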

jonluca commented 4 months ago

import { registerOTel } from '@vercel/otel';

export const register = async () => {
  if (process.env.NEXT_RUNTIME === 'nodejs') {
    const { TracerProvider } = (await import('dd-trace')).default.init({
      logInjection: true,
      startupLogs: true,
    });

    const provider = new TracerProvider();

    registerOTel();
    provider.register();
  }
};

Would you like to try this?

This (the snippet quoted above from @Sh031224) almost works, but I get an exception in the Datadog Tracer implementation:


2024-03-01T07:19:50.992697065Z stderr F TypeError: parentTracer.getSpanLimits is not a function
2024-03-01T07:19:50.99269989Z stderr F     at new Span (/app/node_modules/@prisma/instrumentation/node_modules/@opentelemetry/sdk-trace-base/build/src/Span.js:59:41)
2024-03-01T07:19:50.992702455Z stderr F     at /app/node_modules/@prisma/instrumentation/dist/chunk-VVAFFO6L.js:59:20
2024-03-01T07:19:50.992704769Z stderr F     at Array.forEach (<anonymous>)
2024-03-01T07:19:50.992707324Z stderr F     at ActiveTracingHelper.createEngineSpan (/app/node_modules/@prisma/instrumentation/dist/chunk-VVAFFO6L.js:44:27)
2024-03-01T07:19:50.992709588Z stderr F     at Xi.createEngineSpan (/app/node_modules/@prisma/client/runtime/library.js:123:1645)
2024-03-01T07:19:50.992716832Z stderr F     at vt.logger (/app/node_modules/@prisma/client/runtime/library.js:113:1167)
2024-03-01T07:19:50.992719607Z stderr F     at /app/node_modules/@prisma/client/runtime/library.js:113:922

I'm able to get around that by monkey-patching the provider:

import { registerOTel } from "@vercel/otel";

export async function register() {
  try {
    if (process.env.NEXT_RUNTIME === "nodejs") {
      console.log("Registering tracing");
      process.env.WEIGHTS_SERVICE = "weights-nextjs-serverless";

      const tracer = await import("~/tracing");
      const { PrismaInstrumentation } = await import("@prisma/instrumentation");

      const provider = new tracer.TracerProvider();
      const baseTracer = provider.getTracer.bind(provider);
      provider.getTracer = (name: string, version?: string) => {
        const newTracer = baseTracer(name, version);
        // @ts-ignore
        newTracer.getSpanLimits = () => ({});
        return newTracer;
      };

      registerOTel({
        serviceName: "weights-nextjs-serverless",
        instrumentations: ["auto", new PrismaInstrumentation()],
      });

      // Register the provider globally
      provider.register();
    }
  } catch (e) {
    console.error(e);
  }
}

But then I get an exception in the startSpan method:

Registering tracing
TypeError: Cannot read properties of undefined (reading '_traceId')
    at Tracer.startSpan (/var/task/node_modules/dd-trace/packages/dd-trace/src/opentelemetry/tracer.js:38:25)
    at Tracer.startActiveSpan (/var/task/node_modules/dd-trace/packages/dd-trace/src/opentelemetry/tracer.js:112:23)
    at /var/task/node_modules/next/dist/server/lib/trace/tracer.js:122:103
    at AsyncLocalStorage.run (node:async_hooks:346:14)
    at Za.with (file:///var/task/node_modules/@vercel/otel/dist/node/index.js:20:16621)
    at ContextAPI.with (/var/task/node_modules/@opentelemetry/api/build/src/api/context.js:60:46)
    at NextTracerImpl.trace (/var/task/node_modules/next/dist/server/lib/trace/tracer.js:122:28)
    at /var/task/node_modules/next/dist/compiled/next-server/server.runtime.prod.js:16:3795
    at AsyncLocalStorage.run (node:async_hooks:346:14)
    at Za.with (file:///var/task/node_modules/@vercel/otel/dist/node/index.js:20:16621)
Error: Runtime exited without providing a reason
Runtime.ExitError

radum commented 2 months ago

Hello everyone, I hit the same dead end as most of you here. I am running Next.js 14 with the app router.

The only way I managed to get it working (although I'm not sure it is fully working yet) is to create a JS file, server-preload.js:

const packageJSON = require('../package.json');

function setUpDatadogTracing() {
    const tracer = require('dd-trace');

    tracer.init({
        runtimeMetrics: true,
        logInjection: true,
        env: 'dev',
        service: `myapp`,
        version: packageJSON?.version ?? 'unknown'
    });
}

setUpDatadogTracing();

Then I load it from the start script in package.json: node -r server-preload.js ./node_modules/.bin/next start. With this, Resources no longer shows only GET and POST; I get GET /_not-found for 404 pages, GET /about, and so on, based on the pages I have.

The version also comes through for each new release I make, and the dev env is set properly.

Logs are ingested too, but only the ones I write through an internal logger built on Pino. The others are not coming in because they are not in JSON format.

There is a way to patch console.log in the file above so that it emits JSON, but that is a can of worms: it takes a lot of cleanup to make it work, and it could break with any Next.js update.
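
To make the console-patching idea concrete, here is a minimal sketch under stated assumptions. The helper names (stringifyArg, toJsonLine, patchLog) and the field names level, time, and message are made up for illustration; they are not a Datadog or Next.js API, and the caveats above about fragility still apply.

```typescript
type LogFn = (...args: unknown[]) => void;

// Turns one console argument into text without throwing.
function stringifyArg(arg: unknown): string {
  if (typeof arg === "string") return arg;
  if (arg instanceof Error) return arg.stack ?? arg.message;
  try {
    return JSON.stringify(arg) ?? String(arg);
  } catch {
    return String(arg); // e.g. circular structures or BigInt values
  }
}

// Formats console-style arguments as a single JSON line.
function toJsonLine(level: string, args: unknown[], now: Date = new Date()): string {
  return JSON.stringify({
    level,
    time: now.toISOString(),
    message: args.map(stringifyArg).join(" "),
  });
}

// Replaces target.log with a JSON-emitting version and returns a function
// that restores the original.
function patchLog(target: { log: LogFn }, write: (line: string) => void): () => void {
  const original = target.log;
  target.log = (...args: unknown[]) => write(toJsonLine("info", args));
  return () => {
    target.log = original;
  };
}
```

In a preload file, one might call patchLog with the global console as the target and a writer that prints each line to stdout; the other console methods (warn, error, and so on) would need the same treatment.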

I never managed to get it working with the instrumentation hook, and with the telemetry from Vercel plus DD I always got undefined errors when something tried to read _traceId from an object.

Even with this setup I am not sure whether I can see any spans; I need to check more.

For sourcemaps, I am thinking of generating them and loading them via CI before removing them from the deployed app.

Has anyone found a better way that works with most DD features and can share their setup?

Tarektouati commented 2 months ago

@radum your solution seems to be inspired by this blog post: https://jake.tl/notes/2021-04-04-nextjs-preload-hack. I've already tried that approach, and it works fine.

Coming back to the root issue, I want to enable Datadog log injection with Next.js without preloading any script.

I want to manage it directly from instrumentation.ts|js, which is designed for this: https://nextjs.org/docs/app/building-your-application/optimizing/instrumentation

radum commented 2 months ago

@Tarektouati I found that article while looking into log ingestion, but yeah, it helped confirm my impression that doing it via instrumentation is never going to work :)

I would like to use the instrumentation hook, but either DD is just not working with it, or the hook still being experimental means it has all kinds of issues we don't see.

olafurns7 commented 3 days ago

This is an incredibly large issue considering Next.js is the largest web framework today.

We rely heavily on server components, and none of the setups above works correctly.