DataDog / dd-trace-js

JavaScript APM Tracer
https://docs.datadoghq.com/tracing/

Next.JS plugin integration #4003

Open Tarektouati opened 10 months ago

Tarektouati commented 10 months ago

Env:

OS: macOS/Linux
Datadog agent version: ?
dd-trace: v4.26.0
Node: v20.10.0
React: v18.3.0-canary-d900fadbf-20230929
Next.js: v14.0.4
Next’s build type: standalone

Hey 👋🏼 !

I’m working on a Next.js app using the app directory, built in standalone mode and packaged in a Docker image for deployment on a K8s cluster. I’ve made multiple attempts to integrate dd-trace’s next plugin, but it doesn’t seem to work:

// instrumentation.node.ts

import Tracer from "dd-trace";

const tracer = Tracer.init({
  logInjection: true,
  startupLogs: true,
});

tracer.use("next");
// instrumentation.ts

export async function register() {
  // NEXT_RUNTIME cannot be frozen
  if (process.env.NEXT_RUNTIME === "nodejs") {
    await import("./instrumentation.node");
  }
}

I do see some traces appearing in the Datadog APM UI, but they only show methods like GET or POST, with no path or route information.

Digging further into these traces, it seems they were created by the http plugin rather than the next plugin.

We ended up patching the dd-trace dependency (http plugin) to get something working:

diff --git a/packages/datadog-plugin-http/src/client.js b/packages/datadog-plugin-http/src/client.js
index 42833bb896f64e5cbf37840f4a4087a346715aa5..dc0c552c6dafa297c80ebd77179f1a21accf51a7 100644
--- a/packages/datadog-plugin-http/src/client.js
+++ b/packages/datadog-plugin-http/src/client.js
@@ -42,7 +42,7 @@ class HttpClientPlugin extends ClientPlugin {
         [COMPONENT]: this.constructor.id,
         'span.kind': 'client',
         'service.name': this.serviceName({ pluginConfig: this.config, sessionDetails: extractSessionDetails(options) }),
-        'resource.name': method,
+        'resource.name': `${method} ${uri}`,
         'span.type': 'http',
         'http.method': method,
         'http.url': uri,
diff --git a/packages/datadog-plugin-http/src/server.js b/packages/datadog-plugin-http/src/server.js
index dcf4614819efec27f59a979f360d44c98c0ca4f2..cbc380936e31e4961f7bbee70925245dffaec88d 100644
--- a/packages/datadog-plugin-http/src/server.js
+++ b/packages/datadog-plugin-http/src/server.js
@@ -33,7 +33,11 @@ class HttpServerPlugin extends ServerPlugin {
       res,
       this.operationName()
     )
+    const url = new URL(req.url, 'http://localhost')
+
     span.setTag(COMPONENT, this.constructor.id)
+    span.setTag('resource.name', `${req.method} ${url.pathname}`)
+

     this._parentStore = store
     this.enter(span, { ...store, req, res })
@@ -63,6 +67,9 @@ class HttpServerPlugin extends ServerPlugin {
       incomingHttpRequestEnd.publish({ req, res: context.res })
     }

+
+    web.setRoute(req, req.url)
+
     web.finishAll(context)
   }

Am I missing something in my configuration?
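For context on the patch above: on a Node.js HTTP server, `req.url` holds only the path and query string (e.g. `/items/1?sort=asc`), and the WHATWG `URL` constructor throws on such relative input unless a base URL is passed as its second argument. A minimal sketch of deriving a `METHOD /path` resource name, assuming a placeholder base of `http://localhost` (only `pathname` is kept, so the base never appears in the result); `resourceNameFor` and `parsesWithoutBase` are hypothetical helpers, not dd-trace APIs:

```typescript
// Hypothetical helper: build a "METHOD /path" resource name from a Node-style
// request URL. req.url is relative (path + query), so a placeholder base is
// required; only the pathname is kept, which drops the query string.
function resourceNameFor(method: string | undefined, rawUrl: string): string {
  const { pathname } = new URL(rawUrl, "http://localhost");
  return method ? `${method} ${pathname}` : pathname;
}

// Hypothetical helper: without a base, the URL parser rejects relative input.
function parsesWithoutBase(rawUrl: string): boolean {
  try {
    new URL(rawUrl);
    return true;
  } catch {
    return false;
  }
}

console.log(resourceNameFor("GET", "/items/1?sort=asc")); // GET /items/1
console.log(parsesWithoutBase("/items/1")); // false
```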

Sh031224 commented 9 months ago
import { registerOTel } from '@vercel/otel';

export const register = async () => {
  if (process.env.NEXT_RUNTIME === 'nodejs') {
    const { TracerProvider } = (await import('dd-trace')).default.init({
      logInjection: true,
      startupLogs: true,
    });

    const provider = new TracerProvider();

    registerOTel();
    provider.register();
  }
};

Would you like to try this?

Lisenish commented 9 months ago

@Sh031224 It didn't work for me. I also tried calling registerOTel before init, but that didn't help either. Did it work for you?

Sh031224 commented 9 months ago

@Lisenish The important thing is to send the OTel data to Datadog through the provider.

If you use only Datadog, it seems you can't make full use of the spans that Next.js provides.

Lisenish commented 9 months ago

@Sh031224 Oh, sorry for the late reply 🙇 Actually I was able to see it after my message here, so yeah it seems this approach works.

We still needed to group the resource.name ourselves, though, since by default it doesn't group anything and just records each individual URL as a separate resource (e.g. /items/1 and /items/2 are separate resources).

tracer.use('http', {
  hooks: {
    request(span, req) {
      if (span && req) {
        const urlString = 'path' in req ? req.path : req.url;

        if (urlString) {
          const url = new URL(urlString, 'http://localhost');
          const path = url.pathname + url.search;
          const resourceGroup = getPathGroup(url.pathname); // our custom function to generalize the URL
          const method = req.method;

          span.setTag('resource.name', method ? `${method} ${resourceGroup}` : resourceGroup);
          span.setTag('http.route', method ? `${method} ${path}` : path);
        }
      }
    },
  },
});

It also creates a lot of odd operations (in addition to web.request) named after each request's unique URL, e.g. the operation GET items_342223. We decided not to do anything about that for now.
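The `getPathGroup` function in the hook above is the commenter's own and isn't shown. A minimal sketch of what such a helper might look like, assuming that purely numeric segments and UUIDs are the only dynamic parts of the path (real route patterns will differ); the function body and the `:id`/`:uuid` placeholder tokens are hypothetical:

```typescript
// Hypothetical stand-in for the commenter's getPathGroup(): collapse dynamic
// path segments so /items/1 and /items/2 map to the same resource.
// Assumes only numeric IDs and UUIDs are dynamic; adjust for your routes.
const NUMERIC = /^\d+$/;
const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function getPathGroup(pathname: string): string {
  return pathname
    .split("/")
    .map((segment) => {
      if (NUMERIC.test(segment)) return ":id";
      if (UUID.test(segment)) return ":uuid";
      return segment; // static segment, keep as-is
    })
    .join("/");
}

console.log(getPathGroup("/items/1")); // /items/:id
console.log(getPathGroup("/items/2")); // /items/:id
```

Grouping by path shape keeps the resource count bounded, which is the problem described above; routes with slug-style dynamic segments would need a lookup against the app's actual route list instead of regexes.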

jonluca commented 8 months ago

The snippet above almost works, but I get an exception involving the Datadog Tracer implementation:

2024-03-01T07:19:50.992697065Z stderr F TypeError: parentTracer.getSpanLimits is not a function
2024-03-01T07:19:50.99269989Z stderr F     at new Span (/app/node_modules/@prisma/instrumentation/node_modules/@opentelemetry/sdk-trace-base/build/src/Span.js:59:41)
2024-03-01T07:19:50.992702455Z stderr F     at /app/node_modules/@prisma/instrumentation/dist/chunk-VVAFFO6L.js:59:20
2024-03-01T07:19:50.992704769Z stderr F     at Array.forEach (<anonymous>)
2024-03-01T07:19:50.992707324Z stderr F     at ActiveTracingHelper.createEngineSpan (/app/node_modules/@prisma/instrumentation/dist/chunk-VVAFFO6L.js:44:27)
2024-03-01T07:19:50.992709588Z stderr F     at Xi.createEngineSpan (/app/node_modules/@prisma/client/runtime/library.js:123:1645)
2024-03-01T07:19:50.992716832Z stderr F     at vt.logger (/app/node_modules/@prisma/client/runtime/library.js:113:1167)
2024-03-01T07:19:50.992719607Z stderr F     at /app/node_modules/@prisma/client/runtime/library.js:113:922

I was able to get around that by monkey-patching the provider:

import { registerOTel } from "@vercel/otel";

export async function register() {
  try {
    if (process.env.NEXT_RUNTIME === "nodejs") {
      console.log("Registering tracing");
      process.env.WEIGHTS_SERVICE = "weights-nextjs-serverless";

      const tracer = await import("~/tracing");
      const { PrismaInstrumentation } = await import("@prisma/instrumentation");

      const provider = new tracer.TracerProvider();
      const baseTracer = provider.getTracer.bind(provider);
      provider.getTracer = (name: string, version?: string) => {
        const newTracer = baseTracer(name, version);
        // @ts-ignore
        newTracer.getSpanLimits = () => ({});
        return newTracer;
      };

      registerOTel({
        serviceName: "weights-nextjs-serverless",
        instrumentations: ["auto", new PrismaInstrumentation()],
      });

      // Register the provider globally
      provider.register();
    }
  } catch (e) {
    console.error(e);
  }
}

But then I get an exception from the startSpan method:

Registering tracing
TypeError: Cannot read properties of undefined (reading '_traceId')
    at Tracer.startSpan (/var/task/node_modules/dd-trace/packages/dd-trace/src/opentelemetry/tracer.js:38:25)
    at Tracer.startActiveSpan (/var/task/node_modules/dd-trace/packages/dd-trace/src/opentelemetry/tracer.js:112:23)
    at /var/task/node_modules/next/dist/server/lib/trace/tracer.js:122:103
    at AsyncLocalStorage.run (node:async_hooks:346:14)
    at Za.with (file:///var/task/node_modules/@vercel/otel/dist/node/index.js:20:16621)
    at ContextAPI.with (/var/task/node_modules/@opentelemetry/api/build/src/api/context.js:60:46)
    at NextTracerImpl.trace (/var/task/node_modules/next/dist/server/lib/trace/tracer.js:122:28)
    at /var/task/node_modules/next/dist/compiled/next-server/server.runtime.prod.js:16:3795
    at AsyncLocalStorage.run (node:async_hooks:346:14)
    at Za.with (file:///var/task/node_modules/@vercel/otel/dist/node/index.js:20:16621)
Error: Runtime exited without providing a reason
Runtime.ExitError
radum commented 7 months ago

Hello everyone, I hit the same dead end as most of you here. I am running Next.js 14 with the App Router.

The only way I managed to get it working (although I'm not sure it's fully working yet) was to create a JS file, server-preload.js:

const packageJSON = require('../package.json');

function setUpDatadogTracing() {
    const tracer = require('dd-trace');

    tracer.init({
        runtimeMetrics: true,
        logInjection: true,
        env: 'dev',
        service: `myapp`,
        version: packageJSON?.version ?? 'unknown'
    });
}

setUpDatadogTracing();

Then load it from the start script in package.json: node -r ./server-preload.js ./node_modules/.bin/next start. With this, Resources no longer shows only GET and POST: I get GET /_not-found for 404 pages, GET /about, and so on, based on the pages I have.

The version also comes through for each new release I make, and the dev environments are set properly.

Logs are ingested too, but only the ones I log through an internal logger I built with Pino. The others don't come in because they are not in JSON format.

There is a way to patch console.log from the file above so that it emits JSON, but that's a can of worms: it needs a lot of cleanup to make it work, and it could break with any Next.js update.
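With `logInjection: true`, dd-trace adds trace identifiers to records written by supported structured loggers such as Pino; plain-text console output carries neither JSON structure nor those fields. A minimal sketch for checking whether a single JSON log line carries trace context, assuming the injected fields sit under a top-level `dd` key with `trace_id` and `span_id` (confirm against your own log output); `hasTraceContext` is a hypothetical helper:

```typescript
// Hypothetical helper: does one log line carry Datadog trace context?
// Assumes the injected fields sit under a top-level "dd" object, e.g.
// {"msg":"hi","dd":{"trace_id":"123","span_id":"456"}}.
function hasTraceContext(line: string): boolean {
  let record: unknown;
  try {
    record = JSON.parse(line);
  } catch {
    return false; // not JSON, e.g. plain console output
  }
  if (typeof record !== "object" || record === null) return false;

  const dd = (record as { dd?: { trace_id?: unknown; span_id?: unknown } }).dd;
  // Accept string or numeric IDs, since the exact encoding is an assumption.
  const isId = (v: unknown) => typeof v === "string" || typeof v === "number";
  return isId(dd?.trace_id) && isId(dd?.span_id);
}

console.log(hasTraceContext('{"msg":"hi","dd":{"trace_id":"123","span_id":"456"}}')); // true
console.log(hasTraceContext("GET /about 200 in 12ms")); // false
```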

I never managed to get it working with the instrumentation hook, and combining Vercel's telemetry with DD always gave me undefined errors when looking up _traceId on an object.

Even with this setup, I'm not sure I can see any spans yet; I need to check further.

For source maps, I'm thinking of generating them and uploading them from CI before removing them from the deployed app.

Has anyone found a better way that works with most DD features and can share their setup?

Tarektouati commented 7 months ago

@radum your solution seems to be inspired by this blog post: https://jake.tl/notes/2021-04-04-nextjs-preload-hack. I've already tried it, and it works fine.

Coming back to the root issue: I want to enable Datadog log injection with Next.js without preloading any script.

I want to manage it directly from instrumentation.ts|js, which is designed for exactly this: https://nextjs.org/docs/app/building-your-application/optimizing/instrumentation

radum commented 7 months ago

@Tarektouati I found that article while researching log ingestion, and it helped confirm that doing it via instrumentation is never going to work :)

I would like to use the instrumentation hook, but DD just doesn't work with it, or perhaps the hook being experimental means it has all kinds of issues we can't see.

olafurns7 commented 4 months ago

This is an incredibly large issue, considering Next.js is the largest web framework today.

We rely heavily on server components, and none of the setups above works correctly.

wlechowicz commented 4 months ago

I'm in the same boat as everyone in this thread: Next.js 14 + App Router + RSC. I set up dd-trace through instrumentation, enabled OTel, etc., only for my app to die with HTTP 500 on incoming requests because of TypeError: Cannot read properties of undefined (reading '_traceId').

Preloading scripts and other hacks like monkey-patching console: no, thank you. If Sentry can hook itself into instrumentation in an elegant manner, so should dd-trace. My solution for now is opting out of Datadog for Next.js 14+ apps until there is a sane way of doing this.

neilkumar-circle commented 3 months ago

This seems to work for me.

However, the graphql plugin will not load with a production build (next build).

The development build works as expected. I'm not sure what I'm doing wrong:

instrumentation.ts:

export async function register() {
  if (
    process.env.NEXT_RUNTIME === 'nodejs' &&
    process.env.ENV &&
    process.env.SERVICE_NAME
  ) {
    const ddTrace = await import('dd-trace')

    const tracer = ddTrace.default.init({
      env: process.env.ENV,
      service: process.env.SERVICE_NAME,
      version: process.env.SERVICE_VERSION,
      sampleRate: 1,
      profiling: true,
      runtimeMetrics: true,
      logInjection: true,
      dogstatsd: {
        hostname: 'localhost',
        port: 8125,
      },
    })

    // Monitor GraphQL
    tracer.use('graphql', {
      enabled: true,
      measured: true,
    })

    // Monitor Next.js
    tracer.use('next', {
      enabled: true,
      measured: true,
    })

    // Monitor Winston Logger
    tracer.use('winston', {
      enabled: true,
    })

    const provider = new tracer.TracerProvider()

    provider.register()
  }
}

The graphql integration is not loaded with the Next.js production build:

"integrations_loaded":["fetch","winston","http","net","child_process"]
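One way to spot this from the tracer's startup log (enabled with `startupLogs: true`) is to compare the `integrations_loaded` array against the integrations you expect. A minimal sketch, assuming you have already extracted the JSON object from the startup log line (the `integrations_loaded` key is as shown above); `missingIntegrations` is a hypothetical helper:

```typescript
// Hypothetical helper: list expected integrations that are missing from the
// "integrations_loaded" array of a dd-trace startup-log JSON payload.
function missingIntegrations(startupJson: string, expected: string[]): string[] {
  const parsed = JSON.parse(startupJson) as { integrations_loaded?: unknown } | null;
  const list = parsed?.integrations_loaded;
  const loaded = new Set<string>(
    Array.isArray(list) ? list.filter((x): x is string => typeof x === "string") : [],
  );
  return expected.filter((name) => !loaded.has(name));
}

// Payload copied from the production startup log above.
const prod = '{"integrations_loaded":["fetch","winston","http","net","child_process"]}';
console.log(missingIntegrations(prod, ["graphql", "next", "http"]));
```

Running the same check against the dev build's startup log makes the difference between the two builds easy to see.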

neilkumar-circle commented 3 months ago

In addition to the above, I have solved this by updating my start script: "start": "node -r dd-trace node_modules/.bin/next start",

This is obviously a workaround. I'm not sure whether it's a production bundler issue that leads to dd-trace not being imported as early as it is in the dev build.

tlhunter commented 3 months ago

Hey everyone, here's our documentation on how to use Datadog with Next.js: https://docs.datadoghq.com/tracing/trace_collection/compatibility/nodejs/#complex-framework-usage

The approach in the blog post by Jake appears to be the same as what we suggest. You'll need to use the --require ... flag or the NODE_OPTIONS='--require ...' environment variable to instruct Node.js to load the tracer before Next.js is loaded.

I wouldn't necessarily refer to this as a "workaround" or a "hack". The --require flag should be pretty stable in Node.js so this solution shouldn't stop working one day. The problem with the Next.js "instrumentation.js|ts" approach (I'm assuming as I haven't dug through their code yet) is that by the point in time that Next.js executes that file, Next.js has already required/imported a bunch of files used by itself, and potentially subdependencies that also need to be instrumented. That approach is incompatible with the dd-trace approach where it needs to be loaded prior to those other modules/libraries being required/imported.

That said, it sounds like a competing APM tool is able to work by using the instrumentation.js file, so there may be a way to support such an approach. Please create a helpdesk feature request (should be available from the GitHub new issue screen) as this will help prioritize such a feature.
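To illustrate the load-order argument, here is a toy model. It is not how dd-trace is implemented (roughly speaking, dd-trace hooks module loading so it can wrap modules as they are required or imported), but it shows the same consequence: any reference a caller captured before instrumentation ran keeps pointing at the unwrapped function, so its calls are never traced. The `library`, `instrument`, and `calls` names are hypothetical:

```typescript
// Toy model of why instrumentation must run before dependencies are loaded.
// "library" stands in for a module that an APM tracer would wrap.
const calls: string[] = [];

const library = {
  handle(path: string): string {
    return `handled ${path}`;
  },
};

// Hypothetical instrumentation: replace the exported function with a wrapper
// that records each call before delegating to the original.
function instrument(lib: typeof library): void {
  const original = lib.handle;
  lib.handle = (path: string) => {
    calls.push(path);
    return original(path);
  };
}

// A framework that grabbed a reference early (before instrumentation ran)...
const capturedEarly = library.handle;
instrument(library);
// ...versus code that looks the function up after instrumentation.
const capturedLate = library.handle;

capturedEarly("/early"); // bypasses the wrapper: not recorded
capturedLate("/late"); // goes through the wrapper: recorded
console.log(calls); // only "/late" was recorded
```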

neilkumar-circle commented 3 months ago

@tlhunter thanks for the reply, the way I have it set up is: running node -r dd-trace ... to preload the whole package and then using the .config(...) method to configure the tracer within the nextjs instrumentation hook.

I think the docs refer to requiring initialize (node --require dd-trace/initialize). Would using dd-trace/initialize mean that we would have to configure the tracer with ENV variables or could we still use the approach that I outlined above?

tlhunter commented 3 months ago

@neilkumar-circle you should be fine using -r (alias for --require) which points to your own local module which configures and calls the dd-trace init() programmatically.

The dd-trace/initialize file in the tracer package is just a convenience to load the tracer using default configuration and which depends on env vars for config.

radum commented 3 months ago

@tlhunter I think the docs need to better highlight the steps one needs to take for frameworks. Digging that up is a huge pain. But thank you for sharing.

So if I have in my package.json node -r ./server-preload.js ./node_modules/.bin/next start and the server-preload.js does the init like this:

function setUpDatadogTracing() {
    const tracer = require('dd-trace');
    console.log('Setting up Datadog tracing');

    tracer.init({
        runtimeMetrics: true,
        logInjection: true,
        profiling: true,
    });
    tracer.use('next');
}

setUpDatadogTracing();

Is that the same as using dd-trace/initialize?

tlhunter commented 3 months ago

@radum the dd-trace/initialize file does some additional work as well, such as assisting with ESM loading. It's a bit of a multi-purpose helper file.

If your application does not already use -r dd-trace/initialize, then yes you should simply be able to use your -r ./server-preload.js solution. Your file will correctly initialize and configure the tracer early enough that it will work with Next.js.

If your application does already depend on using -r dd-trace/initialize then it wouldn't be equivalent to replace the -r flag with a different file like your -r ./server-preload.js approach as it would leave out some of the ESM niceties.

/cc @bengl who contributed the most to that part of the tracer.

radum commented 3 months ago

Thank you @tlhunter. Can you explain those ESM niceties in more detail, so we can judge whether it's worth switching?

meyer9 commented 3 months ago

For those who host on Vercel:

I was able to sort of work around this by sending OpenTelemetry traces to a separate server running dd-agent w/ an API key protected endpoint for submitting traces. This allows collecting traces even from Vercel-hosted apps.

This can be done just by setting the env vars: OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS (with trace endpoint authorization headers)
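Per the OpenTelemetry exporter configuration specification, `OTEL_EXPORTER_OTLP_HEADERS` is a comma-separated list of `key=value` pairs whose values may be percent-encoded, for example `authorization=Bearer%20<token>`. A minimal sketch that parses such a string, which can help sanity-check what you set in Vercel; `parseOtlpHeaders` is a hypothetical helper, and the header names and token below are placeholders:

```typescript
// Hypothetical helper: parse OTEL_EXPORTER_OTLP_HEADERS-style strings,
// i.e. comma-separated key=value pairs with percent-encoded values.
// decodeURIComponent throws a URIError on malformed percent-encoding.
function parseOtlpHeaders(raw: string | undefined): Record<string, string> {
  const headers: Record<string, string> = {};
  if (!raw) return headers;
  for (const pair of raw.split(",")) {
    const eq = pair.indexOf("=");
    if (eq < 0) continue; // no "=": malformed entry
    // HTTP header names are case-insensitive, so normalize to lower case.
    const key = pair.slice(0, eq).trim().toLowerCase();
    if (!key) continue; // empty header name
    headers[key] = decodeURIComponent(pair.slice(eq + 1).trim());
  }
  return headers;
}

// Placeholder value; in a deployment this would come from the environment.
console.log(parseOtlpHeaders("authorization=Bearer%20my-token,x-team=web"));
```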

johnford2002 commented 3 months ago

@meyer9, could you please elaborate a bit on your setup? I've been trying to get our Vercel-hosted app to properly forward traces, and I haven't been having much luck.

What does your Vercel configuration look like? Are you using the experimental telemetry hook? Are you importing and using the @vercel/otel package?

separate server running dd-agent w/ an API key protected endpoint

Do you mean a separate API key that is specifically used to auth to this server rather than the DD API key used to send traces? If so, what did you use to set that up?

OTEL_EXPORTER_OTLP_HEADERS (with trace endpoint authorization headers)

I'm guessing this is where you're specifying that API key that you're sending along. Again, curious if it's separate from the DD API key.

albertorodriguez-ballys commented 2 months ago

Our architecture is a frontend app built with Next.js, with some client components and some server components. The app calls an API, and we want to trace the calls to that API made from both client components and server components.

So far, we have managed to instrument client-side calls with the Datadog RUM script, setting the allowedTracingUrls option to the API we want to trace, following https://docs.datadoghq.com/real_user_monitoring/guide/monitor-your-nextjs-app-with-rum/?tab=cdnasync

The problem: we have tried setting up dd-trace with the different approaches posted above, but calls made from server components are not traced. For the exact same API call, we see it traced correctly in the DD APM Explorer when it's made from a client component, but not when it's made from the server side.

Approaches we tried:

  1. node -r dd-trace/init node_modules/.bin/next dev
  2. node -r ./server-preload.js node_modules/.bin/next dev
  3. instrumentation.ts with:

    const ddTrace = await import('dd-trace')

    const tracer = ddTrace.default.init({
      service: 'nextjs-app', env: 'develop', version: '1.0.0',
      sampleRate: 1, profiling: true, runtimeMetrics: true, logInjection: true,
    })

    // Monitor Next.js
    tracer.use('next', { enabled: true, measured: true })

Any help? Should dd-trace be installed on the API's server too?

olafurns7 commented 1 week ago

@tlhunter It's quite absurd for a company the size of Datadog to have such bare-minimum examples of how to set up tracing compared to some of your competitors. This approach also requires an additional CI step to install dd-trace, since it won't be copied over with the compiled node_modules that a Next.js project needs: Next.js doesn't detect that dd-trace is used.

wlechowicz commented 1 week ago

@olafurns7 likely you'd need experimental.outputFileTracingIncludes in next.config pointing to a module that imports dd-trace, especially if you're using the standalone output mode.

meyer9 commented 1 week ago

@johnford2002 sorry for the very late reply. I think I was using the OTel package and passing OTel env vars to Vercel. The env vars pointed to a separate server that I ran, with a password-protected ingestion endpoint for OTel traces. I set up a separate API key (not the Datadog API key) just to protect that endpoint.

AlexBurkey commented 6 days ago

I'm having a similar issue. The plugins seem to sort of work on version 14.2.6 of Next.js, but when I upgrade to 14.2.11 they break and the endpoints are missing from the server-side traces. I think this is likely due to some breaking change that got through because the testing was turned off.

tlhunter commented 4 days ago

@AlexBurkey that's a compat issue which is fixed in https://github.com/DataDog/dd-trace-js/pull/4916

gpremnat commented 2 days ago

I need to send client and server logs from my Next.js app to Datadog. I was using winston before and got "Module not found: Can't resolve 'fs'" every time I ran code on the client, so I switched to pino. But I still get the same error, now coming from Datadog:

⨯ ./node_modules/@datadog/native-iast-rewriter/js/source-map/index.js:3:1
Module not found: Can't resolve 'fs'

Can someone help me integrate this into my Next.js app without these issues?

tlhunter commented 2 hours ago

@gpremnat it seems like you're loading the tracer in browser code. Unfortunately Next.js can make it tricky to know when code loads in one place or another.