DataDog / dd-trace-js

JavaScript APM Tracer
https://docs.datadoghq.com/tracing/
Other
647 stars 306 forks source link

Express.js override resource name in wildcard route #477

Closed joaovieira closed 4 years ago

joaovieira commented 5 years ago

I've just setup dd-trace in an Express.js application. I've immediately noticed that routes with wildcard get the same resource name. I.e.

// server.js
app.use('/api', apiRouter);

// apiRouter.js
router.get('/something/:id', ...); // resource name = 'GET /api/something/:id'
router.get('*', ...); // resource name = 'GET /api/' regardless of actual url

I've tried to override this behaviour by setting the http.route tag on the currently active span as mentioned in the custom tagging docs and seen on the tests. However that does not seem to work, the resource name in the DD APM UI is still the same. E.g.:

router.get('*', (_, res) => {
  tracer.scope().active().setTag('http.route', req.originalUrl);
  res.json({ ok: true});
});

// or ideally a middleware:

router.get('*', (_, _, next) => {
  tracer.scope().active().setTag('http.route', req.originalUrl);
  next();
}, (_, res) => {
  res.json({ ok: true });
}):

I've also tried setting resource.name directly, without any result.

How can I do this? I'd like to get separate resources depending on the actual path, not the express route.


Similar to https://github.com/DataDog/dd-trace-js/issues/350 in terms of expectations.

rochdev commented 5 years ago

This is definitely a bug. Other users have reported similar issues which appear to be caused by a recent regression.

In the meantime, you had the right idea of a temporary workaround. However, the span that you are accessing with tracer.scope().active() is actually the span of the middleware and not the request.

You can hook on the request span directly with this configuration:

tracer.init()
tracer.use('express', {
  hooks: {
    request: (span, req, res) => {
      span.setTag('http.route', req.originalUrl)
    }
  }
})

However, keep in mind that the value set on http.route must be low cardinality, so it won't work properly if you have many different URLs, for example /user/123, /user/456 and so on because these should all be grouped under /user/:id.

joaovieira commented 5 years ago

Thanks for the quick reply.

Yes, I know. This will only apply for the few non-dynamic routes we have (that are actually handled by another middleware). All assets are loaded from a CDN, for example. I.e.:

router.get('*', (req, res) => {
  thirdParty.handleRequest(req, res, ...);
});

I thought about the hooks (didn't try yet), though:

Will give that a go anyway 👍

rochdev commented 5 years ago

This can be done, although it's a bit of a hack. Since the hook runs at the end of the request, you can add metadata to the request in any middleware and it will be available in the hook.

So you could do something like this:

tracer.init()
tracer.use('express', {
  hooks: {
    request: (span, req, res) => {
      if (req._customRoute) {
        span.setTag('http.route', req.originalUrl)
      }
    }
  }
})

router.get('*', (req, res) => {
  req._customRoute = req.originalUrl
})
rochdev commented 5 years ago

Static assets have come up a lot, I'm open to suggestions on how to improve this.

joaovieira commented 5 years ago

Good idea, don't think it's that much of a hack. Though I'm happy with this for my case:

hooks: {
  request: (span, req: Request) => {
    if (req.route.path.includes('*')) {
      // remove query string, fragment and trailing slash
      span.setTag('http.route', url.parse(req.originalUrl).pathname.replace(/\/$/, ''));
    }
  },
},
screen shot 2019-03-06 at 22 14 35

Thanks!

rochdev commented 5 years ago

Glad you could get it to work! I'll still leave this opened as we should support static assets out of the box.

geekflyer commented 5 years ago

I'm running into the same issue as described by @joaovieira in his first post after upgrading from dd-trace 0.7.1 to 0.11.0 . Setting the resource name in the middleware itself via a tag used to work in 0.7.1 but doesn't work anymore in 0.11.0 and now health check requests are spamming my trace list (we used to ignore health check requests via the DD_IGNORE_RESOURCE: 'GET /status,GET /ping agent config but this doesn't work anymore due to the incorrect resource name). I would appreciate if this could be fixed soon as this prevents us from upgrading dd-trace.

rochdev commented 5 years ago

@geekflyer Can you share a snippet of how you set the resource name from the middleware? Did you try with a hook similar to the solution that worked to solve the original issue? For 0.11.0 definitely check the release notes since how to use hooks is now a bit different.

geekflyer commented 5 years ago

Hi,

this is how we used to set the resource name in 0.7.1 inside the middleware:

import * as tracer from 'dd-trace';

export function healthCheckMiddleware (req: SolvvyRequest, res: Response, next: NextFunction) {
    const activeTraceScope = tracer.scopeManager().active();
    if (req.method === 'GET' && req.path === '/ping') {
      if (activeTraceScope) {
        activeTraceScope.span().setTag('http.route', '/ping');
      }
      res.sendStatus(204);
    } else if (req.method === 'GET' && req.path === '/status') {
      if (activeTraceScope) {
        activeTraceScope.span().setTag('http.route', '/status');
      }
     ...misc_logic...
   }
   else {
    next();
  }
}  

This is what we set after upgrading to 0.11.0. We had to adapt the code a bit to make use ofter newer dd-trace API and I also attempted setting the sampling priority to USER_REJECTED but neither of those tags seems to take effect.

import tracer from 'dd-trace';
import { tags, priority } from 'dd-trace/ext';

export function healthCheckMiddleware (req: SolvvyRequest, res: Response, next: NextFunction) {
      const activeSpan = tracer.scope().active();
    if (req.method === 'GET' && req.path === '/ping') {
      if (activeSpan) {
        activeSpan.setTag(tags.SAMPLING_PRIORITY, priority.USER_REJECT);
        activeSpan.setTag('http.route', '/ping');
      }
      res.sendStatus(204);
    } else if (req.method === 'GET' && req.path === '/status') {
      if (activeSpan) {
        activeSpan.setTag(tags.SAMPLING_PRIORITY, priority.USER_REJECT);
        activeSpan.setTag('http.route', '/status');
      }
     ...misc_logic...
   }
   else {
    next();
  }
}  

Just fyi the manually set http.route tag shows up in the UI on the express.middleware span but not on the express.request span and does not affect the name of the resource as such.

geekflyer commented 5 years ago

I didn't try using hooks yet because that would require a bunch of extra work. The healthCheckMiddleware above is part of a reuse library that is used in 15 micro-services. Setting the hooks via the reuse library is not really possible since dd-trace has to be initialized incl. it's hooks before anything else (incl. the reuse library) is imported and setting the hooks in each micro-service separately would be extremely un-DRY. I think the engineer who creates a middleware knows himself best how to instrument and tag requests flowing through the middleware and therefore it would be kind of bad separation of concerns if the library consumer has to set middleware specific hooks in a central place outside that middleware function.

rochdev commented 5 years ago

@geekflyer This is indeed an interesting use case. I will try to think of a way to allow configuring hooks at any point in time, but in the meantime, you will probably have to rely on a hack at this point since such functionality is not supported.

If you are willing to temporarily use a hack, the request span will be available on the span context in the middleware like so: tracer.scope().active().context()._trace.started[0].

Then we could work out the best approach to handle this properly for 0.12.0. I'm open to suggestions if you have a specific API in mind.

rochdev commented 5 years ago

@geekflyer One other way to do it if you don't care at all about stats for these endpoints is to just blacklist them. You will still have to repeat this config in every service but at least it's simple configuration and not code.

For example:

tracer.use('express', {
  blacklist: ['/ping', '/status']
})
geekflyer commented 5 years ago

Ok, I will try to use the blacklisting and see if that helps.

Regarding the problem in general: I don't think I need something as sophisticated as configuring hooks at any point in time and I also imagine this would get complicated to get right if there are multiple places in an app configuring different or the same hooks and ensuring they configure things in the right order etc.

The core of my issue is mostly that there are seemingly certain "global" or "group" tags which are by default inferred from a specific span (e.g. in this case apparently express.request) even though they actually affect the treatment of other spans in that group as well.

E.g. it seems to me express.request and express.middleware are somehow "grouped" together into a resource but the the resource name and sampling priority of that group is only inferred from express.requests tags. What it'd need is the ability to set such "group" tags via a specific API and being able to set them from the span-context of each span in that group and not just from the first one.

Just in case I got that "grouping" wrong and it's rather a parent child relationship: What it'd need is the ability to set certain tags that affect the entire tree not only from the parent span context (express.request) but also from the child span context (express.middleware).

On that note: I think it's currently not very well documented / clear and what's actually a "resource", how resource names are constructed and which kind of spans are folded together into the same resource.

For example https://docs.datadoghq.com/tracing/visualization/#resources mentions only.

APM automatically assigns names to your resources; however you can also name them explicitly. See instructions for: Go, Java, Python, Ruby.

Neither those central docs nor the dd-trace instrumentation libraries' docs say anything about the ruleset of how a resource name is "automatically" determined nor does it say how to set a resource name explcitly manually.

rochdev commented 5 years ago

I don't think I need something as sophisticated as configuring hooks at any point in time and I also imagine this would get complicated to get right if there are multiple places in an app configuring different or the same hooks and ensuring they configure things in the right order etc.

I agree. Hooks are usually used to work around a missing feature and are not considered a permanent solution to most problems.

The core of my issue is mostly that there are seemingly certain "global" or "group" tags which are by default inferred from a specific span (e.g. in this case apparently express.request) even though they actually affect the treatment of other spans in that group as well.

In general there are 3 levels of "global" data on a trace:

1) Component level. This metadata is attached to the root of a component within a service. For example express.request for express. 2) Service level. This metadata is attached to the root of a local trace (single process). 3) Global level. This metadata is attached once on the first service of a trace and is distributed to all other services so the information is available to all spans of a distributed trace.

E.g. it seems to me express.request and express.middleware are somehow "grouped" together into a resource but the the resource name and sampling priority of that group is only inferred from express.requests tags.

What you are referring to is usually called a "component". In our platform we currently combine this directly with the service. The resource is unrelated to this and will usually be something like a URL or a SQL query.

What it'd need is the ability to set such "group" tags via a specific API and being able to set them from the span-context of each span in that group and not just from the first one.

While I agree this would indeed be useful, in your case I think you are looking to set a tag on either the global level or service level, and not just for a component.

Just in case I got that "grouping" wrong and it's rather a parent child relationship: What it'd need is the ability to set certain tags that affect the entire tree not only from the parent span context (express.request) but also from the child span context (express.middleware).

This is technically already possible with sampling priority for example. However, in order to make sure that the same priority is used for every spans within a trace, we have to lock it down as soon as a span finishes or that a span context is propagated to another service. What is happening in your case is that a middleware from within Express finishes before your middleware is run, which causes the sampling priority to be locked.

I think it's currently not very well documented / clear and what's actually a "resource", how resource names are constructed and which kind of spans are folded together into the same resource.

I agree that our documentation could use some work to better explain the different concepts. We are actively working to not only document these better, but are also to make them easier in general and using the proper semantics, especially around components.

I hope these were useful explanations. I'll have to think of a way to better handle metadata at the different levels. The main issue is propagation. For example if the first span of a trace is flushed to the agent, then if you change the metadata on a later span it will not apply to that first span since it was already submitted. Since right now we are not sending spans right away, it would be possible to address this issue by relying on the trace being finished instead of the span, but this has many other implications.

I'll update this thread when I have a better answer for you about an actual solution to the issue.

aivopaas commented 5 years ago

I'm running into similar issue of not being able to create Facets (or search) of any of the tags added in some Koa middleware. From reading the source I understand that I could do that by abusing the internal ctx.req._datadog.span property.

It would be good to have direct access to the root span from any middleware. In Koa it could be something like ctx.datadog.root() and in Express req.datadog.root().

Currently I don't see any value of using something like that unless it adds to the root span and gets analyzed.

const span = tracer.scope().active();
span.addTags({
    some: 'custom',
    values: 'here',
});
rochdev commented 5 years ago

@aivopaas For your specific use case, can you use the span hooks as described above?

About having a way to access the root span, I'll have to think about it because the problem is that the root may change in the future. We constantly improve traces by updating and adding spans, so which span is the root shouldn't be relied upon. Maybe if you could specify the specific component you want to access it would work? For example tracer.scope().active().root('koa') would give you the root span for koa specifically.

aivopaas commented 5 years ago

@aivopaas For your specific use case, can you use the span hooks as described above?

We could use the span hooks, but I saw that it passes in req and res, not Koa ctx. We'd like to use some context data from ctx so the hooks are not the best option unless we move our data onto ctx.req, which we would like to avoid when working with Koa. And as I understand the hooks are also kind of workaround and might change or be removed in future?

All I really care about the "root" we'd get access to is that it would be the same one that gets analysed in DataDog. Since for Koa it is the koa.request it would make sense to somehow expose it on ctx and same goes for Express req.

rochdev commented 5 years ago

We are not planning to remove span hooks since they enable doing many things that are not otherwise possible. We do want however to handle most cases in simpler ways whenever possible.

I'll bring the idea of a root per component with tracer.scope().active().root('koa') or a similar API with the team since I think it could make sense for all our tracers.

In the meantime, I would say to use the hooks if you can. Since you want to keep your context data on ctx, one option would be to attach ctx itself to req.ctx to make it available in the hook.

rochdev commented 5 years ago

@geekflyer Setting sampling priority from a middleware will work in 0.11.1 when this change lands: https://github.com/DataDog/dd-trace-js/pull/544

We are still researching the best way to provide something similar for updating the route.

marciorodrigues87 commented 5 years ago

Hi guys, did anybody fix this problem? I'm using "dd-trace": "0.13.1" and couldn't get it work even with the hooks approach or with tracer.scope().active().context()._trace.started[0]...

hook code:

tracer.use('express', {
  hooks: {
    request: (span, req, res) => {
      const host = req.hostname || 'unknownHostname'
      const name = req.path.split('/')[1]
      span.setTag('http.route', `${host}/${name}`)
    }
  }
})

hack code:

  app.get('(/:path(*))?', (req, res, next) => {
    const span = tracer.scope().active()
    if (span) {
      const parent = span.context()._trace.started[0]
      if (parent) {
        const host = req.hostname || 'unknownHostname'
        const name = req.path.split('/')[1]
        parent.setTag('http.route', `${host}/${name}`)
      }
    }
    ... code here
  })
rochdev commented 5 years ago

@marciorodrigues87 What resource name do you get instead of what you are setting? Also, is the hook registered right after initializing the tracer?

marciorodrigues87 commented 5 years ago

Hi @rochdev I'm getting the default (/:path(*))? as resource name... There are some require statements before the hook registration, putting it right in the next line after the tracer initialization seems to work 😄

rochdev commented 4 years ago

The logic for route detection has changed a lot since this issue was opened. If anyone is still having this issue, please feel free to re-open.

villanuevadani commented 1 year ago

Do you know whether this issue has been resolved in v4? We still have issues in Angular 15 with SSR

faguerre commented 8 months ago

Im currently having a similar error even when I tried the previous solutions. This is part of my code:

const app = express()
const port = 5000;
app.use(bodyParser.json())
app.use(await router.Routes());

ddtrace.init({
    service: 'app-task',
    hostname: process.env.ENV == "dev" ? "localhost" : process.env.DD_AGENT_HOST,
    port: 8125,
    env: process.env.ENV == "dev" ? "dev" : process.env.DD_ENV,
    logInjection: true,
    plugins: false,
})
ddtrace.use('winston');

function normalizePath(req) {
    const { path: pathRequest, method } = req;
    const path = pathRequest ? pathRequest.split(/[?#]/)[0] : '/';
    return `${method} ${path.replace(/\/\d+(?=\/|$)/g, '/:id')}`;
}

ddtrace.use('express', {
    middleware: false,
});
ddtrace.use('http', {
    service: 'app-task',
    hooks: {
      request: (span, req) => {
        span.setTag('http.route', `${req.method} ${normalizePath(req)}`);
      },
    },
});

Unfortunately, my app is still not showing in DataDog. I don't receive any metrics. Note that im using a router for the endpoints

Any suggest? Thanks