DataDog / dd-trace-js

JavaScript APM Tracer
https://docs.datadoghq.com/tracing/
Other
639 stars 302 forks source link

Use `req.host` instead of `req.headers['host']` on express #1080

Open patrick91 opened 4 years ago

patrick91 commented 4 years ago

Hi, I was trying to improve our tracing to show the correct hostname, we run lambdas under API Gateway with a Cloudfront proxy in front of it, so the host in the header is not the right one, since the right one is in the X-Forwarded-Host header.

You can allow express to trust the X-Forwarded-Host header, see: https://expressjs.com/en/guide/behind-proxies.html

This will set the correct host on req.host, but looks like this is not being used by dd-trace-js[1]. Is there any reason for that?

[1] https://github.com/DataDog/dd-trace-js/blob/master/packages/dd-trace/src/plugins/util/web.js#L415

patrick91 commented 4 years ago

If anyone has the same issue, here's the workaround I'm using:

import tracer from 'dd-trace';

// from https://github.com/DataDog/dd-trace-js/blob/86c57e0eaba9c83f6067752f83af3d25ec578e3b/packages/dd-trace/src/plugins/util/web.js#L408
// we changed `req.headers['host']` with `req.host`

function extractURL(req) {
  const headers = req.headers;

  if (req.stream) {
    return `${headers[HTTP2_HEADER_SCHEME]}://${headers[HTTP2_HEADER_AUTHORITY]}${headers[HTTP2_HEADER_PATH]}`;
  } else {
    const protocol = req.connection.encrypted ? 'https' : 'http';
    return `${protocol}://${req.host}${req.originalUrl || req.url}`;
  }
}

tracer.init();
tracer.use('express', {
  hooks: {
    request: (span, req, res) => {
      const url = extractURL(req);

      span.setTag('http.url', url.split('?')[0]);
    },
  },
});

export default tracer;
rochdev commented 4 years ago

The reasoning was that web.js is used by all web frameworks including Express. Since req.host is set by Express, using it would only work for Express. I think we should instead replicate the Express logic to properly handle the different headers in every framework, assuming this is the expected behaviour.

patrick91 commented 4 years ago

The reasoning was that web.js is used by all web frameworks including Express. Since req.host is set by Express, using it would only work for Express. I think we should instead replicate the Express logic to properly handle the different headers in every framework, assuming this is the expected behaviour.

That makes total sense. But I wonder about replicating what express does, as it is something that depends on a configuration. What about using req.host if it exists and falling back on the headers if not?

Not sure what other framework do 🤔

rochdev commented 4 years ago

Do you think it would make sense to add a configuration on the tracer as well? It definitely makes sense to use req.host in any case though. I didn't realize Express had a configuration for it which the tracer should definitely pick up.

patrick91 commented 4 years ago

Do you think it would make sense to add a configuration on the tracer as well? I'd try to find a good way to use what the framework provides us, so dd-trace-js doesn't do anything too complex

It definitely makes sense to use req.host in any case though. I didn't realize Express had a configuration for it which the tracer should definitely pick up.

I'm checking also other frameworks:

fastify

uses hostname: https://www.fastify.io/docs/latest/Request/#request also configurable to trust proxies: https://www.fastify.io/docs/latest/Server/#trustproxy

koa

from the docs:

request.host

Get host (hostname:port) when present. Supports X-Forwarded-Host when app.proxy is true, otherwise Host is used.

request.hostname

Get hostname when present. Supports X-Forwarded-Host when app.proxy is true, otherwise Host is used.

If host is IPv6, Koa delegates parsing to WHATWG URL API, Note This may impact performance.

hapi

I wasn't really able to find much, but it seems they do things a bit differently: https://stackoverflow.com/questions/31840286/how-to-get-the-full-url-for-a-request-in-hapi

so maybe we should pass a function to fetch the host/url to the web util, so we can customise the behaviour per framework?

rochdev commented 4 years ago

so maybe we should pass a function to fetch the host/url to the web util, so we can customise the behaviour per framework?

Right, it really looks like each one has their own config and their own way to support the header. Then I agree that the best approach would be by framework since it would mean we can support the configuration out of the box.

patrick91 commented 4 years ago

I can attempt a PR for express :)

Not sure where I could do this though, maybe I can pass a function in normalize config, here: https://github.com/DataDog/dd-trace-js/blob/8c5d352bcac169f4bda03830566ffbbd29456791/packages/datadog-plugin-express/src/index.js#L7

this function would be used to get the correct host. The other place I see is instrument, but doesn't look like it is the best place for something like that.

rochdev commented 4 years ago

I would say the easiest is probably to add a setHost on web.js that plugins can use to set the actual host and that would default to the Host header (current behaviour).

patrick91 commented 4 years ago

@rochdev it is probably faster to ask you, where would I put the host? I think I need to call setHost before instrumenting the request, so I can't do req._datadog.host = xxx, do you have any suggestion?

rochdev commented 4 years ago

@patrick91 Can you clarify why you can't use req._datadog? Where are you currently calling setHost? Maybe the config can be stored somewhere so that setHost can be called later?

patrick91 commented 4 years ago

if I understood correctly req._datadog is set in .patch, which is called in instrument function, here's a very WIP PR I've done:

https://github.com/DataDog/dd-trace-js/pull/1082/files#diff-0b84b4276780e2d2c5b9e1c34dc86b78R57

I guess I could create _datadog when needed, but doesn't feel right

rochdev commented 4 years ago

It's possible to call patch directly before calling instrument, even multiple times if needed. This was added explicitly for this use case of adding _datadog before instrumenting.

patrick91 commented 4 years ago

@rochdev thanks! I'm having some troubles with req not being the modified request from express, not sure what's happening, I'll need to spend a bit of time on it, as I'm unfamiliar with node and express, maybe I'll ask around for some help 😊