18F / api.data.gov

A hosted, shared-service that provides an API key, analytics, and proxy solution for government web services.
https://api.data.gov
Other
96 stars 43 forks source link

Disable X-Fowarded-Host parsing for incoming request routing #355

Closed GUI closed 7 years ago

GUI commented 7 years ago

This came up based on an API user's request for developer.nrel.gov. After analyzing the past week's worth of log files, I'm pretty sure this shouldn't have any adverse impact, but I wanted to document this change here in any case.

When we receive an API request, one of the first things we have to do for routing the request is determine which host the request belongs to (so we can distinguish between requests for developer.nrel.gov, api.nasa.gov, etc). Currently the code for that logic like this:

ngx.ctx.host = ngx_var.http_x_forwarded_host or ngx_var.http_host or ngx_var.host

So basically, we take whichever option is set first in this order:

  1. X-Forwarded-Host HTTP header
  2. Host HTTP header
  3. The server's name if a request without a Host header or empty Host header comes in

All valid requests will set a Host header, so that's the most commonly matched mechanism by far. The only case X-Forwarded-Host would be set is if there's another proxy sitting in front of api.data.gov that manipulates the Host header in some way. However, in these cases, it doesn't seem like matching the forwarded host is actually the most appropriate behavior, since then we're trying to match against some arbitrary host our system has likely never seen before. Instead, matching on the Host header seems more appropriate even in cases where there is another proxy in front of api.data.gov. We'd probably only want to match the forwarded host in special cases where we're the ones in control of the proxy sitting in front of api.data.gov (but based on traffic logs, we don't have any of those special cases).

So the change we're looking to make is removing X-Forwarded-Host from what will be matched, and using Host. So the logic simply becomes:

ngx.ctx.host = ngx_var.http_host or ngx_var.host

The reason for this change is to make it easier to put another proxy in front of an api.data.gov service in certain circumstances. I don't expect these circumstances to be very common, but since this seems like more correct behavior anyway, making the change seems like a good idea.

The specific case that triggered this was a user that needs to proxy to an NREL API for corporate policy and network reasons. They were also using Apache 2.2's mod_proxy to perform that proxy. Apache 2.2's mod_proxy forces the X-Forwarded-Host to be set, which was then causing 404 not found errors for the proxied API requests, since our servers were then trying to match the requests based on the private domain used by the company (eg, we were trying to match private-proxy.apis.example.com instead of developer.nrel.gov). Apache 2.2 has no way to disable this X-Forwarded-Host header (it's only possible in Apache 2.3.10+ since it's hardcoded in 2.2), so there wasn't a particularly easy solution on the client's side without upgrading Apache or using a different proxy server.

Based on 1 week's worth of raw logs, the X-Forwarded-Host header is not set on most of our requests. In the legitimate cases where it is set, the Host header has the exact same value, so those requests would be handled identically. This means I don't anticipate this very tiny code change having any impact (other than allowing these type of external proxies to be setup more easily in some cases).

GUI commented 7 years ago

Fixed by https://github.com/NREL/api-umbrella/commit/28a6582da11550a9c572f54b1571fe1d9f8cdb40 Now the X-Forwarded-Host headers we receive will be ignored in favor of the typical Host headers.