golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.7k stars 17.5k forks source link

net/http/httputil: support RFC 7239 Forwarded header in ReverseProxy #30963

Open AndrewBurian opened 5 years ago

AndrewBurian commented 5 years ago

Revival of #20526 which was frozen due to age

I've been using services that make use of Go's Reverse Proxy, and think there's some substantial improvements in the Forwarded header that were overlooked in the original discussion.

Forwarded includes not just previous client IP, it also has the host and protocol, information that is currently haphazardly implemented in a few other non-standard X headers X-Forwarded-Proto and X-Forwarded-Host most of the time. There's also X-Real-IP competing with X-Forwarded-For, and while much less adopted, is used.

The protocol and host info also tends to not get nicely appended in the other headers as comma separated lists, and instead just clobbered by the most recent. If you're more than one proxy down, the chances of recovering that info is slim to nil, where as Forwarded would be a combined mechanism to ensure its survival.

Forwarded also specifies the format to put IPv6 addresses in, where as this is also somewhat mixed in implementation of X-Forwarded-For and requires a bunch of edge-case decoding.

Considering the RFC provides migration guidelines on how to build a Forwarded header in the presence of an X-Forwarded-For header, I see no reason we couldn't supply both and help push adoption.

I've got code and tests for such a feature on a branch, and would be more than happy to open a PR if there's interest.

dunglas commented 4 years ago

This feature is very needed for https://vulcain.rocks, and I guess for most reverse proxies.

sporkmonger commented 4 years ago

If we're considering implementing the Forwarded header, I'd encourage any test suites to cover edge cases, including violations of the specification. In particular, the usage of double quotes, semicolons, commas, and square brackets have the potential to break parsing. For instance, cases like these might result in rather unexpected behavior:

Forwarded: for=" Forwarded: for=";" Forwarded: for=""" Forwarded: for="";" Forwarded: for="[ Forwarded: for=","

Keeping in mind that this header can be sent by untrusted sources, these could have undesirable upstream effects like obscuring the real client IP or, depending on implementation details, potentially causing the whole header to be rejected or ignored, giving an attacker the ability to bypass upstream (or possibly in-process) security mechanisms or fraud checks. While I'm in favor of implementing support for this header, I believe it should be done with a test suite that includes adversarial cases.

Also the spec allows for optional port numbers even though they aren't commonly seen, as well as underscore-prefixed obfuscation identifiers in place of both the IP and port, and those should probably have tests too.

adam-p commented 2 years ago

Here is a relevant conversation from the Nginx forum a few years ago: https://forum.nginx.org/read.php?29,275880,275880#msg-275880

One of the concerns there that @sporkmonger also mentioned was that the pre-existing Forwarded header value could already be illegal. For example, how do you append to:

Forwarded: for="1.1.1.1, for=2.2.2.2

Note the un-closed double-quote.

My reading of the spec suggests that the whole header is garbage if one thing is garbage. (I wrote a blog post about this, fwiw.)

If the design ends up wanting to validate the existing header value before appending, the only complete RFC 7239 parser I've found is this one: https://github.com/lpinca/forwarded-parse (in JavaScript).

Regarding @AndrewBurian's code, some feedback:

dmolesUC commented 2 years ago

I see Nginx apparently doesn't implement Forwarded natively, but they do have sample code for a config-land implementation, including this rather hilarious validation regex:

^(,[ \t]*)*([!#$%&'*+.^_`|~0-9A-Za-z-]+=([!#$%&'*+.^_`|~0-9A-Za-z-]+|"([\t \x21\x23-\x5B\x5D-\x7E\x80-\xFF]|\\[\t \x21-\x7E\x80-\xFF])*"))?(;([!#$%&'*+.^_`|~0-9A-Za-z-]+=([!#$%&'*+.^_`|~0-9A-Za-z-]+|"([\t \x21\x23-\x5B\x5D-\x7E\x80-\xFF]|\\[\t \x21-\x7E\x80-\xFF])*"))?)*([ \t]*,([ \t]*([!#$%&'*+.^_`|~0-9A-Za-z-]+=([!#$%&'*+.^_`|~0-9A-Za-z-]+|"([\t \x21\x23-\x5B\x5D-\x7E\x80-\xFF]|\\[\t \x21-\x7E\x80-\xFF])*"))?(;([!#$%&'*+.^_`|~0-9A-Za-z-]+=([!#$%&'*+.^_`|~0-9A-Za-z-]+|"([\t \x21\x23-\x5B\x5D-\x7E\x80-\xFF]|\\[\t \x21-\x7E\x80-\xFF])*"))?)*)?)*$

There's some discussion of the validation problem there as well. It looks like their recommended behavior is in line with your reading of the spec, @adam-p -- if any of it's garbage, it's all garbage, and shouldn't be trusted.

Dealing with invalid headers

That humongous regex in map $http_forwarded above matches all syntactically valid Forwarded headers. It ensures that NGINX does not blindly append to a malformed header. Otherwise, an external attacker could send something like:

Forwarded: for=injected;by="

and then NGINX would produce:

Forwarded: for=injected;by=", for=real

Depending on how your upstream server parses such a Forwarded, it may or may not see the for=real element. (Unlike with X-Forwarded-For, it can’t just split on comma, because a comma can occur inside a valid quoted string.)

Something like the JS forwarded-parse implementation would be a lot easier to debug than the regex, though (and IMO more Go-appropriate).

@adam-p can you clarify what the saboteur's accomplishing in your scenario? If I'm understanding correctly, it's something like this:

  1. we're an RFC 7239-compliant proxy.
  2. we're at the end of a chain of trusted proxies whose Forwarded headers we'd like to preserve.
  3. however, those upstream proxies aren't completely RFC 7239-compliant, and while we expect their output to be clean, they might be (for instance) just blindly appending to whatever they get.
  4. if a malicious or broken client sends a malformed Forwarded header, we can't tell where the malformed header ends and the headers from the trusted proxies (the ones we'd like to preserve), begin.

Is that the problem?

If so, my first instinct from an ops perspective is "just make sure you drop Forwarded headers at the edge, and let the rest of the chain take care of itself", but I don't know how practical that is IRL.

dmolesUC commented 2 years ago

And just as a side note -- is it completely bananas to contemplate parsing the header backwards, starting from the end where we expect values to be more trustworthy?

adam-p commented 2 years ago

is it completely bananas to contemplate parsing the header backwards, starting from the end where we expect values to be more trustworthy?

I keep thinking of that too. But I keep coming to the conclusion that it's not really any different than splitting by comma first and parsing the chunks. The end result is that your rightmost stuff ends up usable and the leftmost stuff is still garbage, but I don't think it's any worse than it would have been. (You just need to make sure there won't be any commas in the values added by your trusted proxies.)

Once you're violating the RFC... you can do whatever you want.

Edit: After thinking about it more, my mind is changing about this. There are degrees of RFC-adherence/violation, and the backwards parsing would give more adherence (such as allowing commas in values, as the obvious example).

can you clarify what the saboteur's accomplishing in your scenario?

Your scenario is the main one.

Another is that our reverse proxy might allow different access if the user is connecting directly/internally rather than coming in through a proxy. The saboteur forces us to throw away the header and now our not-great logic makes us think it's a direct-connect.

"just make sure you drop Forwarded headers at the edge, and let the rest of the chain take care of itself"

As more of a dev and less of an ops, I don't love solutions involving "just configure your reverse proxies right". Do all CDNs and WAFs allow for dropping headers? They certainly don't have it all as the default. That means that droves of devs won't read the fine print or will think that nothing bad can ever happen to them, and will be vulnerable.

That regex is amazing.

paulo-raca commented 1 year ago

I've implemented that, here is the code for anyone interested: https://gist.github.com/paulo-raca/f9e7c629326becbfe5c7ad7848db19e9

Maybe some of this might even be merged eventually