caddyserver / caddy

Fast and extensible multi-platform HTTP/1-2-3 web server with automatic HTTPS
https://caddyserver.com
Apache License 2.0
57.99k stars 4.03k forks source link

Add support for Forwarded header (RFC 7239) #3262

Open 0az opened 4 years ago

0az commented 4 years ago

RFC 7239 specifies a Forwarded header, which is intended to replace X-Forwarded-*.

It would be great if Caddy supported this.

Ref:

mholt commented 4 years ago

Thanks for the request.

What is your use case for this?

The Go team tried to drill down on this a few years ago but the answers were either missing or unsatisfactory: https://github.com/golang/go/issues/20526

0az commented 4 years ago

Thanks for the quick response.

Honestly, it's more of a "nice to have", with a use case of replacing the quasi-standardized X-Forwarded-For/-Host/-Proto/-Scheme. I can afford to do this in my environment (Caddy in front of a small collection of services).


Some research on the topic:

I found @c960657's more comprehensive https://c960657.github.io/forwarded.html immediately after I finished making the list below. 😅

Relevant issues/tickets:

francislavoie commented 4 years ago

There's a couple go libs that do the parsing:

They're both over 4 years old though.

mholt commented 4 years ago

Now this is an excellent feature request. :+1: Not because it's a high priority, but because it's well-researched!

francislavoie commented 3 years ago

I don't think there's any plans to implement this at the moment, so I'll mark this "deferred" and close it for now. If there becomes a compelling reason to implement it later (like more of the server applications ecosystem start using it) then we can reopen it. Thanks @0az for the research on the topic!

Tronic commented 1 year ago

It works with Nginx but while looking to switch to Caddy I found no way to implement it. My Sanic services make use of it. There are clear benefits but "no one uses it" remains a chicken-egg problem that can only be fixed by more server implementing it as an option. The de-facto standards are messy in what gets included, whether the original or local "host" header is sent as is, and with some using x-real-ip while others use x-forwarded-for and some append to x-forwarded-for and set x-real-ip with the most recent client IP.

I couldn't even find a custom header definition to do this in Caddy. Is that possible somehow? Ideally such that existing proxy headers are kept but don't need to be converted.

I always use the "secret" parameter suggested in Nginx docs, which prevents spoofing of proxy headers, even when the full chain of proxies are kept. https://www.nginx.com/resources/wiki/start/topics/examples/forwarded/

Parsing the header is best done last-to-first header (if there is more than one), right-to-left, such that any injections attempted by client cannot mess up the later records added by trusted proxies. The only difficulty being that values may be quoted, and within quotes , does not separate records, and thus an unclosed quote would mess up the parsing of the rest. Doing it in reverse all the valid records can be read, stopping the parsing at a syntax error and ignoring that record and anything to its left side.

mholt commented 1 year ago

Now that we're about 3 years into Caddy v2, and things are settling down a little, I'd be open to reviewing a PR to implement this. (Just don't have high hopes of it being a one-and-done dealio)

Tronic commented 1 year ago

Almost correct with this config. Apparently IPv6 remote_host would need to be bracketed but I don't know how to do that with Caddy yet. Also, would prefer a shorthand config to remove any x-forwarded headers and add this one.

example.com {
        reverse_proxy 127.0.0.1:3000 {
        header_up +Forwarded by="_{system.hostname}";for="{remote_host}";host="{hostport}";proto={scheme};secret=TrustNo1
        header_up -X-Forwarded-*
    }
}

EDIT: Don't use this code! Fixed version in https://github.com/caddyserver/caddy/issues/3262#issuecomment-1633527119

francislavoie commented 1 year ago

I always use the "secret" parameter suggested in Nginx docs, which prevents spoofing of proxy headers, even when the full chain of proxies are kept. nginx.com/resources/wiki/start/topics/examples/forwarded

Those aren't the Nginx docs, it's a community-maintained wiki article. An Nginx maintainer called the wiki article "questionable" in https://trac.nginx.org/nginx/ticket/1316. So I'm not inclined to use that as a valid source of information about implementing this.

Parsing the header is best done last-to-first header (if there is more than one), right-to-left, such that any injections attempted by client cannot mess up the later records added by trusted proxies. The only difficulty being that values may be quoted, and within quotes , does not separate records, and thus an unclosed quote would mess up the parsing of the rest. Doing it in reverse all the valid records can be read, stopping the parsing at a syntax error and ignoring that record and anything to its left side.

I'm not interested in implementing a parser for these header values, it's obviously non-trivial to get it correct, especially because it's a security sensitive header.

There's also the concern that we'd need to configure a toggle/mode to choose whether X-Forwarded-* headers are used or if Forwarded is used, and the way that would interact if either or both are provided by downstream is undefined/unclear.

And it would also mean changes at the HTTP-server level now that we have trusted_proxies and client_ip_headers config for first-party client IP parsing.

There's just too many unknowns here, and the design of the RFC is way too complex for little benefit since the amount of users who might use it is very small compared to X-Forwarded-* headers.

There are clear benefits but "no one uses it" remains a chicken-egg problem that can only be fixed by more server implementing it as an option.

I'm aware and I agree it's not ideal, but it's a balance on the amount of complexity we're willing to take on as maintainers. This RFC is not simple. It was already a lot of effort just to get our support of X-Forwarded-* just right (I think we have it right now, anyway) and taking it even further would require many more changes.

Apparently IPv6 remote_host would need to be bracketed but I don't know how to do that with Caddy yet.

We could probably add a placeholder which does that. PRs welcome.

mholt commented 1 year ago

Thanks Francis. Those are good points too. Hm...

Tronic commented 1 year ago

Thanks for the quick and thoughtful replies.

Caddy is indeed in a deep mess in trying to parse downstream headers. There would be no ambiguity though, because each type of header should be disabled by default and only manually enabled by config settings (real-ip header name, N for the number of trusted proxies in chain in a private network, or setting the Forwarded secret); still cannot quite trust any other X-Forwarded-* headers and while trusted proxy CIDRs may be the only option in some setups, it comes with its own problems.

Node package forwarded-http attempts autodetection, and is easily spoofed. E.g. a service that runs on Rackspace and expects x-cluster-client-ip is spoofed by any client that adds cf-connecting-ip (Cloudflare) to its request.

Caddy might have a particular issue when running as a middle proxy with non-TLS downstream connections, setting X-Forwarded-Proto: http and losing the true protocol, while Host and X-Forwarded-Host may come from anywhere and then the backend server uses these to construct external URLs...

The benefit of Forwarded is that each forwarded-element (of key-value pairs) is from a single proxy, and should form a consistent and more easily secured record. Proxies can also easily wipe any previous Forwarded headers and construct their own, if that is preferred rather than a full chain. Erasing all the non-standard headers is much more difficult.

One could identify the record of interest by its by value if a secret identifier was used there, but a separate secret makes this more explicit. The non-standard field comes from that Nginx wiki article but the RFC allows for such extensions.

Forwarded: by="caddy";for="2001::...";host="example.com";proto=https;secret=TrustNo1

This is from the above snippet and by little modification I could instead of {remote_host} use one of those legacy headers from my downstream proxy and not have Caddy handle it. Even with my limited Go skills I could easily implement another placeholder like remote_host_bracketed for this but what to do when the IP comes in X-Forwarded-For? Which apparently sometimes come with brackets for IPv6 but usually not so. Regex replacement for custom variables perhaps? I'll try my hand at that PR if you can help finding a reasonable angle to approach it.

Tronic commented 1 year ago

Indeed, no downstream uses Forwarded, so parsing it is a lesser concern, and if Caddy can somehow produce correct the client IP of the mess we've got then simply adding a bracketed placeholder like remote_hostbracketed would be sufficient (may also need to strip brackets from any incoming headers used on the existing remote_host if it doesn't already do so).

mholt commented 1 year ago

Reopening for discussion; I don't have time to get in on this right now but anyone else is welcome to continue the discussion :)

Tronic commented 1 year ago

Solved by config hackery.

(forwarded) {
    map {remote_host} {for} {
        ~^[0-9.]+$          for=${0}      # IPv4 client address
        ~^[0-9A-Fa-f:.]+$   for="[${0}]"  # IPv6 bracketed and quoted
        default             for=unknown   # Unix socket
    }
    vars forwarded by="_{system.hostname}";{for};host="{hostport}";proto={scheme}
}

example.com {
    import forwarded
    reverse_proxy :3000 {
        header_up +Forwarded {vars.forwarded}
        header_up -X-Forwarded-*
    }
}

The code is ugly but it works and is standards compliant (assuming no double quotes or other weird characters in system hostname or host header).

I am now matching the correct forwarded-element simply by proxy hostname in by, which it turns out is also supported in Sanic and probably covers most proxy setups with no risk of spoofing (while again parsing right to left).

mholt commented 1 year ago

That's great, thanks for sharing. (When this issue was opened, that solution wasn't possible yet.)

I imagine that has some limitations compared to the total complexity of how Forwarded works, but probably works for quite a few use cases :smiley:

francislavoie commented 1 year ago

That's pretty clever! I think you could probably augment it by using {client_ip} (using the 2.7 beta version) instead of {remote_host} if you configured trusted_proxies in global options, which would make it use the parsed XFF header value from downstream. If you have something in front of Caddy anyway.

Also I think you might want to remove the + from +Forwarded because that will add the header instead of setting it, which means if downstream set Forwarded it would have two of the same header instead of ignoring the original header (spoofing risk).

Tronic commented 1 year ago

complexity of Forwarded

I believe this is complete for sending them upstream as per option 1 below.

client_ip

A good idea for option 2... Actually I thought remote_host was already returning, well, remote IPs. I haven't read the docs or development discussions that far yet 😅

+Forwarded

The choice to append is intentional; you are supposed to have a chain of forwarded-elements from each proxy (separated by commas or in multiple headers). Given how buggy HTTP servers are at handling duplicate request headers, it might however be safer to concatenate them all on a single header.

Say you have Client -> CDN -> Caddy -> Server. The server receives:

Forwarded: by=_cdn; for=<Client IP>; proto=https; host=<Host header from Client>
Forwarded: by=_caddy; for=<CDN IP>; proto=<CDN to Caddy>; host=<Host from CDN>

Proxies can simply append their own local information to the end. In practice I can very well see an alternative where proxies strip off any earlier headers and try to forward information from downstream that they trust, i.e. instead the server only gets:

Forwarded: by=_caddy; for=<Client IP>; proto=https; host=<Host header from Client>

Whether that information comes from XFF, Forwarded or some other means is then up to Caddy (needs manual configuration for trust). But the specification seems to intend for the first option, and that is clearly easier for proxies.

Spoofing risk

A server that knows its proxies and is not buggy cannot be spoofed. It can either count from the end the Nth element (configured proxies count - works great on XFF), or preferably seek for by=_cdn (configured identifier). Even if Mallory knew the identifier, she couldn't exploit it unless she can also bypass the CDN to connect Caddy directly (mitigated by trusted proxies configuration).

Buggy servers remain a large practical risk though. A server that only sees the first header appears to work in normal use but can be spoofed. The same goes for those who don't implement the parsing and searching backwards as they should. And the worst still just read the first value from XFF by default (trivial to exploit when they don't actually have any proxies).