Open jawnsy opened 2 years ago
@mfridman Thanks for linking the related issue; it has a lot of useful background and references. Most of the above proposal implicitly assumes that we control and can configure every trusted proxy in the chain to perform sanitization, so that we can blindly trust a given origin address, but #711 makes the observation that that might not always be possible.
@adam-p Thanks for your excellent and comprehensive write-up: The perils of the “real” client IP. I have not read through the whole thing yet, but wanted to share an observation regarding the section you linked:
If your server is directly connected to the internet, there might be an XFF header or there might not be (depending on whether the client used a proxy). If there is an XFF header, pick the leftmost IP address that is a valid, non-private IPv4 or IPv6 address. If there is no XFF header, use the
RemoteAddr
.
On many corporate networks, it is common for users to have a private (RFC 1918) address for their workstation, with a NAT gateway to access other networks or provide outbound access to the Internet. In these environments, the correct client address is a private address, and using this logic would mean that rate limiting would be applied to a proxy, rather than the end user.
For example, suppose Alice (10.0.10.11) connects through an authenticating API gateway (10.10.10.10) to a reverse proxy (192.168.10.10) to the chi-based service (192.168.5.1). In this case, the leftmost non-private address is nothing, so we'd have no choice but to discard the X-Forwarded-For
header entirely and use the original RemoteAddr
, which would be 192.168.10.10
, the address of the reverse proxy.
I'm also personally unsure whether a forward proxy would add additional complexity to this (e.g. Alice connects through a forward proxy to the API gateway.) In that case, would we have to handle a Via
header, or would those proxies also add an X-Forwarded-For
?
Making the trusted proxy count configurable (in addition to configuring trusted origins) seems like a great idea to me.
On many corporate networks, it is common for users to have a private (RFC 1918) address for their workstation, with a NAT gateway to access other networks or provide outbound access to the Internet. In these environments, the correct client address is a private address, and using this logic would mean that rate limiting would be applied to a proxy, rather than the end user.
First, never use the leftmost-ish value for rate-limiting. But you're generally right -- my anti-private-IP-ism was also pointed out in a Reddit comment, so I added a footnote #4 saying something like, "if you're expecting internal IPs, never mind that bit". I guess I could elevate that.
I'm also personally unsure whether a forward proxy would add additional complexity to this (e.g. Alice connects through a forward proxy to the API gateway.) In that case, would we have to handle a Via header, or would those proxies also add an X-Forwarded-For?
For a rightmost-ish algorithm, this doesn't matter. But yeah, to get the leftmost-ish... I think it's impossible to know. You're back to guessing a) what benign proxies might have done along the way, and b) what may or may not be spoofed by the client or intermediate forward proxies.
I think you probably just have to arbitrarily decide. Like, "If there's an XFF header, use the leftmost; if not, use the Via if it's present". (I would check in that order. Because if you had two forward proxies adding Via headers, the second might clobber the header added by the first. Multiple Via headers are disallowed by RFC 2616.)
But, again, not for security-related uses.
In case this is helpful, I've started a hopefully-reference-implementation of some ways to get the "real" client IP from a request: https://github.com/realclientip/realclientip-go There's still work to do on it, though. Maybe I can finish it in the next week.
I wrote a library to help with extracting the "real" client IP from a request. I intended that it be useful in cases like this, so please let me know if it's suitable (and if it's not we'll figure out how to make it better). https://github.com/realclientip/realclientip-go
Thanks so much for this excellent library! I would like to propose the addition of a more sophisticated realip middleware, perhaps as a separate project, similar to
httprate
.I'd be happy to open a pull request or contribute a repository, but wanted to check with you first. Thanks in advance for your consideration!
Current behavior
The included
realip
middleware is a very simple implementation, and serves as a great example for implementing custom middlewares. However, there are some limitations (by design):It allows spoofing if clients are able to access the chi router directly, bypassing the frontend reverse proxy
If the client connects directly to the router and passes
X-Forwarded-For: 127.0.0.1
, thenchi
will update theRemoteAddr
to127.0.0.1
It allows spoofing if the frontend reverse proxy does not appropriately sanitize (filter out) other headers
If the client connects to a reverse proxy and passes
X-Forwarded-For: 127.0.0.1
andX-Real-IP: 10.0.0.1
, and the reverse proxy is configured to ignore/replaceX-Forwarded-For
, but not configured to filter outX-Real-IP
, then the proxy will pass through:X-Forwarded-For: 100.100.100.100
(the real client address) as well asX-Real-IP: 10.0.0.1
, and due to the search order, the untrustedX-Real-IP
will take precedence over the correctX-Forwarded-For
header.It does not indicate whether the original client request protocol was HTTP or HTTPS
If the client connects to the reverse proxy over HTTPS and the reverse proxy terminates TLS, then it is not possible to tell that the client connected over TLS, since
X-Forwarded-Proto
is not considered by the middleware. This means that the server may use incorrect logic, omitting HTTPS-only headers likeStrict-Transport-Security
, avoiding Secure cookies, etc.It does not preserve the original
RemoteAddr
information for downstream clientsIf clients need to get the original
RemoteAddr
(the address/port of the reverse proxy), there is currently no way to get that information.It does not sanitize headers from the original request, so spoofing may be an issue if the request is proxied to a different service, such as using httputil.NewSingleHostReverseProxy()
The original
http.Request
is not modified, so if users pass to a different service without modification (such as through the built-in reverse proxy handler), then the headers will be propagated to the backing service, which can lead to spoofing attacks downstream of chi.Moreover, if users pass multiple values for a given header (e.g. two
X-Forwarded-For
headers), chi will accept the first header and proxied services may have different logic (e.g. last-value-wins), again leading to a spoofing problem. So it is desirable in this case to coalesce the headers (e.g. accept only the first value, and only propagate the first value to downstream proxied connections)Implementation options
There are two straightforward approaches that can address some of the above limitations, with different tradeoffs:
Opt-in approach: Create a handler that adds updated state to the request context.
The advantage of this is that we completely avoid the need to think about what port number to use if we want to create a
RemoteAddr
that "looks like" the usual10.0.0.1:1234
value. This also ensures that downstream code "knows" how to handle things correctly, since they are effectively "opting in" to the originating client address information that the middleware is adding.The disadvantages of this approach are that: (1) all code needs to be modified to look in the request context for the IP address, or would have incorrect information otherwise; and (2) code has a tight coupling to the middleware, since it needs to know the context key containing the "correct" address information.
Opt-out approach: Create a handler that updates the request in-place, and preserves the old state in the request context.
The advantage of this is that most code will operate without modifications, especially if they do not attempt to parse the
RemoteAddr
(e.g. logging client address information as a string). As long as consumers are not parsing theRemoteAddr
, then the code will behave correctly whether or not the middleware is installed/enabled. Moreover, we can modify thereq.TLS
to set a dummytls.ConnectionState
in order to indicate whether the originating protocol was HTTP or HTTPS.Proposed behavior
I would propose creating a middleware that:
X-Forwarded-For
,X-Real-IP
,Forwarded
,CF-Connecting-IP
,True-Client-IP
, etc.)req.Header.Set("X-Forwarded-For", req.Header.Get("X-Forwarded-For"))
), on an opt-in basis (since this is only required in situations where we're using both chi and a reverse proxy)Open questions
For implementing the trusted origin network check, we will likely need to check whether the source address is contained in a given CIDR range. Go provides the
net.IPNet type
for this purpose, however, Go 1.18 introduces a new type, which has a functionnet/netip.Prefix
to do the same containment check. These types were introduced to provide better performance, so it is preferable to use these if available.Given that chi supports Go versions as old as 1.14, this means we either need to stick to the old API, or use the new one conditionally (providing alternate implementations with build tags). This would also constrain the way that we need to design the middleware.