Add more sophisticated realip middleware

jawnsy commented 2 years ago

Thanks so much for this excellent library! I would like to propose the addition of a more sophisticated realip middleware, perhaps as a separate project, similar to httprate.

I'd be happy to open a pull request or contribute a repository, but wanted to check with you first. Thanks in advance for your consideration!

Current behavior

The included realip middleware is a very simple implementation, and serves as a great example for implementing custom middlewares. However, there are some limitations (by design):

It allows spoofing if clients are able to access the chi router directly, bypassing the frontend reverse proxy

If the client connects directly to the router and passes X-Forwarded-For: 127.0.0.1, then chi will update the RemoteAddr to 127.0.0.1
It allows spoofing if the frontend reverse proxy does not appropriately sanitize (filter out) other headers

If the client connects to a reverse proxy and passes X-Forwarded-For: 127.0.0.1 and X-Real-IP: 10.0.0.1, and the reverse proxy is configured to ignore/replace X-Forwarded-For, but not configured to filter out X-Real-IP, then the proxy will pass through: X-Forwarded-For: 100.100.100.100 (the real client address) as well as X-Real-IP: 10.0.0.1, and due to the search order, the untrusted X-Real-IP will take precedence over the correct X-Forwarded-For header.
It does not indicate whether the original client request protocol was HTTP or HTTPS

If the client connects to the reverse proxy over HTTPS and the reverse proxy terminates TLS, then it is not possible to tell that the client connected over TLS, since X-Forwarded-Proto is not considered by the middleware. This means that the server may use incorrect logic, omitting HTTPS-only headers like Strict-Transport-Security, avoiding Secure cookies, etc.
It does not preserve the original RemoteAddr information for downstream clients

If clients need to get the original RemoteAddr (the address/port of the reverse proxy), there is currently no way to get that information.
It does not sanitize headers from the original request, so spoofing may be an issue if the request is proxied to a different service, such as using httputil.NewSingleHostReverseProxy()

The original http.Request is not modified, so if users pass to a different service without modification (such as through the built-in reverse proxy handler), then the headers will be propagated to the backing service, which can lead to spoofing attacks downstream of chi.

Moreover, if users pass multiple values for a given header (e.g. two X-Forwarded-For headers), chi will accept the first header and proxied services may have different logic (e.g. last-value-wins), again leading to a spoofing problem. So it is desirable in this case to coalesce the headers (e.g. accept only the first value, and only propagate the first value to downstream proxied connections)

Implementation options

There are two straightforward approaches that can address some of the above limitations, with different tradeoffs:

Opt-in approach: Create a handler that adds updated state to the request context.

The advantage of this is that we completely avoid the need to think about what port number to use if we want to create a RemoteAddr that "looks like" the usual 10.0.0.1:1234 value. This also ensures that downstream code "knows" how to handle things correctly, since they are effectively "opting in" to the originating client address information that the middleware is adding.

The disadvantages of this approach are that: (1) all code needs to be modified to look in the request context for the IP address, or would have incorrect information otherwise; and (2) code has a tight coupling to the middleware, since it needs to know the context key containing the "correct" address information.
Opt-out approach: Create a handler that updates the request in-place, and preserves the old state in the request context.

The advantage of this is that most code will operate without modifications, especially if they do not attempt to parse the RemoteAddr (e.g. logging client address information as a string). As long as consumers are not parsing the RemoteAddr, then the code will behave correctly whether or not the middleware is installed/enabled. Moreover, we can modify the req.TLS to set a dummy tls.ConnectionState in order to indicate whether the originating protocol was HTTP or HTTPS.

Proposed behavior

I would propose creating a middleware that:

Accepts configuration for a list of trusted headers (which may be a predefined set, such as X-Forwarded-For, X-Real-IP, Forwarded, CF-Connecting-IP, True-Client-IP, etc.)
Accepts configuration for a list of trusted origin networks (specified as CIDRs corresponding to the source addresses of the reverse proxy; in case the headers are supplied from untrusted networks, the headers will be silently dropped to prevent spoofing)
Performs either the opt-out or opt-in approach
Provides a utility function to sanitize & coalesce headers (e.g. req.Header.Set("X-Forwarded-For", req.Header.Get("X-Forwarded-For"))), on an opt-in basis (since this is only required in situations where we're using both chi and a reverse proxy)

Open questions

For implementing the trusted origin network check, we will likely need to check whether the source address is contained in a given CIDR range. Go provides the net.IPNet type for this purpose, however, Go 1.18 introduces a new type, which has a function net/netip.Prefix to do the same containment check. These types were introduced to provide better performance, so it is preferable to use these if available.

Given that chi supports Go versions as old as 1.14, this means we either need to stick to the old API, or use the new one conditionally (providing alternate implementations with build tags). This would also constrain the way that we need to design the middleware.

jawnsy commented 2 years ago

@mfridman Thanks for linking the related issue; it has a lot of useful background and references. Most of the above proposal implicitly assumes that we control and can configure every trusted proxy in the chain to perform sanitization, so that we can blindly trust a given origin address, but #711 makes the observation that that might not always be possible.

@adam-p Thanks for your excellent and comprehensive write-up: The perils of the “real” client IP. I have not read through the whole thing yet, but wanted to share an observation regarding the section you linked:

If your server is directly connected to the internet, there might be an XFF header or there might not be (depending on whether the client used a proxy). If there is an XFF header, pick the leftmost IP address that is a valid, non-private IPv4 or IPv6 address. If there is no XFF header, use the RemoteAddr.

On many corporate networks, it is common for users to have a private (RFC 1918) address for their workstation, with a NAT gateway to access other networks or provide outbound access to the Internet. In these environments, the correct client address is a private address, and using this logic would mean that rate limiting would be applied to a proxy, rather than the end user.

For example, suppose Alice (10.0.10.11) connects through an authenticating API gateway (10.10.10.10) to a reverse proxy (192.168.10.10) to the chi-based service (192.168.5.1). In this case, the leftmost non-private address is nothing, so we'd have no choice but to discard the X-Forwarded-For header entirely and use the original RemoteAddr, which would be 192.168.10.10, the address of the reverse proxy.

I'm also personally unsure whether a forward proxy would add additional complexity to this (e.g. Alice connects through a forward proxy to the API gateway.) In that case, would we have to handle a Via header, or would those proxies also add an X-Forwarded-For?

Making the trusted proxy count configurable (in addition to configuring trusted origins) seems like a great idea to me.

adam-p commented 2 years ago

On many corporate networks, it is common for users to have a private (RFC 1918) address for their workstation, with a NAT gateway to access other networks or provide outbound access to the Internet. In these environments, the correct client address is a private address, and using this logic would mean that rate limiting would be applied to a proxy, rather than the end user.

First, never use the leftmost-ish value for rate-limiting. But you're generally right -- my anti-private-IP-ism was also pointed out in a Reddit comment, so I added a footnote #4 saying something like, "if you're expecting internal IPs, never mind that bit". I guess I could elevate that.

I'm also personally unsure whether a forward proxy would add additional complexity to this (e.g. Alice connects through a forward proxy to the API gateway.) In that case, would we have to handle a Via header, or would those proxies also add an X-Forwarded-For?

For a rightmost-ish algorithm, this doesn't matter. But yeah, to get the leftmost-ish... I think it's impossible to know. You're back to guessing a) what benign proxies might have done along the way, and b) what may or may not be spoofed by the client or intermediate forward proxies.

I think you probably just have to arbitrarily decide. Like, "If there's an XFF header, use the leftmost; if not, use the Via if it's present". (I would check in that order. Because if you had two forward proxies adding Via headers, the second might clobber the header added by the first. Multiple Via headers are disallowed by RFC 2616.)

But, again, not for security-related uses.

In case this is helpful, I've started a hopefully-reference-implementation of some ways to get the "real" client IP from a request: https://github.com/realclientip/realclientip-go There's still work to do on it, though. Maybe I can finish it in the next week.

adam-p commented 2 years ago

I wrote a library to help with extracting the "real" client IP from a request. I intended that it be useful in cases like this, so please let me know if it's suitable (and if it's not we'll figure out how to make it better). https://github.com/realclientip/realclientip-go

go-chi / chi