golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.7k stars 17.62k forks source link

net/http: Allow obtaining original header capitalization #37834

Open JohnRusk opened 4 years ago

JohnRusk commented 4 years ago

What version of Go are you using (go version)?

1.13

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

Windows, x64

What did you do?

Make a HTTP request, read it's headers, send those headers somewhere else.

What did you expect to see?

Headers are preserved exactly

What did you see instead?

Header capitialization gets canonicalized

I realize this has been discussed before (about 7 years ago).

However, in the time since then many users have been adversely affected by this. E.g. https://github.com/containous/traefik/issues/466 https://github.com/Azure/azure-storage-azcopy/issues/113

So I'd like to find out, would the Go team consider not a change to the current behaviour, but simply a way for an HTTPResponse to provide an additional map, that maps canonicalizedName -> originalName. If we could just get that map, then those of us who really need header case preservation could use the information it contains to achieve what we need. (Find out the original capitalization when we read a response, and then directly manipulate the outbound request map when we forward that data on).

I'd be happy to contribute the code for the above, if we expected it to be accepted.

toothrot commented 4 years ago

Hey @JohnRusk! I understand that one of your concerns is with Azure/azure-storage-azcopy#113, in which the API is expecting capitalized headers, despite RFC2616 specifying that header fields are to be treated as case-insensitive.

Issue #5022 was resolved by preserving case on outbound requests: https://github.com/golang/go/commit/f0396caf12abe7abb4ac6e29a743f0f6246f8f77

I don't think it's reasonable to keep a mapping of every header field transformation around for every request. As @bradfitz mentioned, http/2 explicitly lowercases all fields. I feel like making changes here for broken servers (which I understand are out of your control) doesn't seem reasonable.

Is it possible to keep a map of these special header fields around in your application and re-write them on your outbound request? I assume there is a subset that are sensitive, and it is not all HTTP headers. Their casing is preserved on outbound requests as I understand #5022.

/cc @bradfitz @rsc

JohnRusk commented 4 years ago

Is it possible to keep a map of these special header fields around in your application and re-write them on your outbound request?

Not really, because the set of header fields that exists is customer-determined, and may vary from customer to customer and even from file to file.

I don't think it's reasonable to keep a mapping of every header field transformation around for every request.

What about some kind of hook then, some kind of function that get's called as they are pulled of the wire, and before they are canonlicalized. If the hook is null, it doesn't get called.

e.g.(roughly)

headerName := ... <raw value off the wire>
if request.headerNameNotifier != nil {
   request.headerNameNotifier(headerName)
}
headerName := canonalicalize(headerName)

Then customers like us could provide request.headerNameNotifier and everyone else could leave it nil.

bradfitz commented 4 years ago

There are a few dup bugs in this space with a bunch of conversation. I can't find them at the moment (GitHub search is failing me) but they're somewhere.

JohnRusk commented 4 years ago

I had trouble searching too. I have managed to find a few though. Are any of these the ones you were thinking of?

ACECEO commented 4 years ago

Is this https://azure.microsoft.com/en-in/updates/azure-data-factory-supports-preserving-metadata-during-file-copy/ a work around?

See also https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-overview.

JohnRusk commented 4 years ago

@ACECEO yes, it's a workaround in the sense that it works. No in the sense that it's a totally different product, with a different emphasis and style of usage. For customers who use the tool I work on, AzCopy, this Go issue remains a blocker.

JohnRusk commented 4 years ago

@bradfitz Any update re thoughts on this? As noted above, the key issue is that we currently can't write case-preserving code that reads headers and forwards them on. In our tool, AzCopy, this is a big deal for certain customers who need to move their data around in Azure.

zezha-msft commented 4 years ago

Hi @ACECEO @toothrot, any additional thought?

cy33hc commented 4 years ago

RFC2616

I've read the RFC2616, it just mentions that the header is case-insensitive. Nothing says that you should modify the the headers as you forward them.

bradfitz commented 4 years ago

I feel like I've replied to this a number of places, but I'll summarize here as well.

As such, any fix here must have super minimal cost, both in terms of API surface (cogntitive load cost, reading more godoc) and runtime cost (we shouldn't allocate or populate a new data structure with this info unless the calling code opted in to wanting that). Notably, we can't change the representation of http.Header at this point.

It probably needs to be opt-in at the http.Server level. Perhaps a new bool there. That means "middleware" can't assume the original case is present, and can't ask for it to be present retroactively, as we'd only parse/preserve it if they opted in. So perhaps that's too inconvenient.

Maybe we just add a new RawHTTP1Header []byte field to http.Request, documenting that it's only for HTTP/1 and it's the exact bytes from the Request-Line to the final closing newlines. Then let code outside of the standard library deal with it. It'd also need to be documented that the slice is not a copy and it aliases some internal buffer and is only valid until the next request on the connection. That kinda sucks as an API requirement, but it's not too dissimilar to e.g. https://golang.org/pkg/bufio/#Reader.ReadSlice. And it sucks less than making all users pay for an allocation that very little code will ever use.

If somebody has an alternate API proposal or implementation I'll take a look, assuming it meets these general requirements.

JohnRusk commented 4 years ago

Hi Brad. That suggestion about RawHTTP1Header sounds like a good option. Minimal impact, but enough to let us do what we need to. I presume the internal buffer, which it aliases, already exists. Is that right? If so, what about exposing it as method instead of a field? Perhaps called something like GetRawHTTP1Header() or CopyRawHttp1Header()? The method would make a copy from the internal buffer. But the copy would only happen if the method was called. So the only users who would pay the price of the allocation are those who actually need and use this feature. Perhaps that would be cleaner, in terms of API design, because the result of the method call is indeed a copy, and so there's no need to worry about the duration of its validity. (You just need to call the method before the next request is made on the connection).

Will it work client side too? (Just checking because you mention http.Server in the earlier paragraph, but we need it client-side, and I imagine other users will too.)

bradfitz commented 4 years ago

Perhaps called something like GetRawHTTP1Header() or CopyRawHttp1Header()? The method would make a copy from the internal buffer.

My reservation with a method is that they render pretty prominently in godoc HTML compared to, say, a struct field.

Perhaps that would be cleaner, in terms of API design, because the result of the method call is indeed a copy, and so there's no need to worry about the duration of its validity.

It would be cleaner, but OTOH this is definitely in the realm of "you should know what you're doing" territory so making a super comfortable & safe API isn't the highest priority.

BTW, the other constraint that would need to be documented on the lifetime of this slice is that it's only valid before the Request.Body is read from. Because that can also advance the buffer.

But now that I think about it more, I'm not even sure that this API works, as I think we only use a 2KB or 4KB bufio.Reader to read from the connection (so that's all we have in memory at a time) and we permit by default up to 1MB of headers (https://golang.org/pkg/net/http/#DefaultMaxHeaderBytes). So RawHTTP1Header []byte on by default won't work, at least in the general case. (And documenting that it only works for "small" HTTP requests seems pretty terrible.)

So we probably do need some sort of opt-in mechanism, and then it can't work in HTTP middleware packages that don't have control over the http.Server setup.

Will it work client side too?

Yeah, it probably should. I don't think we'd want to accept a change that only did one, lest it turn out that whatever API we pick isn't sufficient for the other. Seeing it work with both would be a good sign that the API was sufficient.

JohnRusk commented 4 years ago

Good point re whether it should be documented prominently or not. (I agree with you that "not prominent" is preferable).

The question of whether it only works on small requests seems key. I think folks who have meaningful info in headers (such as some of our users) could easily have more than 4KB. Do you have any other suggestions? Shall we (our team here) see if we can come up with any alternative that meets the general requirements that you've outlined?

gufertum commented 2 years ago

As I understand this problem, this prevents many users to just use traefik as a plugin-proxy, because it ALTERS the headers. There is a ton of software which is using case-sensitive headers and a transparent proxy should not change them...

denandz commented 4 months ago

For additional context - There are currently multiple intercepting HTTP proxies which are affected by this issue, including goproxy and martian, plus all the various tools built on these libraries.

When building intercepting proxies for security testing or logging, net/http ends up modifying header data as discussed in this issue and others. I understand that the clients/servers which require specific header canonicalization are not RFC compliant; however, their compliance aside these clients cannot be tested by net/http based tooling.

For example, attempting to use glorp proxy (based on Martian) to test a mobile application recently lead to issues as the client required specific capitalisation. Fixing the third-party mobile application to become RFC compliant was not feasible - and instead the testing had to be completed using a non-Go based intercept proxy.

denandz commented 2 months ago

For my use-case I was able to work around this issue with some semi-sketchy binary patching. I don't suggest anyone use this approach, at least not without significant testing, but it worked to sort out my specific issue: https://pulsesecurity.co.nz/articles/golang-patching

klarose commented 1 month ago

Brad,

Regarding HTTP/1:

I feel like I've replied to this a number of places, but I'll summarize here as well.

  • HTTP/2 makes this whole issue moot. So this bug only matters for HTTP/1<->HTTP/1 interactions, which becomes less relevant with each passing month.
  • Code that's sensitive to HTTP/1 header case is already broken; catering towards such code implicitly encourages such brokenness or at least allows it to continue, without providing pressure on those applications to get fixed

As such, any fix here must have super minimal cost, both in terms of API surface (cogntitive load cost, reading more godoc) and runtime cost (we shouldn't allocate or populate a new data structure with this info unless the calling code opted in to wanting that). Notably, we can't change the representation of http.Header at this point.

Four years since you posted this reply, browsers still use HTTP/1.1 for websockets. Given the slow trickle of issues referencing this one, HTTP/1 vs HTTP/2 is starting to smell a bit like IPv4 vs IPv6. While we'd prefer to focus on the relatively modern version of a protocol, there are still compelling reasons to continue supporting the old one.

I don't disagree with your point that this should be something low-touch and low-impact, particularly when not opting in. What would it take to get something actually moving towards a solution here? Those of us who have hit this issue are somewhat stuck.

I can think of a whole bunch of options to solve this, with various trade-offs. For example, for my purposes I would be fine if I could look up the original case of the every header key so that I can decide what to send when proxying a request by just mapping the keys to the old case post-processing. I can probably live with the corner case of there being repeated headers pre-normalisation. But, I imagine that wouldn't suffice for other use-cases.

For what it's worth, I ran into this because a websocket server we're proxying to is doing a case-sensitive lookup of Sec-WebSocket-Key, which ultimately breaks the websocket handshake. We do not have any control over the server, so we can't influence it to do the right thing. We're left with figuring out a way to maintain the case sent by the browser, but http.Server stands in our way.

We could certainly just patch up this particular key, but I'm sure we'll find other instances of this problem, so a general solution feels appropriate.