golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.2k stars 17.57k forks source link

net/http: FileServer: Content-Length is not set when Content-Encoding was set #66735

Open paralin opened 5 months ago

paralin commented 5 months ago

Go version

go version go1.22.2 darwin/arm64

Output of go env in your module/workspace:

N/A

What did you do?

I am trying to set Content-Encoding for .br brotli compressed files with http.FileServer.

This code then skips setting Content-Length because Content-Encoding is set: https://github.com/golang/go/blob/9f13665088012298146c573bc2a7255b1caf2750/src/net/http/fs.go#L377

This causes my httptest to fail with an empty Content-Length, as well as other cases where I call ServeHTTP without using the full http client: https://go.dev/play/p/2UvgBSH8FWx

When using http.Get it works correctly, so the client must be filling in the Content-Length: https://go.dev/play/p/xp08V07UrtQ

What did you see happen?

The content-length header is not set when the content-encoding header is set.

What did you expect to see?

I expect the content-length header to be set even if the content-encoding header is set.

seankhliao commented 5 months ago

I believe the comment preceding the line sufficiently explains why it isn't set.

paralin commented 5 months ago

@seankhliao It does not sufficiently cover it. The comment even says within that this is a hacky fix.

I am raising a situation where this needs to be set and there should be some kind of way to force that code branch to not be taken.

paralin commented 5 months ago

CC @neild who introduced this check & the comment: https://github.com/golang/go/commit/fdc21f3eafe94490e55e0bf018490b3aa9ba2383

Jorropo commented 5 months ago

I don't find the comment above this if to be clear.


@paralin your code is incorrect, Range gives offsets in the plaintext, not compressed archive, however because you give the already compressed archive to net/http's ranging code, it will interpret them as offsets into the archive and answer with invalid truncated brotli streams no valid client would know to do anything with.

Your best bet is to not use http.FS and use a custom handler instead, short as an if check on Accept-Encoding, then grab the compressed archive, set Content-Length to the compressed size, write status and finally io.Copy. If you also want to support Range, then you can use the same thing as above, but also send Accept-Ranges: bytes and then add one more if which instead call http.ServeFile with the plaintext if Range header is set.

paralin commented 5 months ago

@Jorropo This is not correct. The file is already compressed with .br in advance. The decompression step is happening on the client side after the Range request is complete. So, I am actually asking for that specific range of the compressed file.

neild commented 5 months ago

When Content-Encoding is set, the Content-Length refers to the size of the encoded content. So, Content-Length: 100, Content-Encoding: gzip refers to 100 bytes after compression.

(In contrast, Content-Length: 100, Transfer-Encoding: gzip refers to 100 bytes before compression.)

So what @paralin is doing here is correct.

Unfortunately, as the comment in fs.go says, it is the unfortunate reality that there is code out there in the wild that:

As the comment in fs.go says, we could possibly check the type of the ResponseWriter and set Content-Length unconditionally if we recognize it as one that we control.

paralin commented 5 months ago

Is there any way the fs http handler could be configured to always set this header and ignore content-encoding? It's a decent heuristic to check if the response writer is the standard one, but what if it's not? (Like in my case, unfortunately).

At the moment I'm working around this by opening the file and getting the size in advance before calling the http ServeHTTP. But this is opening the file twice and seems inherently inefficient / hacky and also bypasses the other logic in the fs http handler

Jorropo commented 5 months ago

The file is already compressed with .br in advance. The decompression step is happening on the client side after the Range request is complete. So, I am actually asking for that specific range of the compressed file.

Exactly, so we agree on what the code do, and that disagree with the HTTP RFC, Content-Encoding is meant to be transparent, so you get offsets into the plain text: I modified your code to expose a localhost HTTP server: https://go.dev/play/p/UB1en6ZlrRB

> curl --compressed -i http://localhost:55555/hello-world.br
HTTP/1.1 200 OK
Accept-Ranges: bytes
Access-Control-Expose-Headers: Content-Encoding, Content-Length
Content-Encoding: br
Content-Type: application/octet-stream
Date: Mon, 15 Apr 2024 23:27:35 GMT
Content-Length: 16

hello world

Accept-Ranges: bytes tell me I can range the file, so let me try:

> curl --compressed -H "Range: bytes=1-" http://localhost:55555/hello-world.br
curl: (61) Unrecognized or bad HTTP Content or Transfer-Encoding

And it fails with error code 61 because it received a corrupted brotli stream.

Original Response:              8f05 8068 656c 6c6f 2077 6f72 6c64 0a03  ...hello world..
What your code send with Range:   05 8068 656c 6c6f 2077 6f72 6c64 0a03  ..hello world..
Here it is fixed:               0f05 8065 6c6c 6f20 776f 726c 640a 03    ...ello world..

The difference is that your code truncates the brotli header, while Content-Encoding must be transparent, so it need to answer with a complete brotli stream of the partial plaintext.

Here I modified the code to skip http.FileServer if the request is simple to serve: https://go.dev/play/p/_OR1Gx3rg_l

> curl --compressed -i http://localhost:55555/hello-world.br
HTTP/1.1 200 OK
Accept-Ranges: bytes
Access-Control-Expose-Headers: Content-Encoding, Content-Length
Content-Encoding: br
Content-Length: 16
Date: Tue, 16 Apr 2024 00:03:20 GMT

hello world
> curl --compressed -i -H "Range: bytes=1-" http://localhost:55555/hello-world.br
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 11
Content-Range: bytes 1-11/12
Content-Type: text/plain; charset=utf-8
Date: Tue, 16 Apr 2024 00:06:08 GMT

ello world

This also solve your original issue and sets Content-Length.

Jorropo commented 5 months ago

I don't see how:

As the comment in fs.go says, we could possibly check the type of the ResponseWriter and set Content-Length unconditionally if we recognize it as one that we control.

Would help here, the ranging issue can't be fixed inside net/http since nothing tell us what the plaintext is supposed to be. I agree this workaround could help, but in other conditions.

gopherbot commented 5 months ago

Change https://go.dev/cl/579095 mentions this issue: net/http: correct workaround comment for Content-Encoding in serveContent

paralin commented 5 months ago

In the case when requesting a Range and have set Content-Encoding I agree that this would not work because the browser will try to decompress the result and get invalid compressed data.

My particular case is serving the entire .br file with content encoding set (not using Range). To be clear my specific use case is serving a .wasm file brotli compressed.

@Jorropo I got a bit confused with the specifics of this issue when you mentioned the Range requests because I'm using Range requests to access an encoded file, but without Content-Encoding set, decompressing on the client side, in a different context. With respect to this issue it was about setting the Content-Length and Content-Encoding when requesting the entire file.

Jorropo commented 5 months ago

Mb, I forgot implied the last step of my reasoning:

Conclusion: why even bother with serveContent ? Handle .Open, io.Copy and setting Content-Length yourself instead (like I showed in my example).

IMO the fact it sends Accept-Ranges: bytes and incorrectly respond when Range is set, shows that you are using serveContent function differently than how it was designed to be used. Then I don't find surprising that a miss-used function has issues.

paralin commented 5 months ago

The problem is that Content-Encoding nor content-type are set properly for wasm and wasm.br. I am otherwise using the filesystem as designed. Just for this file type it's a special case.

Jorropo commented 5 months ago

Content-Encoding and Content-Type are working as designed, ServeContent cant set br and application/wasm for you without breaking many other applications, that gonna take back and forth to explain, do you want to come to the gopher slack or gopher discord you can ping @Jorropo ?

neild commented 5 months ago

Exactly, so we agree on what the code do, and that disagree with the HTTP RFC, Content-Encoding is meant to be transparent, so you get offsets into the plain text:

I'm sorry, but this is not accurate.

Unlike Transfer-Encoding (Section 6.1 of [HTTP/1.1]), the codings listed in Content-Encoding are a characteristic of the representation; the representation is defined in terms of the coded form, and all other metadata about the representation is about the coded form unless otherwise noted in the metadata definition. -- https://www.rfc-editor.org/rfc/rfc9110.html#section-8.4-7

The "Range" header field on a GET request modifies the method semantics to request transfer of only one or more subranges of the selected representation data (Section 8.1), rather than the entire selected representation. -- https://www.rfc-editor.org/rfc/rfc9110.html#section-14.2-1

The Range and Content-Range headers refer to ranges of the "representation", and the representation is the coded form. When Content-Encoding: br, ranges are defined in terms of the compressed bytes, not the uncompressed bytes.

This is in contrast to Transfer-Encoding.

Your curl example (curl --compressed -H "Range: bytes=1-") does encounter a decompression failure, but that's just demonstrating that it's possible to construct a curl command that fails.

neild commented 5 months ago

Is there any way the fs http handler could be configured to always set this header and ignore content-encoding? It's a decent heuristic to check if the response writer is the standard one, but what if it's not? (Like in my case, unfortunately).

The best I can come up with is to have an optional ResponseWriter method that lets the FileServer know that the ResponseWriter isn't going to misbehave. SetContentLength, maybe, where implementing it implies that the ResponseWriter understands that it isn't allowed to mess around with the data?

New magic ResponseWriter methods are generally exposed via ResponseController, but doing so in this case might be a problem: ResponseController provides a way for middleware to wrap an underlying ResponseWriter without losing its methods, but if a misbehaving ResponseWriter implements that unwrapping it would subvert the check intended to verify that it doesn't misbehave. Bit of a mess there, not sure how much of a concern it would be.

Or I suppose we could go the other way, and have a magic method on the Handler returned by FileServer that lets you tell it that the ResponseWriter isn't broken. That's also kind of terrible.

Or add a new FileServerFSOptions function that takes a struct of options to configure the file server with. Maybe there are other options we'd like to put in there?

I don't like any of the options I can think of.

paralin commented 5 months ago

It's not pretty, but passing the options at construct time as a new optional parameter with perhaps a NewFileServerWithOptions or so, probably is the least hacky way. Then there are probably other flags that could be set too. Like disabling the index page, adding additional file extension to content type mappings, setting Content-Encoding, etc.

Jorropo commented 5 months ago

I'm sorry, but this is not accurate.

Whoa mb thx. I guess that useful to resume compressed downloads but I tried multiple webservers and they don't handle Range when sending back compressed responses.

mitar commented 3 weeks ago

Some historical context: https://github.com/golang/go/issues/19420

I made: https://github.com/golang/go/pull/50904

@neild: I do not get really the issue here now. I think we got this fixed? Maybe the fix is simply to document that if you are setting Content-Encoding you should also set Content-Length? Because this is what really is the problem here. Setting only Content-Encoding is not enough. But if you are setting Content-Encoding then it is trivial to also set Content-Length at the same place (and if it is not trivial - because you do not have easy way to determine length, then FileServer cannot set it either for you).