Open paralin opened 5 months ago
I believe the comment preceding the line sufficiently explains why it isn't set.
@seankhliao It does not sufficiently cover it. The comment even says within that this is a hacky fix.
I am raising a situation where this needs to be set and there should be some kind of way to force that code branch to not be taken.
CC @neild who introduced this check & the comment: https://github.com/golang/go/commit/fdc21f3eafe94490e55e0bf018490b3aa9ba2383
I don't find the comment above this if
to be clear.
@paralin your code is incorrect, Range
gives offsets in the plaintext, not compressed archive, however because you give the already compressed archive to net/http
's ranging code, it will interpret them as offsets into the archive and answer with invalid truncated brotli streams no valid client would know to do anything with.
Your best bet is to not use http.FS
and use a custom handler instead,
short as an if
check on Accept-Encoding
, then grab the compressed archive, set Content-Length
to the compressed size, write status and finally io.Copy
.
If you also want to support Range
, then you can use the same thing as above, but also send Accept-Ranges: bytes
and then add one more if
which instead call http.ServeFile
with the plaintext if Range
header is set.
@Jorropo This is not correct. The file is already compressed with .br in advance. The decompression step is happening on the client side after the Range request is complete. So, I am actually asking for that specific range of the compressed file.
When Content-Encoding
is set, the Content-Length
refers to the size of the encoded content. So, Content-Length: 100
, Content-Encoding: gzip
refers to 100 bytes after compression.
(In contrast, Content-Length: 100
, Transfer-Encoding: gzip
refers to 100 bytes before compression.)
So what @paralin is doing here is correct.
Unfortunately, as the comment in fs.go
says, it is the unfortunate reality that there is code out there in the wild that:
serveContent
a Content-Length
referring to the pre-compressed content size; andCompresses the response in the ResponseWriter
.
This is wrong, and breaks Range
requests, but some amount of code does it and it will break for all requests if we set Content-Length
unconditionally here. It's unfortunate.
As the comment in fs.go
says, we could possibly check the type of the ResponseWriter
and set Content-Length
unconditionally if we recognize it as one that we control.
Is there any way the fs http handler could be configured to always set this header and ignore content-encoding? It's a decent heuristic to check if the response writer is the standard one, but what if it's not? (Like in my case, unfortunately).
At the moment I'm working around this by opening the file and getting the size in advance before calling the http ServeHTTP. But this is opening the file twice and seems inherently inefficient / hacky and also bypasses the other logic in the fs http handler
The file is already compressed with .br in advance. The decompression step is happening on the client side after the Range request is complete. So, I am actually asking for that specific range of the compressed file.
Exactly, so we agree on what the code do, and that disagree with the HTTP RFC, Content-Encoding
is meant to be transparent, so you get offsets into the plain text:
I modified your code to expose a localhost HTTP server: https://go.dev/play/p/UB1en6ZlrRB
> curl --compressed -i http://localhost:55555/hello-world.br
HTTP/1.1 200 OK
Accept-Ranges: bytes
Access-Control-Expose-Headers: Content-Encoding, Content-Length
Content-Encoding: br
Content-Type: application/octet-stream
Date: Mon, 15 Apr 2024 23:27:35 GMT
Content-Length: 16
hello world
Accept-Ranges: bytes
tell me I can range the file, so let me try:
> curl --compressed -H "Range: bytes=1-" http://localhost:55555/hello-world.br
curl: (61) Unrecognized or bad HTTP Content or Transfer-Encoding
And it fails with error code 61
because it received a corrupted brotli stream.
Original Response: 8f05 8068 656c 6c6f 2077 6f72 6c64 0a03 ...hello world..
What your code send with Range: 05 8068 656c 6c6f 2077 6f72 6c64 0a03 ..hello world..
Here it is fixed: 0f05 8065 6c6c 6f20 776f 726c 640a 03 ...ello world..
The difference is that your code truncates the brotli header, while Content-Encoding
must be transparent, so it need to answer with a complete brotli stream of the partial plaintext.
Here I modified the code to skip http.FileServer
if the request is simple to serve: https://go.dev/play/p/_OR1Gx3rg_l
> curl --compressed -i http://localhost:55555/hello-world.br
HTTP/1.1 200 OK
Accept-Ranges: bytes
Access-Control-Expose-Headers: Content-Encoding, Content-Length
Content-Encoding: br
Content-Length: 16
Date: Tue, 16 Apr 2024 00:03:20 GMT
hello world
> curl --compressed -i -H "Range: bytes=1-" http://localhost:55555/hello-world.br
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 11
Content-Range: bytes 1-11/12
Content-Type: text/plain; charset=utf-8
Date: Tue, 16 Apr 2024 00:06:08 GMT
ello world
This also solve your original issue and sets Content-Length
.
I don't see how:
As the comment in
fs.go
says, we could possibly check the type of theResponseWriter
and setContent-Length
unconditionally if we recognize it as one that we control.
Would help here, the ranging issue can't be fixed inside net/http
since nothing tell us what the plaintext is supposed to be.
I agree this workaround could help, but in other conditions.
Change https://go.dev/cl/579095 mentions this issue: net/http: correct workaround comment for Content-Encoding in serveContent
In the case when requesting a Range and have set Content-Encoding I agree that this would not work because the browser will try to decompress the result and get invalid compressed data.
My particular case is serving the entire .br file with content encoding set (not using Range). To be clear my specific use case is serving a .wasm file brotli compressed.
@Jorropo I got a bit confused with the specifics of this issue when you mentioned the Range requests because I'm using Range requests to access an encoded file, but without Content-Encoding set, decompressing on the client side, in a different context. With respect to this issue it was about setting the Content-Length and Content-Encoding when requesting the entire file.
Mb, I forgot implied the last step of my reasoning:
serveContent
it is because it parse and handle ranges for me.Conclusion: why even bother with serveContent
? Handle .Open
, io.Copy
and setting Content-Length
yourself instead (like I showed in my example).
IMO the fact it sends Accept-Ranges: bytes
and incorrectly respond when Range
is set, shows that you are using serveContent
function differently than how it was designed to be used. Then I don't find surprising that a miss-used function has issues.
The problem is that Content-Encoding nor content-type are set properly for wasm and wasm.br. I am otherwise using the filesystem as designed. Just for this file type it's a special case.
Content-Encoding
and Content-Type
are working as designed, ServeContent
cant set br
and application/wasm
for you without breaking many other applications, that gonna take back and forth to explain, do you want to come to the gopher slack or gopher discord you can ping @Jorropo
?
Exactly, so we agree on what the code do, and that disagree with the HTTP RFC, Content-Encoding is meant to be transparent, so you get offsets into the plain text:
I'm sorry, but this is not accurate.
Unlike Transfer-Encoding (Section 6.1 of [HTTP/1.1]), the codings listed in Content-Encoding are a characteristic of the representation; the representation is defined in terms of the coded form, and all other metadata about the representation is about the coded form unless otherwise noted in the metadata definition. -- https://www.rfc-editor.org/rfc/rfc9110.html#section-8.4-7
The "Range" header field on a GET request modifies the method semantics to request transfer of only one or more subranges of the selected representation data (Section 8.1), rather than the entire selected representation. -- https://www.rfc-editor.org/rfc/rfc9110.html#section-14.2-1
The Range
and Content-Range
headers refer to ranges of the "representation", and the representation is the coded form. When Content-Encoding: br
, ranges are defined in terms of the compressed bytes, not the uncompressed bytes.
This is in contrast to Transfer-Encoding
.
Your curl
example (curl --compressed -H "Range: bytes=1-"
) does encounter a decompression failure, but that's just demonstrating that it's possible to construct a curl
command that fails.
Is there any way the fs http handler could be configured to always set this header and ignore content-encoding? It's a decent heuristic to check if the response writer is the standard one, but what if it's not? (Like in my case, unfortunately).
The best I can come up with is to have an optional ResponseWriter
method that lets the FileServer
know that the ResponseWriter
isn't going to misbehave. SetContentLength
, maybe, where implementing it implies that the ResponseWriter
understands that it isn't allowed to mess around with the data?
New magic ResponseWriter
methods are generally exposed via ResponseController
, but doing so in this case might be a problem: ResponseController
provides a way for middleware to wrap an underlying ResponseWriter
without losing its methods, but if a misbehaving ResponseWriter
implements that unwrapping it would subvert the check intended to verify that it doesn't misbehave. Bit of a mess there, not sure how much of a concern it would be.
Or I suppose we could go the other way, and have a magic method on the Handler
returned by FileServer
that lets you tell it that the ResponseWriter
isn't broken. That's also kind of terrible.
Or add a new FileServerFSOptions
function that takes a struct of options to configure the file server with. Maybe there are other options we'd like to put in there?
I don't like any of the options I can think of.
It's not pretty, but passing the options at construct time as a new optional parameter with perhaps a NewFileServerWithOptions or so, probably is the least hacky way. Then there are probably other flags that could be set too. Like disabling the index page, adding additional file extension to content type mappings, setting Content-Encoding, etc.
I'm sorry, but this is not accurate.
Whoa mb thx. I guess that useful to resume compressed downloads but I tried multiple webservers and they don't handle Range when sending back compressed responses.
Some historical context: https://github.com/golang/go/issues/19420
I made: https://github.com/golang/go/pull/50904
@neild: I do not get really the issue here now. I think we got this fixed? Maybe the fix is simply to document that if you are setting Content-Encoding
you should also set Content-Length
? Because this is what really is the problem here. Setting only Content-Encoding
is not enough. But if you are setting Content-Encoding
then it is trivial to also set Content-Length
at the same place (and if it is not trivial - because you do not have easy way to determine length, then FileServer cannot set it either for you).
Go version
go version go1.22.2 darwin/arm64
Output of
go env
in your module/workspace:What did you do?
I am trying to set Content-Encoding for .br brotli compressed files with http.FileServer.
This code then skips setting Content-Length because Content-Encoding is set: https://github.com/golang/go/blob/9f13665088012298146c573bc2a7255b1caf2750/src/net/http/fs.go#L377
This causes my httptest to fail with an empty Content-Length, as well as other cases where I call ServeHTTP without using the full http client: https://go.dev/play/p/2UvgBSH8FWx
When using http.Get it works correctly, so the client must be filling in the Content-Length: https://go.dev/play/p/xp08V07UrtQ
What did you see happen?
The content-length header is not set when the content-encoding header is set.
What did you expect to see?
I expect the content-length header to be set even if the content-encoding header is set.