floooh / sokol

minimal cross-platform standalone C headers
https://floooh.github.io/sokol-html5
zlib License

Compressed data & chunk size fails fetch #895

Open voidware opened 10 months ago

voidware commented 10 months ago

I'm having a problem with sokol_fetch on Emscripten when fetching compressed data with a nonzero chunk size.

Sokol issues a HEAD request and gets the compressed content length:

HTTP/1.1 200 OK
Date: Mon, 18 Sep 2023 12:41:06 GMT
Connection: Keep-Alive
ETag: "1695039706"
Cache-Control: max-age=86400
Content-Encoding: gzip
Content-Length: 18042
Content-Type: application/json
Last-Modified: Mon, 18 Sep 2023 12:21:46 GMT
Accept-Ranges: bytes
Vary: Origin

Sokol then issues a GET with a Range header and gets uncompressed data:

HTTP/1.1 206 Partial Content
Date: Mon, 18 Sep 2023 12:58:19 GMT
Connection: Keep-Alive
ETag: "1695039706"
Cache-Control: max-age=86400
Content-Length: 1024
Content-Range: bytes 0-1023/52494
Content-Type: application/json
Last-Modified: Mon, 18 Sep 2023 12:21:46 GMT
Accept-Ranges: bytes
Vary: Origin

And the server does not compress it (no Content-Encoding field). The range requested is interpreted as that of uncompressed data.

So here we get the first 1K of 52K.

But Sokol stops fetching after 18042 bytes of uncompressed data, so the download is incomplete.
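The mismatch can be reproduced with a few lines of arithmetic. The sketch below is a hypothetical model of a chunked download loop, not the actual sokol_fetch.h code; `bytes_fetched` and its parameters are names made up for illustration. It only assumes that the loop stops once the size it learned from the HEAD response has been reached.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model (not taken from sokol_fetch.h): request ranges of
   chunk_size bytes until the assumed total size has been reached, and
   return how many bytes were fetched overall.
   assumed_total: the size the client believes the file has (from HEAD)
   actual_total:  the real size of the data the server serves ranges from */
static uint64_t bytes_fetched(uint64_t assumed_total, uint64_t actual_total, uint64_t chunk_size) {
    assert(chunk_size > 0);
    uint64_t offset = 0;
    while ((offset < assumed_total) && (offset < actual_total)) {
        uint64_t end = offset + chunk_size;   /* exclusive end of this range */
        if (end > assumed_total) { end = assumed_total; }
        if (end > actual_total) { end = actual_total; }
        offset = end;
    }
    return offset;
}
```

With the numbers from the headers above, `bytes_fetched(18042, 52494, 1024)` returns 18042, which leaves 52494 - 18042 = 34452 bytes of the uncompressed file unfetched.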

I don't know if this is a server problem or a Sokol problem. But it would seem the server always has the option to send the data uncompressed, and that is what it is doing.

Also, would it ever be the case that ranges are compressed? For example, does the server have the option to compress each range separately, and therefore return a Content-Length that differs both from the requested range size and from any HEAD response?

And if a range within a file were requested, how would it ever be possible to receive uncompressed data in the buffer? So I don't think the fetch buffer ever needs to be bigger than the chunk size, except for chunk_size=0.

floooh commented 10 months ago

Hmm, I'm somewhat sure that I had received compressed chunks when experimenting with streaming downloads, otherwise I wouldn't have gone to great lengths describing that scenario here:

https://github.com/floooh/sokol/blob/751fc4c14a0cb80130ad8014f965ac62c7e89d34/sokol_fetch.h#L600-L638

If the server's response to the HEAD request says that the data will be sent compressed, but it then doesn't send compressed chunks, then sokol_fetch.h currently indeed cannot know when the download has finished.

The streaming sample here doesn't seem to use compression (i.e. the HEAD request returns the actual uncompressed data size, probably because compression is deactivated for MPEG files):

https://floooh.github.io/sokol-html5/plmpeg-sapp.html

If it's only about detecting when the streamed download is complete, then I can probably look at the Content-Range response header:

Content-Range: bytes 0-1023/52494

...since the part after the slash is the overall size, so it's possible to just look at the chunk's Content-Range header to check for completion.
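A minimal sketch of that completion check, assuming the header value (the text after `Content-Range: `) is available as a C string. The names `content_range_t`, `parse_content_range` and `is_last_chunk` are hypothetical and not part of sokol_fetch.h; responses with an unknown total size (`bytes 0-1023/*`) are simply rejected here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper type, not part of sokol_fetch.h. */
typedef struct {
    unsigned long long first;   /* first byte position of the chunk (inclusive) */
    unsigned long long last;    /* last byte position of the chunk (inclusive) */
    unsigned long long total;   /* overall size of the resource */
} content_range_t;

/* Parse a value such as "bytes 0-1023/52494".
   Returns false if the value is malformed or the total size is unknown. */
static bool parse_content_range(const char* value, content_range_t* out) {
    assert(value && out);
    unsigned long long first, last, total;
    if (sscanf(value, "bytes %llu-%llu/%llu", &first, &last, &total) != 3) {
        return false;
    }
    if ((first > last) || (last >= total)) {
        return false;
    }
    out->first = first;
    out->last = last;
    out->total = total;
    return true;
}

/* The download is complete once the chunk ends at the last byte of the resource. */
static bool is_last_chunk(const content_range_t* r) {
    return (r->last + 1) == r->total;
}
```

For the response shown earlier, `bytes 0-1023/52494` parses successfully and is not the last chunk, while a final chunk would look like `bytes 51200-52493/52494`.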

That sounds like a plan. I need to look into sokol_fetch.h again soonish anyway because of https://github.com/floooh/sokol/issues/882.

voidware commented 10 months ago

Thanks for looking at this.

It appears that when a HEAD is issued, the Content-Length reflects whether compression is acceptable, since I think the value from HEAD is meant to be the same as the value from GET, all things being consistent.

So

curl -I <url> -H "Accept-Encoding: gzip"

Will contain:

Content-Encoding: gzip
Content-Length: 18042

curl -I <url>

Will contain:

Content-Length: 52494

In such cases the Content-Length will then be consistent with a subsequent GET in the non-range case.

For ranges, I'm thinking the server can opt out of compression. I think identity is always implied. I tried to stop it with:

curl <url> -i -H "Accept-Encoding: gzip,identity;q=0" -H "Range: bytes=0-1023"

But it still returned uncompressed data.

voidware commented 10 months ago

BTW, if you're going to be looking at fetch sometime, can you have a quick look at the case where a buffer is not pre-assigned? I tried the method of allocating the buffer in dispatch, but my callback was never called. I could only get the pre-allocated buffer method to work.

BTW2, for the time being I have a workaround for the range problem. It turns out I only need chunks for streaming media, which is already compressed. Fetching small text files does not need chunks, as they always fit in my buffer anyhow. For now I just set chunk_size to zero for those files.

BTW3, it would be nice to know whether ranges can indeed be compressed and whether the server can opt to compress each range separately. I read somewhere some CDNs do this. I have had a look around and can't find anything definite in this area. Seems to be a bit of a hole in the specifications.

Thanks.

floooh commented 10 months ago

where a buffer is not pre-assigned...

...hmm, the cgltf-sapp.c sample works like that. The sfetch_send() calls don't assign a buffer:

https://github.com/floooh/sokol-samples/blob/32d1a2e27592a486bcd248aba02f1943f039d443/sapp/cgltf-sapp.c#L284-L287

https://github.com/floooh/sokol-samples/blob/32d1a2e27592a486bcd248aba02f1943f039d443/sapp/cgltf-sapp.c#L578-L582

https://github.com/floooh/sokol-samples/blob/32d1a2e27592a486bcd248aba02f1943f039d443/sapp/cgltf-sapp.c#L655-L659

...and the buffer is assigned inside the response-callback when the response is in dispatched state, using the channel and lane-indices to select a buffer:

https://github.com/floooh/sokol-samples/blob/32d1a2e27592a486bcd248aba02f1943f039d443/sapp/cgltf-sapp.c#L447-L450

https://github.com/floooh/sokol-samples/blob/32d1a2e27592a486bcd248aba02f1943f039d443/sapp/cgltf-sapp.c#L467-L469

https://github.com/floooh/sokol-samples/blob/32d1a2e27592a486bcd248aba02f1943f039d443/sapp/cgltf-sapp.c#L487-L489

...are you using it differently? (if yes the documentation probably needs to be improved)

voidware commented 10 months ago

Thanks for checking this. I tried it again. Yes, the problem only occurs when you have a nonzero chunk_size. It hits an assert complaining that the buffer is too small for the chunk, because there is no buffer yet!

voidware commented 10 months ago

Also, am I right in thinking that assigning the buffer in dispatch will cause an additional frame delay? If so, I'll probably preassign the buffer anyhow.

floooh commented 10 months ago

Also, am I right in thinking that assigning the buffer in dispatch will cause an additional frame delay?

It actually shouldn't, because the dispatch callback is 'short-circuited': it is called as soon as a lane is assigned to the request and before the request is enqueued for processing, so there's no extra roundtrip involved. The channel and lane index let you pick a buffer which will only be written to by this specific request, because it's guaranteed that no other request is in flight with the same channel/lane combination:

https://github.com/floooh/sokol/blob/b803c9a0214c6ab6dcb9cc6dd9d30d7ace4eda1e/sokol_fetch.h#L2485-L2491
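The channel/lane guarantee suggests a simple buffer layout: one statically allocated buffer per channel/lane pair. The sketch below only shows the buffer selection; `NUM_CHANNELS`, `NUM_LANES`, `BUFFER_SIZE` and `buffer_for` are illustrative names and values, not taken from sokol_fetch.h or the sample code. In a real response callback the returned pointer would be bound to the request while the response is in the dispatched state (to my understanding via `sfetch_bind_buffer()`, using the `channel` and `lane` fields of the response).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed configuration for illustration; these must match whatever
   channel/lane counts the fetch system was set up with. */
#define NUM_CHANNELS (2)
#define NUM_LANES (4)
#define BUFFER_SIZE (64 * 1024)

/* One buffer per channel/lane combination. Since no two in-flight requests
   share the same channel and lane, no two requests share a buffer. */
static uint8_t buffers[NUM_CHANNELS][NUM_LANES][BUFFER_SIZE];

/* Hypothetical helper: select the buffer owned by a channel/lane pair. */
static uint8_t* buffer_for(uint32_t channel, uint32_t lane) {
    assert((channel < NUM_CHANNELS) && (lane < NUM_LANES));
    return buffers[channel][lane];
}
```

Because the buffers are static, nothing needs to be allocated or freed per request. The trade-off is that every lane reserves `BUFFER_SIZE` bytes up front, whether or not it is in use.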