Closed. konradpabjan closed this pull request 4 years ago.
I'm not sure that we always want to decode gzip responses automatically for the user. As an action author, I would expect the HTTP client to give me the raw data so that I can use it however I want.
I guess there are two use cases here: (1) the file itself was uploaded as a gzip stream, or (2) the client sent `accept-encoding: gzip` and the server decided to apply HTTP compression during transfer. In case 2, the HTTP client should handle decompression automatically, do you agree?
@konradpabjan, when we get the stream, is it a gzip of a gzip? Did we upload gzip? Did the server apply compression again during transfer? If we are use case 1, what is the header value of `accept-encoding`?
> @konradpabjan, when we get the stream, is it a gzip of a gzip? Did we upload gzip? Did the server apply compression again during transfer? If we are use case 1, what is the header value of `accept-encoding`?
There is no gzip of a gzip. When files are uploaded using `upload-artifact`, they are first gzipped locally, then the `Content-Encoding` header gets set to `gzip`, and that information gets saved in the DB for each individual uploaded file.
When making the `GET` request for an individual file during download, there is an `Accept-Encoding` header that you can set to `gzip` (I set it for all the requests; if you don't include it, the `Content-Encoding` response header is empty). When a response is received, the `Content-Encoding` header is either set to `gzip` or empty, depending on how the file was originally uploaded.
During artifact transfer, there is no extra gzip compression applied by the server. The only time the server does anything with gzip is when the zip file is being created for a download from the UI (previously gzipped files get ungzipped at that point).
> I'm not sure that we always want to decode gzip responses automatically for the user. As an action author, I would expect the HTTP client to give me the raw data so that I can use it however I want.
I think 99% of the time, if someone gets a response whose content is gzipped, they're going to want to un-gzip it immediately so they can start working with it (that's what `microsoft/typed-rest-client` does). In principle, I guess it would be nice to have the raw gzip data, but why? Valid use cases seem very limited, and almost everyone would immediately un-gzip the raw data. Gzip is mostly used to reduce the amount of data being transferred, so it's effectively a low-level implementation detail that our users won't need. I'm leaning towards always decoding gzip responses. Looking at issues in `microsoft/typed-rest-client`, there have been no complaints or requests for raw gzip data since the feature was added.
So this is our flow:

1. User starts with `fileA.txt`.
2. `Upload-Artifacts` compresses `fileA.txt` into a gzip stream and sends it up with `Content-Encoding: gzip`.
3. `Download-Artifacts` requests `fileA.txt` with `accept-encoding: gzip`.
4. The server sees the stored `fileA.txt`, figures out it's already gzip-compressed, so it just sends the stream back with `Content-Encoding: gzip` and does not reapply compression.
5. The HTTP client sees the `Content-Encoding` header and decompresses the stream back into `fileA.txt`.
6. `Download-Artifacts` saves the response to file.

So this flow is basically applying HTTP compression, and in this case we want to hide this layer from the user.
If a user wants to download a raw gzip file, they will have to upload a gzip with `Content-Encoding: identity`? And should then either request the file without `accept-encoding: gzip`, or let the server apply compression again?
This sounds right.
Well @hashtagchris is a genius, and it is possible to successfully un-gzip a downloaded file without making changes here.
Since this is no longer needed, I'm closing this PR 🎉
Overview
Largely porting over some changes from `microsoft/typed-rest-client`, which by default supports handling encoded responses: https://github.com/microsoft/typed-rest-client/pull/189

This is needed for the `v2` version of `download-artifact` that is currently in preview and has some problems related to gzip: https://github.com/actions/download-artifact/issues/23

When currently calling `readBody()`, the response body gets converted to a string with the default JavaScript string encoding, which is UTF-16. Once that happens, it is no longer possible to ungzip the content, because some information gets lost in this package during the initial string conversion. I've tried a host of different things, including piping the resulting string to a file and attempting to ungzip it, but there is always this `incorrect header check` error, which indicates the data is in an incorrect format. The really simple program that I've been using for testing.
I've also tried a host of different things using the Node `Buffer` class to go between encodings, but it seems like some information is permanently lost and cannot be retrieved after `readBody()` gets called: https://nodejs.org/docs/latest/api/buffer.html#buffer_buffers_and_character_encodings. I'm able to ungzip the body using the raw Fiddler response when an artifact download is happening, so the issue is definitely in this package.

Testing
- Added a test to make sure the content correctly gets decoded.
- Successful end-to-end artifact download/upload with the changes: https://github.com/konradpabjan/artifact-test/runs/551807430?check_suite_focus=true
- Downloading the artifacts from the UI shows the expected size, and inspecting the files reveals no gzip content, which is expected.
After running `npm build` with these changes, I copied the output over to the `node_modules` directories in the forked `@actions/download-artifact` and `@actions/upload-artifact` repos that I've been testing with.