Open bvizzier-ucsc opened 11 months ago
Assignee to consider next steps.
I do see that Azul is not setting a Content-Disposition header like Content-Disposition: attachment; filename="filename.jpg" in the response. I also see that the Content-Type does not reflect the gz status, so this may encourage the browser to drop the .gz extension. Or so says the internet.
The file is served not by Azul, but by Google Cloud Storage.
The screenshot shows that there are two requests at play here. The first request is to Azul. While serving that request, Azul acts as a client to TDR's DRS implementation in order to obtain a signed URL to the file. Azul then returns that signed URL verbatim to DB. DB then makes a second request, a request to that signed URL. The request goes to Google, not Azul. The signed URL points to a file in a GCS bucket owned and controlled by TDR.
Here are the response headers for the second request:
Because Google responds without a content-disposition header, without a content-encoding header and with a content-type header that falsely declares the file as CSV while the response body is actually still gzip-encoded, the user ends up with a gzip-compressed file, but without the .gz extension in the name.
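For anyone who wants to confirm this on a downloaded file, here is a minimal sketch of a client-side check. It relies only on the fact that every gzip stream starts with the two bytes 0x1f 0x8b. The function names and the file name matrix.csv are made up for illustration and are not part of Azul, DB or TDR.

```python
import gzip

# First two bytes of every gzip member (RFC 1952)
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(head: bytes) -> bool:
    """Return True if the leading bytes of a file carry the gzip magic number."""
    return head[:2] == GZIP_MAGIC


def corrected_name(name: str, head: bytes) -> str:
    """Hypothetical helper: append .gz when the content is gzip but the name hides it."""
    if looks_gzipped(head) and not name.endswith(".gz"):
        return name + ".gz"
    return name


# Illustrative payload standing in for the downloaded file
payload = gzip.compress(b"a,b\n1,2\n")
print(corrected_name("matrix.csv", payload[:2]))  # matrix.csv.gz
```

This only detects and renames. It does not change what Google sends.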
By convention, a file compressed with gzip should have the .gz extension. Alternatively, the file could be decompressed on the fly during the download and stored without the .gz extension in the name. I've implemented both solutions in the past.
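As a rough illustration of the second option, decompressing on the fly, here is a sketch that inflates a gzip stream chunk by chunk using Python's zlib module. It is not the implementation referred to above. The function name gunzip_chunks is hypothetical, and the sketch assumes a single-member gzip stream.

```python
import gzip
import zlib
from typing import Iterable, Iterator


def gunzip_chunks(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Hypothetical helper: yield decompressed data for a stream of gzip chunks."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
    decompressor = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decompressor.decompress(chunk)
        if data:
            yield data
    # Emit whatever is still buffered once the input is exhausted
    tail = decompressor.flush()
    if tail:
        yield tail


# Simulate a download that arrives in 64-byte pieces
original = b"a,b\n1,2\n" * 1000
compressed = gzip.compress(original)
pieces = [compressed[i:i + 64] for i in range(0, len(compressed), 64)]
assert b"".join(gunzip_chunks(pieces)) == original
```

The downloader never holds the whole file in memory, so the same approach works for large files.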
It is true that certain valid combinations of the content-type, content-disposition and content-encoding response headers might solve this in a way consistent with the two common scenarios mentioned above (EITHER compressed with .gz extension OR uncompressed without that extension). However, Azul has no way of affecting what headers Google returns when DB makes a request to Google. TDR may be able to bake certain headers into the signed URL but, again, Azul just returns the signed URL that it receives from TDR.
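To make the two scenarios concrete, here is a sketch of the header combinations I would expect to work for each. The function name and the example file name are hypothetical. It assumes the stored object is gzip-compressed CSV and that the client is a browser that honours Content-Encoding. It says nothing about how TDR or Google would actually be configured to send these headers.

```python
def download_headers(base_name: str, keep_compressed: bool) -> dict[str, str]:
    """Hypothetical helper: response headers for a gzip-compressed CSV object.

    base_name is the file name without the .gz extension, e.g. "matrix.csv".
    """
    if keep_compressed:
        # Scenario 1: the user saves the compressed bytes, named with .gz
        return {
            "Content-Type": "application/gzip",
            "Content-Disposition": f'attachment; filename="{base_name}.gz"',
        }
    # Scenario 2: the browser decodes the body and saves plain CSV, without .gz
    return {
        "Content-Type": "text/csv",
        "Content-Encoding": "gzip",
        "Content-Disposition": f'attachment; filename="{base_name}"',
    }


for keep in (True, False):
    print(download_headers("matrix.csv", keep))
```

The response described above matches neither scenario: it declares the body as CSV but omits content-encoding, so the body is saved still compressed under a name without .gz.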
I've raised this before with the Broad but got nowhere: https://github.com/DataBiosphere/azul/issues/4838
I'm afraid there is nothing the Azul team can do.
Assignee to try to raise this again with the Broad.
Still under investigation on the Broad side.
Slack thread
Description of the problem as reported by the user:
Dave Rogers investigated the problem and reported: