axel-download-accelerator / axel

Lightweight CLI download accelerator
GNU General Public License v2.0

Parsing the `content-disposition` header with additional `filename*` parameter #429

Closed starrify closed 8 months ago

starrify commented 8 months ago

Overview

As per RFC 6266, a content-disposition header may include a filename* parameter for suggesting a file name with extended (non-ISO-8859-1) encoding.

axel does not seem to handle this parameter properly: it treats the `filename*` parameter as part of the plain `filename` value, which arguably is a bug.

Examples of this format

Here is one example given in RFC 6266:

     Content-Disposition: attachment;
                          filename="EURO rates";
                          filename*=utf-8''%e2%82%ac%20rates
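For illustration, the `filename*` value above can be decoded roughly as follows. This is only a minimal Python sketch of the `charset'language'percent-encoded` layout described in RFC 5987 / RFC 8187, not axel's code; the helper name `decode_ext_value` is made up for this example.

```python
from urllib.parse import unquote


def decode_ext_value(ext_value: str) -> str:
    """Decode a value of the form charset'language'pct-encoded-text.

    Hypothetical helper for illustration only; it performs no validation
    beyond what split() and unquote() do.
    """
    # The two single quotes delimit charset, (optional) language, and data.
    charset, _language, encoded = ext_value.split("'", 2)
    # Percent-decode the data using the declared charset.
    return unquote(encoded, encoding=charset, errors="strict")


print(decode_ext_value("utf-8''%e2%82%ac%20rates"))  # prints "€ rates"
```

A user agent that understands `filename*` would therefore suggest `€ rates`, while one that does not would fall back to `EURO rates`.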

Here is another example, from a real-world web server (the link points to an ISO image of Windows 11 Enterprise Edition):

$ URL="https://software-static.download.prss.microsoft.com/dbazure/888969d5-f34g-4e03-ac9d-1f9786c66749/22631.2428.231001-0608.23H2_NI_RELEASE_SVC_REFRESH_CLIENTENTERPRISEEVAL_OEMRET_x64FRE_en-us.iso"
$ curl -I -s "$URL" | grep -i '^content-disposition'
content-disposition: attachment; filename=22631.2428.231001-0608.23H2_NI_RELEASE_SVC_REFRESH_CLIENTENTERPRISEEVAL_OEMRET_x64FRE_en-us.iso; filename*=UTF-8''22631.2428.231001-0608.23H2_NI_RELEASE_SVC_REFRESH_CLIENTENTERPRISEEVAL_OEMRET_x64FRE_en-us.iso

Behavior of axel and of some other user agents

The following creates a dummy HTTP server for testing:

$ cat > tmp_payload <<- EOF
HTTP/1.1 200 OK
Content-Disposition: attachment; filename=foo; filename*=UTF-8''bar

foobar
EOF
$ socat -v TCP-LISTEN:8080,fork,reuseaddr,crlf SYSTEM:"cat tmp_payload"

Here's a list of observations per common user agent, when tested against the dummy server above:

| user agent | filename suggested |
| --- | --- |
| RFC 6266 preferred ("SHOULD") | bar |
| RFC 6266 also okay | foo |
| axel | foo; filename_=UTF-8 |
| curl -OJ | foo |
| wget --content-disposition | bar |
| Chrome / Firefox | bar |

(axel checked at revision 3f397e3 / 2.17.13)
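To make the "preferred" behavior concrete, here is a rough Python sketch of how a client could pick the file name: split the header into parameters, prefer `filename*` when it decodes cleanly, and fall back to `filename` otherwise. It is an illustration under simplifying assumptions (a small regex instead of a full RFC 6266 grammar, and the hypothetical function name `suggested_filename`), not a proposal for axel's actual C implementation.

```python
import re
from urllib.parse import unquote

# Matches ";name=value", where value is a quoted-string or a bare token.
_PARAM = re.compile(r';\s*([^=;\s]+)\s*=\s*("(?:[^"\\]|\\.)*"|[^;]*)')


def suggested_filename(header_value):
    """Return the suggested file name, or None if the header has none."""
    params = {}
    for name, value in _PARAM.findall(header_value):
        value = value.strip()
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            # Strip the surrounding quotes and undo backslash escapes.
            value = re.sub(r'\\(.)', r'\1', value[1:-1])
        # Keep the first occurrence of each parameter name.
        params.setdefault(name.lower(), value)

    ext = params.get("filename*")
    if ext is not None:
        try:
            charset, _language, encoded = ext.split("'", 2)
            return unquote(encoded, encoding=charset, errors="strict")
        except (ValueError, LookupError):
            pass  # malformed value or unknown charset: use plain filename
    return params.get("filename")


print(suggested_filename("attachment; filename=foo; filename*=UTF-8''bar"))
```

With the header served by the dummy server above, this sketch yields `bar`, matching the behavior RFC 6266 recommends; a header carrying only `filename=foo` yields `foo`.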