yan12125 closed this issue 6 years ago.
I decided to roll my own Content-Disposition parser (6f3bbb8) because the library you suggested is incomplete. In particular, it does not support RFC 2047 encoded-words (obsolete, but still supported by Firefox), and it also lacks support for RFC 2231 parameter continuations (https://github.com/jshttp/content-disposition/issues/2).
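To make the two missing features concrete, here is a minimal Python sketch. It is not the parser from 6f3bbb8, and `join_continuations` and `decode_rfc2047` are hypothetical helper names. The first joins only the unencoded `name*0`, `name*1`, ... form of RFC 2231 continuations (the percent-encoded `name*0*` form is not handled). The second leans on the standard library's `email.header` module to decode encoded-words.

```python
from email.header import decode_header, make_header
from typing import Dict, Optional


def join_continuations(params: Dict[str, str], name: str) -> Optional[str]:
    """Join RFC 2231 continuations name*0, name*1, ... (unencoded form only)."""
    parts = []
    index = 0
    # Segments are numbered from 0 without gaps; stop at the first missing one.
    while f"{name}*{index}" in params:
        parts.append(params[f"{name}*{index}"])
        index += 1
    return "".join(parts) if parts else None


def decode_rfc2047(value: str) -> str:
    """Decode RFC 2047 encoded-words such as =?UTF-8?Q?...?=; plain text passes through."""
    # decode_header() splits the value into (bytes-or-str, charset) pairs and
    # make_header() reassembles them into a Header whose str() is the decoded text.
    return str(make_header(decode_header(value)))
```

For example, `decode_rfc2047("=?UTF-8?Q?=E6=B8=AC=E8=A9=A6.txt?=")` gives 測試.txt, and `join_continuations({"filename*0": "long", "filename*1": "name.txt"}, "filename")` gives longname.txt.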
The test case from wget that you referenced is not valid either; I opened a bug for it at https://savannah.gnu.org/bugs/index.php?52531
Thank you very much for the parser. It's useful and easy to understand.
Here are some examples where open-in-browser behaves differently from vanilla Firefox.
The filename is 測試.txt, while open-in-browser displays __.txt. That's because RFC 6266 is not implemented correctly. The Content-Disposition line for this file is:

According to RFC 6266:

%E6%B8%AC%E8%A9%A6.txt should be used here. That's exactly 測試.txt: the percent-encoded UTF-8 bytes of 測 and 試, followed by .txt.
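RFC 6266 (section 4.3) says that when both `filename` and `filename*` are present, recipients should pick `filename*`, whose value is an RFC 5987 ext-value of the form charset'language'percent-encoded-bytes. The Python sketch below shows that preference; `decode_ext_value` and `pick_filename` are hypothetical names, the parameters are assumed to be already parsed into a dict, and the header values in the example are illustrative rather than the actual header from this site.

```python
from typing import Dict, Optional
from urllib.parse import unquote


def decode_ext_value(ext_value: str) -> str:
    """Decode an RFC 5987 ext-value: charset'language'percent-encoded-bytes."""
    # Unpacking raises ValueError if there are fewer than two apostrophes.
    charset, _language, encoded = ext_value.split("'", 2)
    # Percent-decode into bytes, then decode those bytes with the declared charset.
    return unquote(encoded, encoding=charset, errors="strict")


def pick_filename(params: Dict[str, str]) -> Optional[str]:
    """Prefer filename* over filename, as RFC 6266 section 4.3 recommends."""
    if "filename*" in params:
        try:
            return decode_ext_value(params["filename*"])
        except (ValueError, LookupError):
            # Malformed ext-value, unknown charset, or undecodable bytes:
            # fall back to the plain filename parameter.
            pass
    return params.get("filename")
```

With `{"filename": "fallback.txt", "filename*": "UTF-8''%E6%B8%AC%E8%A9%A6.txt"}`, `pick_filename` returns 測試.txt; a broken `filename*` falls back to fallback.txt instead of failing.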
Similar bug reports and fixes:
By the way, judging from one of the new test cases in wget's commit, I bet implementing RFC 6266 correctly is not easy.
This website is misconfigured and returns filenames as raw UTF-8 without any encoding:
If I disable the open-in-browser extension, Firefox uses 國立臺灣大學學生逕行修讀博士學位辦法1060609.pdf as the filename, while open-in-browser shows:

That's because Firefox interprets the raw header bytes as ISO-8859-1, so each byte of the UTF-8 filename shows up as a separate Latin-1 character. I guess Firefox has some heuristic for converting such filenames back to UTF-8. In my PR for est31's version, I unconditionally recode raw filenames back to UTF-8. I'm not sure whether that's a good approach.
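One middle ground between never and always recoding is a guarded recode: turn the string back into bytes via ISO-8859-1 and keep the UTF-8 decoding only if it succeeds. The Python sketch below assumes each header byte arrives as exactly one ISO-8859-1 character; `recode_latin1_as_utf8` is a hypothetical name, and this is not necessarily the heuristic Firefox uses.

```python
def recode_latin1_as_utf8(value: str) -> str:
    """Undo UTF-8 bytes that were decoded as ISO-8859-1 (one character per byte)."""
    try:
        raw = value.encode("latin-1")  # recover the original header bytes
    except UnicodeEncodeError:
        return value  # has characters above U+00FF, so it is not Latin-1 mojibake
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return value  # not valid UTF-8: probably genuine Latin-1 text, keep it
```

Pure ASCII passes through unchanged, and a genuine Latin-1 name such as café.pdf is kept because its bytes are not valid UTF-8. The remaining false positive is Latin-1 text that happens to form valid UTF-8 (for example Ã© becomes é), which is rare in real filenames but is the price of the heuristic.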