Rob--W / open-in-browser

A browser extension that offers the ability to open files directly in the browser instead of downloading them.

Improve handling of non-ASCII filenames #26

Closed yan12125 closed 6 years ago

yan12125 commented 6 years ago

Here are some examples where open-in-browser behaves differently from vanilla Firefox.

  1. https://drive.google.com/file/d/0B7pIvhrJqP6xaGNkVldaeUpuRG8/view

The filename is 測試.txt, but open-in-browser displays __.txt. That's because RFC 6266 is not correctly implemented. The Content-Disposition header for this file is:

attachment;filename="__.txt";filename*=UTF-8''%E6%B8%AC%E8%A9%A6.txt

According to RFC 6266:

when both "filename" and "filename*" are present in a single header field value, recipients SHOULD pick "filename*" and ignore "filename".

%E6%B8%AC%E8%A9%A6.txt should be used here. That's exactly 測試.txt.
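To illustrate the rule quoted above, here is a minimal Python sketch (not the extension's actual code) that prefers `filename*` over `filename` and percent-decodes the extended value using the declared charset. The function name `pick_filename` and the simplified regular expressions are assumptions for illustration; they do not cover the full RFC 6266 grammar (no continuations, no escaped quotes).

```python
import re
from urllib.parse import unquote


def pick_filename(header_value):
    """Return a filename from a Content-Disposition value, preferring filename*."""
    # Extended form: filename*=charset'language'percent-encoded-value
    ext = re.search(
        r"filename\*\s*=\s*([\w-]+)'[\w-]*'([^;\s]+)",
        header_value,
        re.IGNORECASE,
    )
    if ext:
        charset, encoded = ext.group(1), ext.group(2)
        # Note: an unknown charset label would raise LookupError here.
        return unquote(encoded, encoding=charset, errors="replace")

    # Fallback: plain filename, either quoted or a bare token.
    plain = re.search(
        r'filename\s*=\s*(?:"([^"]*)"|([^;\s]+))',
        header_value,
        re.IGNORECASE,
    )
    if plain:
        return plain.group(1) if plain.group(1) is not None else plain.group(2)
    return None


header = "attachment;filename=\"__.txt\";filename*=UTF-8''%E6%B8%AC%E8%A9%A6.txt"
print(pick_filename(header))
```

With the header from this report, the sketch picks the `filename*` value and ignores the `__.txt` fallback.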

Similar bug reports and fixes:

By the way, judging from one of the new test cases in wget's commit,

"filename**0=\"A\"; filename**1=\"A.ext\"; filename*0=\"B\";filename*1=\"B\"", "AA.ext"

I bet correctly implementing RFC 6266 is not easy.

  2. https://www.csie.ntu.edu.tw/download.php?filename=13101_7da5e585.pdf&dir=news&title=%E5%9C%8B%E7%AB%8B%E8%87%BA%E7%81%A3%E5%A4%A7%E5%AD%B8%E5%AD%B8%E7%94%9F%E9%80%95%E8%A1%8C%E4%BF%AE%E8%AE%80%E5%8D%9A%E5%A3%AB%E5%AD%B8%E4%BD%8D%E8%BE%A6%E6%B3%951060609

This website is misconfigured and returns filenames as raw UTF-8 without quoting or encoding:

attachment; filename=國立臺灣大學學生逕行修讀博士學位辦法1060609.pdf

If I disable the open-in-browser extension, Firefox uses 國立臺灣大學學生逕行修讀博士學位辦法1060609.pdf as the filename, while open-in-browser shows:

åç«èºç£å¤§å¸å¸çéè¡ä¿®è®å士å¸ä½è¾¦æ³1060609.pdf

That's because Firefox re-encodes the header with ISO-8859-1. I guess Firefox has some heuristic for recoding filenames back to UTF-8. In my PR for est31's version, I recode raw filenames back to UTF-8 unconditionally. I'm not sure whether that's a good approach.
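As a rough illustration of the recoding idea, the Python sketch below turns a string that was decoded as ISO-8859-1 back into bytes and tries to read those bytes as UTF-8, keeping the original string when that fails. This is a conditional variant, shown only as one possible alternative to unconditional recoding; it is an assumption for illustration, not Firefox's heuristic and not the code from the PR. The name `recode_latin1_as_utf8` is hypothetical.

```python
def recode_latin1_as_utf8(value):
    """Undo an ISO-8859-1 mis-decoding of UTF-8 bytes, if that is what happened."""
    try:
        # Recover the original bytes, then interpret them as UTF-8.
        return value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the string has characters outside Latin-1, or the bytes are
        # not valid UTF-8: assume the value was already decoded correctly.
        return value


# Simulate a raw UTF-8 header value that was decoded as ISO-8859-1.
garbled = "測試.txt".encode("utf-8").decode("latin-1")
print(recode_latin1_as_utf8(garbled))
```

The fallback matters for servers that really do send ISO-8859-1: a name such as `café.txt` is not valid UTF-8 at the byte level, so it is returned unchanged instead of being corrupted.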

Rob--W commented 6 years ago

I decided to roll my own Content-Disposition parser (6f3bbb8) because the library that you suggested is incomplete. In particular, it does not support RFC 2047 (which is obsolete but still supported in Firefox), and it also lacks support for parameter continuations (https://github.com/jshttp/content-disposition/issues/2).

The test case that you referenced from wget is not valid either; I opened a bug for that at https://savannah.gnu.org/bugs/index.php?52531
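For readers unfamiliar with the RFC 2047 form mentioned above, here is a small Python sketch of decoding an encoded-word with the standard library's `email.header.decode_header`. It only shows what such a value looks like and how it decodes; it is not the parser from 6f3bbb8, and the helper name `decode_encoded_words` and the sample value are assumptions for illustration.

```python
from email.header import decode_header


def decode_encoded_words(value):
    """Decode RFC 2047 encoded-words (=?charset?Q|B?data?=) in a string."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded chunks come back as bytes plus their declared charset.
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            # Values without any encoded-word come back as plain str.
            parts.append(chunk)
    return "".join(parts)


# The same name as in the first example, written as a Q-encoded word.
print(decode_encoded_words("=?UTF-8?Q?=E6=B8=AC=E8=A9=A6.txt?="))
```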

yan12125 commented 6 years ago

Thank you very much for the parser. It's useful and easy to understand.