Sicos1977 / MSGReader

C# Outlook MSG file reader without the need for Outlook
http://sicos1977.github.io/MSGReader
MIT License
493 stars 168 forks

Bugfix for reading encoded sequence #396

Closed NagayamaToshiaki closed 9 months ago

NagayamaToshiaki commented 9 months ago

Description

This branch fixes a bug where encoded words in headers, such as the subject or the sender name, are decoded improperly.

For example,

Subject: =?utf-8?B?44CQ6KaB56K66KqN44CR4peP5pmu6YCa6Ieq6Lui6LuK77yI?=
=?utf-8?B?44OO44O844OR44Oz44Kv44K/44Kk44Ok77yJIOWGjeS6iOe0hOmAmuef?=
=?utf-8?B?pSAyMDIzLzExLzI5IDEyOjAwIC0gMTM6MDA=?=

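This subject consists of three UTF-8 encoded words. MSGReader currently decodes each encoded word of a header separately, so the subject is decoded as "【要確認】●普通自転車(", "ノーパンクタイヤ) 再予約通�", "� 2023/11/29 12:00 - 13:00". The mojibake occurs between the second and the third encoded word because the three UTF-8 bytes of the character 知 (E7 9F A5) are split across them: the second word ends with E7 9F and the third begins with A5.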
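The split can be reproduced outside MSGReader. The following is a minimal sketch, not code from this branch; the class and method names (SplitChunkDemo, DecodeAlone, DecodeJoined) are hypothetical and exist only for illustration. It decodes the Base64 payloads of the second and third encoded words first separately and then as one byte sequence.

```csharp
using System;
using System.Text;

public static class SplitChunkDemo
{
    // Base64 payloads of the second and third encoded words of the Subject above.
    public const string Chunk2 = "44OO44O844OR44Oz44Kv44K/44Kk44Ok77yJIOWGjeS6iOe0hOmAmuef";
    public const string Chunk3 = "pSAyMDIzLzExLzI5IDEyOjAwIC0gMTM6MDA=";

    // Decodes one payload on its own, as a per-word decoder would.
    public static string DecodeAlone(string base64)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(base64));
    }

    // Joins the raw bytes of both payloads first, then decodes them once.
    public static string DecodeJoined(string first, string second)
    {
        byte[] a = Convert.FromBase64String(first);
        byte[] b = Convert.FromBase64String(second);
        byte[] all = new byte[a.Length + b.Length];
        Buffer.BlockCopy(a, 0, all, 0, a.Length);
        Buffer.BlockCopy(b, 0, all, a.Length, b.Length);
        return Encoding.UTF8.GetString(all);
    }

    public static void Main()
    {
        byte[] tail = Convert.FromBase64String(Chunk2);
        // Shows the last two bytes of the second payload (an incomplete UTF-8 sequence).
        Console.WriteLine(BitConverter.ToString(tail, tail.Length - 2));

        // Decoded separately, each half contains the replacement character U+FFFD.
        Console.WriteLine(DecodeAlone(Chunk2).Contains("\uFFFD"));
        Console.WriteLine(DecodeAlone(Chunk3).Contains("\uFFFD"));

        // Decoded as one byte sequence, the character at the boundary survives.
        Console.WriteLine(DecodeJoined(Chunk2, Chunk3));
    }
}
```

Joining the decoded bytes rather than the decoded strings is the point of the sketch: once each half has been turned into a string, the partial character is already lost.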

This bugfix treats the encoded words of a header as a single sequence, as long as their charsets and encodings are the same. (I suppose few mailers change the charset or encoding between consecutive encoded words of one header, but since RFC 2047 does not say that implementers must not do so, I chose the safest way.) Thus, the Subject above is treated as:

Subject: =?utf-8?B?44CQ6KaB56K66KqN44CR4peP5pmu6YCa6Ieq6Lui6LuK77yI44OO44O844OR44Oz44Kv44K/44Kk44Ok77yJIOWGjeS6iOe0hOmAmuefpSAyMDIzLzExLzI5IDEyOjAwIC0gMTM6MDA=?=

The subject is now rendered correctly as "【要確認】●普通自転車(ノーパンクタイヤ) 再予約通知 2023/11/29 12:00 - 13:00".
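The idea of merging consecutive encoded words can be sketched as follows. This is an illustrative assumption about one way to do it, not the code in this branch and not MSGReader's API: the class EncodedWordSketch and the method DecodeBase64Words are hypothetical, only the B (Base64) encoding is handled, and the sketch concatenates the decoded bytes of each word instead of the Base64 text, which has the same effect for this Subject.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class EncodedWordSketch
{
    // Matches one RFC 2047 encoded word: =?charset?encoding?payload?=
    private static readonly Regex EncodedWord =
        new Regex(@"=\?([^?\s]+)\?([A-Za-z])\?([^?\s]*)\?=");

    // Decodes a header value that consists of Base64 encoded words separated
    // by whitespace. Consecutive words with the same charset are decoded as
    // one byte sequence. Hypothetical helper for illustration only.
    public static string DecodeBase64Words(string headerValue)
    {
        var result = new StringBuilder();
        var pending = new List<byte>(); // bytes collected for the current run
        string runCharset = "";         // empty means no run has started yet

        foreach (Match m in EncodedWord.Matches(headerValue))
        {
            string charset = m.Groups[1].Value;
            if (!m.Groups[2].Value.Equals("B", StringComparison.OrdinalIgnoreCase))
                throw new NotSupportedException("Only the B encoding is sketched here.");

            // A different charset ends the current run, so flush it first.
            if (runCharset.Length > 0 &&
                !charset.Equals(runCharset, StringComparison.OrdinalIgnoreCase))
            {
                result.Append(Encoding.GetEncoding(runCharset).GetString(pending.ToArray()));
                pending.Clear();
            }

            runCharset = charset;
            pending.AddRange(Convert.FromBase64String(m.Groups[3].Value));
        }

        // Flush the last run, if any encoded word was found.
        if (runCharset.Length > 0)
            result.Append(Encoding.GetEncoding(runCharset).GetString(pending.ToArray()));

        return result.ToString();
    }

    public static void Main()
    {
        string subject =
            "=?utf-8?B?44CQ6KaB56K66KqN44CR4peP5pmu6YCa6Ieq6Lui6LuK77yI?=\r\n " +
            "=?utf-8?B?44OO44O844OR44Oz44Kv44K/44Kk44Ok77yJIOWGjeS6iOe0hOmAmuef?=\r\n " +
            "=?utf-8?B?pSAyMDIzLzExLzI5IDEyOjAwIC0gMTM6MDA=?=";

        // Prints the subject with the character at the word boundary intact.
        Console.WriteLine(DecodeBase64Words(subject));
    }
}
```

Flushing the run whenever the charset changes keeps the sketch conservative in the same sense as the description above: only words that agree on charset and encoding are merged.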

Related issues

https://github.com/Sicos1977/MSGReader/issues/383

https://github.com/Sicos1977/MSGReader/issues/390