Sicos1977 / MSGReader

C# Outlook MSG file reader without the need for Outlook
http://sicos1977.github.io/MSGReader
MIT License

Broken Encoding when parsing Outlook .msg files #294

Closed lubko closed 2 years ago

lubko commented 2 years ago

When an attached file's filename contains diacritics, for example "plnění před uveřejněním.docx", the output string is broken.

I had some success overloading the "LoadStorage" function and reading the filename like this: MessageCodePage.GetString(CFStream.GetData())

InternetCodePage might work as well, but I did get some \uXXXX in the output, so I believe MessageCodePage works better. In my case MessageCodePage was BodyName = "iso-8859-2", CodePage = 1250, HeaderName = "windows-1250", while InternetCodePage was BodyName = "iso-8859-2", CodePage = 28592, HeaderName = "iso-8859-2".
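To illustrate why the two code pages can give different results, here is a minimal sketch that decodes the same bytes with code page 1250 and with code page 28592. It assumes .NET 5 or later, where the legacy code pages have to be registered through CodePagesEncodingProvider. The class name and the sample file name are made up for this example (the name was picked because it contains š, ž and ť, which the two code pages place at different byte values). A plausible explanation for the \uXXXX output, though not confirmed in this thread, is that bytes in the 0x80-0x9F range decode to control characters under ISO-8859-2.

```csharp
using System;
using System.Text;

public static class CodePageComparison
{
    // Decodes raw bytes with the given code page number (e.g. 1250 or 28592).
    public static string Decode(byte[] raw, int codePage)
    {
        // On .NET Core / .NET 5+ the legacy code pages must be registered first.
        // Registering the same provider more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage).GetString(raw);
    }

    // Encodes text with the given code page number.
    public static byte[] Encode(string text, int codePage)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage).GetBytes(text);
    }

    public static void Main()
    {
        // Hypothetical file name; not taken from the reported .msg file.
        const string original = "příliš žluťoučký.docx";

        // Simulate a PT_STRING8 value that was written using Windows-1250.
        byte[] raw = Encode(original, 1250);

        string asWindows1250 = Decode(raw, 1250);
        string asIso88592 = Decode(raw, 28592);

        // Does each decoding reproduce the original text?
        Console.WriteLine(asWindows1250 == original);
        Console.WriteLine(asIso88592 == original);

        // Did the ISO-8859-2 decoding produce a C1 control character for 'š'?
        Console.WriteLine(asIso88592.IndexOf('\u009A') >= 0);
    }
}
```

Note that the file name from the original report uses only ě, í and ř, which sit at the same byte values in both code pages, so the difference would only show up with other characters.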

Curiously, the same issue occurred with msg.Recipients, while e.g. msg.Headers.Cc had user names that were readable just fine.

And so I assume that in the file Storage.cs the line

case PropertyType.PT_STRING8: return GetStreamAsString(containerName, Encoding.Default);

should be replaced with

case PropertyType.PT_STRING8: return GetStreamAsString(containerName, MessageCodePage);

where MessageCodePage is inherited from the parent if it is null.
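The "inherited from parent if null" idea could look roughly like the sketch below. This is not MSGReader's actual Storage class: StorageNode, DecodeString8 and InheritanceDemo are hypothetical names used only to show the fallback chain, and the code assumes .NET 5 or later. It is also worth keeping in mind that Encoding.Default is the system ANSI code page on .NET Framework but UTF-8 on .NET Core and later, so decoding PT_STRING8 values with it gives machine-dependent results.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for a storage node; NOT MSGReader's real Storage class.
public class StorageNode
{
    private readonly StorageNode _parent;
    private readonly Encoding _ownCodePage;

    public StorageNode(StorageNode parent, Encoding ownCodePage)
    {
        _parent = parent;
        _ownCodePage = ownCodePage;
    }

    // Own code page if set, otherwise the parent's, otherwise the platform default.
    public Encoding MessageCodePage =>
        _ownCodePage ?? _parent?.MessageCodePage ?? Encoding.Default;

    // Decodes an 8-bit string property with the resolved code page.
    public string DecodeString8(byte[] raw) => MessageCodePage.GetString(raw);
}

public static class InheritanceDemo
{
    public static void Main()
    {
        // Legacy code pages must be registered on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp1250 = Encoding.GetEncoding(1250);

        // The message knows its code page; the attachment sub-storage does not.
        var message = new StorageNode(null, cp1250);
        var attachment = new StorageNode(message, null);

        const string fileName = "plnění před uveřejněním.docx";
        byte[] raw = cp1250.GetBytes(fileName);

        // The attachment falls back to the message's code page when decoding.
        Console.WriteLine(attachment.DecodeString8(raw) == fileName);
    }
}
```

Resolving the code page lazily through the parent means an attachment or recipient sub-storage does not need its own copy of the value, which matches the behavior proposed above.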

Sicos1977 commented 2 years ago

Can you send me the msg file? If so, please ZIP it before sending it to sicos2002@hotmail.com

dhrupdubey commented 2 years ago

@Sicos1977 Encoding is also broken for Spanish characters; it displays a ? mark in between words.

Sicos1977 commented 2 years ago

Same question: can you send me an example file? If so, please ZIP it before sending it to sicos2002@hotmail.com

Sicos1977 commented 2 years ago

No response ... closed