Sicos1977 / MSGReader

C# Outlook MSG file reader without the need for Outlook
http://sicos1977.github.io/MSGReader
MIT License

Text with the Arial font displays special Norwegian characters out of order #313

Closed by zeOxx 1 year ago

zeOxx commented 1 year ago

Describe the bug
Text styled with Arial can show Norwegian characters out of order. This only happens for the BodyHTML property; the other body properties display the characters in their correct order.

Examples:
Text from Outlook using three different fonts, with Arial in the middle: image
Parsed text from msgviewer: image
As you can see, the "å" here gets moved from where it originally was. Here's the HTML from Outlook for good measure, which has the characters in their original positions:

<p class=MsoNormal>The fånt fær this text is Calibri, which wørks fine<o:p></o:p></p>
<p class=MsoNormal><span style='font-family:"Arial",sans-serif'>The fånt for this text is Arial, whichdæsn&#8217;t wørk fine<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-family:Consolas'>I&#8217;ll also try another line with a different fånt,for gøod meæsure</span><span style='font-family:Consolas'><o:p></o:p></span></p>

You can also see that, for some reason, it sometimes doesn't move the characters.

To Reproduce
Steps to reproduce the behavior: Compose an email with any text that contains the characters æ, ø, å, and choose the Arial font. Example text used above: The fånt for this text is Arial, which dæsn’t wørk fine

Sicos1977 commented 1 year ago

I did what you said, but the outcome looks fine to me.

image

zeOxx commented 1 year ago

There are apparently some nuances to this... I also tried the same text and it works fine, so that was apparently a bad example on my part. Apologies.

I did get it to happen again though, this time with the name of a colleague of mine. I wrote it out on three different lines, and both lines using Arial have jumbled special characters:

image

This is the HTML it spits out:

<div dir="ltr"><div class="gmail_quote"><div dir="ltr"><p class="MsoNormal" style="margin:0cm;font-size:11pt;font-family:Calibri,sans-serif"><b><span style="color:black">Rolf
Løvås</span></b></p>

<p class="MsoNormal" style="margin:0cm;font-size:11pt;font-family:Calibri,sans-serif"><b><span style="font-family:Arial,sans-serif;color:black">Rolf Lvøås</span></b></p>

<p class="MsoNormal" style="margin:0cm;font-size:11pt;font-family:Calibri,sans-serif"><b><span style="font-size:12pt;font-family:Arial,sans-serif;color:rgb(117,117,117)">ROLF LVØÅS</span></b></p></div>
</div></div>

As you can see, the lines using Arial jumble the special characters, while the one using Calibri doesn't. This was done using the latest NuGet package, 4.4.11, reading the contents via Storage.Message through a MemoryStream.

EDIT: I should also note that, as before, the other body properties display the text correctly. This only occurs within BodyHTML.
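
For anyone who wants to check their own messages for this, here is a minimal sketch of a comparison helper. It is not part of MSGReader: the names BodyCompare, VisibleText, Collapse and SameText are hypothetical, and the tag stripping is a deliberately crude regex that assumes simple markup like the HTML shown above. It reduces an HTML body to its visible text and compares that with a plain-text body, so a reordering such as "Lvøås" versus "Løvås" shows up as a mismatch.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not an MSGReader API.
public static class BodyCompare
{
    // Reduce an HTML fragment to its visible text:
    // replace tags with spaces, decode entities, collapse whitespace.
    // Assumption: no '>' inside attribute values and no script/style blocks.
    public static string VisibleText(string html)
    {
        string withoutTags = Regex.Replace(html, "<[^>]+>", " ");
        string decoded = WebUtility.HtmlDecode(withoutTags);
        return Collapse(decoded);
    }

    // Collapse every run of whitespace (including line breaks) to one space.
    public static string Collapse(string text)
    {
        return Regex.Replace(text, @"\s+", " ").Trim();
    }

    // True when the plain-text body and the visible text of the HTML body
    // contain the same characters in the same order.
    public static bool SameText(string plainBody, string htmlBody)
    {
        return string.Equals(Collapse(plainBody), VisibleText(htmlBody),
                             StringComparison.Ordinal);
    }
}
```

Called with the plain text "Rolf Løvås" and the Calibri paragraph above, SameText should report a match; called with the first Arial span it should report a mismatch. In practice the two arguments would be the plain-text body and the BodyHTML value read from the same message. Because every tag is replaced by a space, a word that is split across inline tags would be reported as a false mismatch, so treat a mismatch as a hint to inspect the HTML rather than as proof.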

Sicos1977 commented 1 year ago

Get the latest version from NuGet.