digidigital closed this issue 1 week ago.
I've modified the relevant section so that it should no longer escape any non-ASCII characters in the header. I would test this myself to make sure it works correctly, but I can't find any examples with this issue, and the uploaded file is a .eml file rather than a .msg file. If you could either upload the correct file or download the version on the #next-release branch and test it yourself, that would be great.
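To illustrate what "escaping non-ASCII characters" can mean in HTML output, here is a small standard-library comparison. It is not the change made in extract-msg (that diff is not shown in this thread), and the sample header value is made up. `html.escape` only escapes markup-significant characters, while encoding with the `xmlcharrefreplace` error handler turns every non-ASCII character into a numeric character reference.

```python
import html

header_value = "Käse & Brot"  # made-up sample header value

# html.escape only touches &, <, > and (by default) quotes; "ä" is left alone.
escaped = html.escape(header_value)

# Encoding to ASCII with xmlcharrefreplace escapes every non-ASCII character.
ascii_escaped = header_value.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(escaped)        # Käse &amp; Brot
print(ascii_escaped)  # K&#228;se & Brot
```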
I did try converting it manually using Outlook, but the file causes RTFDE to throw mysterious errors, so now I have something else to look into as well 😅
Thanks for the quick fix. I tested it with HTML, prepared HTML, and text output, and it works as expected. 😻
Here is the .msg file: test.zip
I hadn't noticed that a .msg file sent as an attachment from Outlook on Windows is saved as .eml when you receive the mail in Thunderbird on Ubuntu and save the attachment 😇
Yeah, it saves that way because attaching the .msg file directly often mangles it when the email is sent.
I'll take a closer look at the other things that can influence the HTML body to make sure this issue doesn't come up anywhere else, and then I'll publish the release.
I was right: there was one more place where this could happen, but it is now fixed as well in the 0.50.0 release.
Bug Metadata
Describe the bug
If the header data contains a two-byte character (such as the German umlaut "ä") transmitted as an encoded word (=?UTF-8?Q? … ?=), it comes out as two separate characters in the prepared HTML output. Output for text or "regular" HTML is fine.
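A minimal sketch of what such a header looks like and why "ä" counts as a two-byte character, using only Python's standard library (`email.header`). The encoded word `=?UTF-8?Q?K=C3=A4se?=` is a made-up sample; extract-msg's own header handling is not shown here.

```python
from email.header import decode_header, make_header

# Made-up RFC 2047 encoded word: "Käse" in UTF-8, Q-encoded.
raw = "=?UTF-8?Q?K=C3=A4se?="

# decode_header splits the value into (bytes, charset) chunks.
chunks = decode_header(raw)
print(chunks)  # [(b'K\xc3\xa4se', 'utf-8')]

# make_header + str turns the chunks back into a proper Unicode string.
text = str(make_header(chunks))
print(text)  # Käse

# "ä" is one character, but two bytes once encoded as UTF-8.
print(len("ä"), len("ä".encode("utf-8")))  # 1 2
```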
What code did you use or can we use to reproduce this error?
Just run the attached email with --html --prepared-html.
Is there a message.msg file you want to share to help us reproduce this?
Additional context
I tried to track down the issue, and I assume it is caused by passing the HTML to Beautiful Soup as UTF-8-encoded bytes (message_base.py, line 385). I suspect Beautiful Soup interprets the two-byte character as two separate characters.
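When a parser receives bytes rather than a str, it has to decide which encoding to decode them with. If it picks a single-byte encoding such as Windows-1252 instead of UTF-8, each byte of "ä" becomes its own character. The sketch below reproduces that effect with plain `bytes.decode` calls; it does not call Beautiful Soup, and which encoding Beautiful Soup actually chose for this message is an assumption, not something confirmed in this thread.

```python
# "ä" encoded as UTF-8 is the two bytes 0xC3 0xA4.
data = "<p>Käse</p>".encode("utf-8")

# Decoding with the right codec gives back one character.
right = data.decode("utf-8")

# Decoding with a single-byte codec (as a wrong guess would) gives two.
wrong = data.decode("windows-1252")

print(right)  # <p>Käse</p>
print(wrong)  # <p>KÃ¤se</p>
```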
The string returned by htmlInjectableHeader contains the umlauts in the correct form.
```python
def replace(bodyMarker):
    """
    Internal function to replace the body tag with itself plus the header.
    """
```
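The replace helper above has the shape of a `re.sub` replacement callback: it receives the match for the body tag and returns that tag followed by the header. Here is a minimal sketch of that pattern that works entirely on str, so non-ASCII header text is never re-encoded along the way. The names `inject_header` and `BODY_TAG`, and the regex itself, are hypothetical and not taken from extract-msg.

```python
import re

# Hypothetical pattern for an opening <body ...> tag (case-insensitive).
BODY_TAG = re.compile(r"<body[^>]*>", re.IGNORECASE)


def inject_header(html_body: str, header: str) -> str:
    """Insert ``header`` right after the first opening body tag."""

    def replace(body_marker: re.Match) -> str:
        # Return the matched tag unchanged, followed by the header.
        return body_marker.group() + header

    # count=1: only the first body tag is touched; no match leaves the text as-is.
    return BODY_TAG.sub(replace, html_body, count=1)


result = inject_header("<html><BODY lang=de><p>Hallo</p></BODY></html>",
                       "<div>Betreff: Käse</div>")
print(result)
# <html><BODY lang=de><div>Betreff: Käse</div><p>Hallo</p></BODY></html>
```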
Potential fix
Decode the value passed to Beautiful Soup in getSaveHtmlBody with .decode('utf-8'), so the data reaches Beautiful Soup as a regular Python str instead of UTF-8-encoded bytes.
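A minimal sketch of the proposed fix: decode the UTF-8 bytes explicitly and hand the parser a str, so it never has to guess the encoding. Python's built-in `html.parser.HTMLParser` stands in for Beautiful Soup here (it only accepts str), and the names `TextCollector` and `parse_html_body` are hypothetical; this is not the code that shipped in 0.50.0.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the text content of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []  # text chunks seen so far

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def parse_html_body(data: bytes) -> str:
    # Decode explicitly as UTF-8 (the step suggested in the report)
    # instead of letting the parser guess from raw bytes.
    text = data.decode("utf-8")
    parser = TextCollector()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


body = "<body><div>Betreff: Käse</div></body>".encode("utf-8")
print(parse_html_body(body))  # Betreff: Käse
```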