Closed JiriSch closed 2 years ago
I add some more debug info, textPlain downloaded over 3.1.0:
With interpunction: ěščřžýáíé Without: escrzyaie
The same email downloaded over any version newer, for ex 4.5.1:
With interpunction: �������� Without: escrzyaie
In this attachment is the raw email from IMAP server (downloaded from IMAP server by SCP), sended by MS Outlook, recieved by Postfix, IMAP use DOVECOT: EmailTest.txt
I have fixed this issue by changing:
foreach ($elements as $element) { $newString .= $this->convertToUtf8($element->text, $element->charset); }
to:
foreach ($elements as $element) { if ($element->charset!='default') $this->decodeMimeStrDefaultCharset = $element->charset; $newString .= $this->convertToUtf8($element->text, $element->charset); }
in Mailbox.php:1490
Not sure if this is the right solution, but it works for me...
Just to explain, it looks like
$elements = \imap_mime_header_decode($string);
returns in $elements->charset default in case it's not, so this handle this from another elements...
I can confirm this bug.
Email parts with i.e. iso-8859-1 (Windows-1252) encoding are not utf-8, but iso-8859-1. The suggested patch works for me.
I looks like there has been a related merge and revert on Nov 16, 2020:
More Info in this discussion: https://github.com/barbushin/php-imap/issues/525#issuecomment-690995876
I hope this helps.
I tested it with quite a lot emails. Most iso-8859-1 emails are fixed with the above patch. However, I still found some iso-8859-1 emails, that get 'default' as charset from imap_mime_header_decode in decodeMimeStr.
Maybe it is better to fix it this way in DataPartInfo.php:111? At least this works for me with all tested emails.
protected function convertEncodingAfterFetch(): string
{
if (isset($this->charset) && !empty(\trim($this->charset))) {
$this->data = $this->mail->decodeMimeStr(
(string) $this->data // Data to convert
);
+ $this->data = $this->mail->convertToUtf8(
+ $this->data,
+ $this->charset
+ );
+ $this->charset = 'utf-8';
}
return (null === $this->data) ? '' : $this->data;
}
I tested it with quite a lot emails. Most iso-8859-1 emails are fixed with the above patch. However, I still found some iso-8859-1 emails, that get 'default' as charset from imap_mime_header_decode in decodeMimeStr.
Maybe it is better to fix it this way in DataPartInfo.php:111? At least this works for me with all tested emails.
I can confirm, this fixes encoding issues for me.
I think this library desperately needs unit tests or similar that load a set of test emails and then checks whether they encoded correctly. During the last versions, encoding issues that were already fixed reappeared multiple times.
Thanks for your code improvements, testings and feedback.
I've created a merge request with your changes, but I need to check, if we can implement it like this and if we instead can / need to remove something else as it doesn't make any sense anymore then.
Feel free to comment and provide feedback on the above linked merge request. :)
Yes, I agree, we need some unit tests, but I'm unfortunately not that familiar with writing unit tests, so I'm not really sure, how we can properly test this. Feel free to create a merge request with some or let me know, how we can define such a good unit test, then I'll add it to the above merge request.
Edit: We already have some decoding tests: https://github.com/barbushin/php-imap/blob/master/tests/unit/MailboxTest.php#L696-L731
They are may just not working as expected or we need to test this somehow else? Or we need more test cases? 🤔
Thank you for this library, I am using it for years, I was using up to version 3.1.0 but now I wanted to upgrade to 4.5 ang get problems with UTF-8 characters.
Email with UTF-8 characters - these characters are unreadable.
The used code:
I tried to test any version newer than 3.1.0 but without success, only till 3.1.0 works.
Thank you.