Webklex / php-imap

PHP-IMAP is a wrapper for common IMAP communication without the need to have the php-imap module installed / enabled. The protocol is completely integrated and therefore supports IMAP IDLE operation and the "new" oAuth authentication process as well.
https://www.php-imap.com
MIT License

Problem with subject =?UTF-8?Q? (2) #420

Open paulocardozo opened 1 year ago

paulocardozo commented 1 year ago

Describe the bug: When parsing an e-mail message, I run into this issue when calling getSubject().

Used config: I'm using the default package config.

Code to reproduce:

$client = Client::account('default');
$client->connect();
$inbox = $client->getFolderByPath('INBOX');
$query = $inbox->messages();
$messages = $query->where(['UNSEEN'])->from($email)->get();

foreach ($messages as $message) {
    echo $message->getSubject();
}

Expected behavior: In 99% of cases this script works, but for one specific message it returns the subject as:

=?utf-8?Q?Confirmaci=C3=B3n_reserva_Free_Tour_Flo?= =?utf-8?Q?renciaEsencial-_Buendiatours.com?=
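For reference, the subject above consists of two RFC 2047 "encoded words" that should be decoded and joined without the whitespace between them. The following is only a minimal sketch of that decoding step, not the library's implementation: `decodeEncodedWordsSketch` is a hypothetical helper name, and it assumes every encoded word uses the UTF-8 charset (words in other charsets are left untouched).

```php
<?php
// Hypothetical helper, not part of php-imap: decodes RFC 2047 encoded words
// whose charset is UTF-8. Other charsets are returned unchanged in this sketch.
function decodeEncodedWordsSketch(string $header): string
{
    // Whitespace between two adjacent encoded words is not part of the text.
    $header = preg_replace('/(\?=)\s+(=\?)/', '$1$2', $header);

    return preg_replace_callback(
        '/=\?([^?]+)\?([BQ])\?([^?]*)\?=/i',
        function (array $m): string {
            if (strcasecmp($m[1], 'UTF-8') !== 0) {
                return $m[0]; // unsupported charset: keep the raw encoded word
            }
            if (strtoupper($m[2]) === 'B') {
                $decoded = base64_decode($m[3], true);
                return $decoded === false ? $m[0] : $decoded;
            }
            // "Q" encoding: "_" stands for a space, "=XX" is a hex-encoded byte.
            return quoted_printable_decode(str_replace('_', ' ', $m[3]));
        },
        $header
    );
}

$raw = '=?utf-8?Q?Confirmaci=C3=B3n_reserva_Free_Tour_Flo?= =?utf-8?Q?renciaEsencial-_Buendiatours.com?=';
echo decodeEncodedWordsSketch($raw), PHP_EOL;
// Confirmación reserva Free Tour FlorenciaEsencial- Buendiatours.com
```

Decoding each word to raw bytes before concatenating means a multi-byte character split across two encoded words would still come out intact.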


paulocardozo commented 1 year ago

@Webklex I tried what you suggested, but it didn't solve the problem. I'm available to explore the issue, but nothing I tried worked.

Webklex commented 1 year ago

Hi @paulocardozo, thanks a lot for reporting this issue here. I really appreciate it!

Since you are using a Windows environment, I wonder whether it's related to #413. If you try to access the text body of a message, do you actually receive it, or are the headers included as well?

Additionally, please provide an anonymized version of the troubling mail. This will allow me to create a dedicated test case for this issue. Anonymized = remove all personal information you don't want to share with the world :)

Once again, thanks for taking the time and effort to make this library better!

Best regards and happy coding,

paulocardozo commented 1 year ago

@Webklex I can provide it to you in private.

paulocardozo commented 1 year ago

@Webklex It seems to be a local problem. I'm building from scratch an application that constantly checks a mailbox. In the older version of the application your package works fine, but in the newer one it doesn't. I'll investigate further here and post as soon as I have news.

Thanks!

paulocardozo commented 1 year ago

@Webklex I just noticed that the same code works on a Unix server (Hostinger), but not locally on Windows. It seems very strange to me, but I'm still investigating.

Webklex commented 1 year ago

Hi @paulocardozo, thanks for the follow-ups. This does sound interesting, but what could it possibly be? Perhaps different default PHP modules?

paulocardozo commented 1 year ago

@Webklex I really don't know. I've checked the php.ini of every PHP version installed here and nothing seems wrong.

This is the old version, which works:

Working (Old) - ExtractBookingEmailsJob.zip

This is the newest version, which does not work:

Not Working (Newest) - ExtractTourBookingsEmailsJob.zip

Webklex commented 1 year ago

Honestly, I have no idea right now. If anything regarding this comes to mind, I'll let you know for sure. Thanks again for your help!

paulocardozo commented 1 year ago

@Webklex Well, since I'm running very late on a project, I tested the solution below, which is related to https://github.com/Webklex/php-imap/issues/410#issuecomment-1608876508, and it worked.

```php
private static function decodeSubject($subject) {
    // Split the subject into RFC 2047 encoded words: prefix, payload, suffix.
    preg_match_all("/(=\?[^\?]+\?[BQ]\?)([^\?]+)(\?=)[\r\n\t ]*/i", $subject, $m);

    $joined_parts = '';
    if (count($m[1]) > 1 && !empty($m[2])) {
        // Join all payloads into one encoded word, reusing the charset and
        // encoding of the first word.
        // Example: GyRCQGlNVTtZRTkhIT4uTlMbKEI=
        $joined_parts = $m[1][0] . implode('', $m[2]) . $m[3][0];

        $subject_decoded = iconv_mime_decode($joined_parts, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, "UTF-8");

        if ($subject_decoded && trim($subject_decoded) !== trim(rtrim($joined_parts, '='))) {
            return $subject_decoded;
        }
    }

    // iconv_mime_decode() can't decode:
    // =?iso-2022-jp?B?IBskQiFaSEcyPDpuQC4wTU1qIVs3Mkp2JSIlLyU3JSItahsoQg==?=
    $subject_decoded = iconv_mime_decode($subject, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, "UTF-8");

    // Sometimes iconv_mime_decode() can't decode some parts of the subject:
    // =?iso-2022-jp?B?IBskQiFaSEcyPDpuQC4wTU1qIVs3Mkp2JSIlLyU3JSItahsoQg==?=
    // =?iso-2022-jp?B?GyRCQGlNVTtZRTkhIT4uTlMbKEI=?=
    // imap_utf8() needs the optional PHP imap extension, so guard the call.
    if (preg_match_all("/=\?[^\?]+\?[BQ]\?/i", $subject_decoded) && function_exists('imap_utf8')) {
        $subject_decoded = \imap_utf8($subject);
    }

    // Fall back to the raw subject if nothing could be decoded.
    if (!$subject_decoded) {
        $subject_decoded = $subject;
    }

    return $subject_decoded;
}
```

freescout-helpdesk commented 1 year ago

FYI: in our project we've completely replaced the $this->decode($header->subject) call with a function we developed (see https://github.com/Webklex/php-imap/issues/410), because the current solution ($this->decode()) often cannot decode the subject properly: https://github.com/freescout-helpdesk/freescout/blob/dist/overrides/webklex/php-imap/src/Header.php#L208

It works like a charm now. So far we have not seen any subject that this function could not decode.

daniel89fg commented 11 months ago

I have version 5.5 and the MailHelper library does not exist. Why? I have the same problem: some email subjects are not decoded correctly.

paulocardozo commented 11 months ago

I have created a function to parse the subject:

```php
private static function decodeSubject($subject) {
    // Split the subject into RFC 2047 encoded words: prefix, payload, suffix.
    preg_match_all("/(=\?[^\?]+\?[BQ]\?)([^\?]+)(\?=)[\r\n\t ]*/i", $subject, $m);

    $joined_parts = '';
    if (count($m[1]) > 1 && !empty($m[2])) {
        // Join all payloads into one encoded word, reusing the charset and
        // encoding of the first word.
        // Example: GyRCQGlNVTtZRTkhIT4uTlMbKEI=
        $joined_parts = $m[1][0] . implode('', $m[2]) . $m[3][0];

        $subject_decoded = iconv_mime_decode($joined_parts, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, "UTF-8");

        if ($subject_decoded && trim($subject_decoded) !== trim(rtrim($joined_parts, '='))) {
            return $subject_decoded;
        }
    }

    // iconv_mime_decode() can't decode:
    // =?iso-2022-jp?B?IBskQiFaSEcyPDpuQC4wTU1qIVs3Mkp2JSIlLyU3JSItahsoQg==?=
    $subject_decoded = iconv_mime_decode($subject, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, "UTF-8");

    // Sometimes iconv_mime_decode() can't decode some parts of the subject:
    // =?iso-2022-jp?B?IBskQiFaSEcyPDpuQC4wTU1qIVs3Mkp2JSIlLyU3JSItahsoQg==?=
    // =?iso-2022-jp?B?GyRCQGlNVTtZRTkhIT4uTlMbKEI=?=
    // imap_utf8() needs the optional PHP imap extension, so guard the call.
    if (preg_match_all("/=\?[^\?]+\?[BQ]\?/i", $subject_decoded) && function_exists('imap_utf8')) {
        $subject_decoded = \imap_utf8($subject);
    }

    // Fall back to the raw subject if nothing could be decoded.
    if (!$subject_decoded) {
        $subject_decoded = $subject;
    }

    return $subject_decoded;
}
```

Then I use it like this: [image]

daniel89fg commented 11 months ago

Wonderful, it works perfectly, thank you very much. Let's hope the php-imap library fixes this in the future.

devlibfer commented 11 months ago

Hi guys, sometimes the $message->getSubject() method returns text in quoted-printable. How can I detect the format of the email subject?
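One way to answer this is to look for the RFC 2047 encoded-word pattern `=?charset?B|Q?payload?=` in the returned string. The snippet below is a minimal sketch under that assumption. `detectEncodedWordEncoding` is a hypothetical helper name, not a php-imap method, and it only reports the encoding of the first encoded word it finds.

```php
<?php
// Hypothetical helper, not part of php-imap. Returns "B" (base64),
// "Q" (quoted-printable-like) or null if no encoded word is present.
function detectEncodedWordEncoding(string $subject): ?string
{
    // An encoded word looks like =?charset?B?payload?= or =?charset?Q?payload?=
    // and contains no whitespace.
    if (preg_match('/=\?[^?\s]+\?([BQ])\?[^?\s]*\?=/i', $subject, $m) === 1) {
        return strtoupper($m[1]);
    }

    return null;
}

var_dump(detectEncodedWordEncoding('=?utf-8?Q?Confirmaci=C3=B3n_reserva?=')); // string(1) "Q"
var_dump(detectEncodedWordEncoding('=?UTF-8?B?SGVsbG8=?='));                  // string(1) "B"
var_dump(detectEncodedWordEncoding('Already decoded subject'));               // NULL
```

A non-null result means the subject still needs decoding, for example with a function like the decodeSubject() posted earlier in this thread.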

blagi commented 10 months ago

paulocardozo's decodeSubject function is doing a pretty good job with my test set of subjects, but it cannot decode this one:

=?UTF-8?B?VGlja2V0IE5vOiBb7aC97bOpMTddIE1haWxib3ggSW5ib3ggLSAoMTcpIEluY29taW5nIGZhaWxlZCBtZXNzYWdlcw==?=

I had an issue with it when using some modified Roundcubemail methods (my 10-year-old solution), and I had to make some modifications for this subject. In the end I got a solution that decodes it to a valid UTF-8 string, but it's complicated and I need a better one with Webklex/php-imap. Can anybody modify paulocardozo's nice function above to work with this subject?

freescout-helpdesk commented 10 months ago

> paulocardozo's decodeSubject function is doing a pretty good job with my test set of subjects, but it cannot decode this one:
>
> =?UTF-8?B?VGlja2V0IE5vOiBb7aC97bOpMTddIE1haWxib3ggSW5ib3ggLSAoMTcpIEluY29taW5nIGZhaWxlZCBtZXNzYWdlcw==?=

We've just checked this subject with the latest version of decodeSubject() function, it was decoded into:

Ticket No: [??????17] Mailbox Inbox - (17) Incoming failed messages

blagi commented 10 months ago

> We've just checked this subject with the latest version of decodeSubject() function, it was decoded into:
>
> Ticket No: [??????17] Mailbox Inbox - (17) Incoming failed messages

Great. That's the correct subject. The original encoded subject contains a couple of invalid UTF-8 characters, and that function replaces them with question marks.

Thanks!
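To see where the question marks come from, the base64 payload of that subject can be decoded by hand and checked for invalid UTF-8. This is only an illustrative sketch, not what decodeSubject() does internally. `isValidUtf8Sketch` and `replaceInvalidUtf8Sketch` are hypothetical helper names, and replacing each invalid byte with "?" is an assumption chosen to mirror the output shown above.

```php
<?php
// Hypothetical helpers, not part of php-imap or FreeScout.

// preg_match() with the /u modifier fails on input that is not valid UTF-8.
function isValidUtf8Sketch(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

// Keeps well-formed UTF-8 sequences and replaces every other byte with "?".
function replaceInvalidUtf8Sketch(string $bytes): string
{
    $valid = '[\x00-\x7F]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]'
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|\xED[\x80-\x9F][\x80-\xBF]'
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}|[\xF1-\xF3][\x80-\xBF]{3}'
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2}';

    return preg_replace_callback(
        '/' . $valid . '|(.)/s',
        // Group 1 is only set when the byte did not start a valid sequence.
        fn (array $m): string => ($m[1] ?? '') !== '' ? '?' : $m[0],
        $bytes
    );
}

$payload = 'VGlja2V0IE5vOiBb7aC97bOpMTddIE1haWxib3ggSW5ib3ggLSAoMTcpIEluY29taW5nIGZhaWxlZCBtZXNzYWdlcw==';
$bytes = base64_decode($payload, true);

var_dump(isValidUtf8Sketch($bytes));                       // bool(false)
echo bin2hex(substr($bytes, strpos($bytes, '[') + 1, 6)), PHP_EOL; // eda0bdedb3a9
echo replaceInvalidUtf8Sketch($bytes), PHP_EOL;
// Ticket No: [??????17] Mailbox Inbox - (17) Incoming failed messages
```

The six bytes `ed a0 bd ed b3 a9` are a UTF-16 surrogate pair written out as two three-byte sequences, which strict UTF-8 does not allow. That accounts for the six question marks.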