Webklex / php-imap

PHP-IMAP is a wrapper for common IMAP communication without the need to have the php-imap module installed / enabled. The protocol is completely integrated and therefore supports IMAP IDLE operation and the "new" oAuth authentication process as well.
https://www.php-imap.com
MIT License

Issue when parsing header if the from field contains a semicolon #300

Open nilshellerhoff opened 1 year ago

nilshellerhoff commented 1 year ago

Describe the bug

When parsing the header of an email where the from field contains a semicolon ";", the from field will not be parsed correctly. Minimal example of such an email:

To: somemail@domain.tld
Subject: Test of a semicolon in from-header
Date: Wed, 12 Oct 2022 16:31:06 +0000
From: "Foo; Bar" <foobar@domain.tld>
Message-ID: <asdf@domain.tld>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

<< email body >>

The raw email is attached to avoid issues with line breaks (I changed the extension to .txt because GitHub doesn't support .eml): semicolon_test.txt

Used config

Default.

Code to Reproduce

$raw_mail = file_get_contents('semicolon_test.txt');
$header = new \Webklex\PHPIMAP\Header($raw_mail);
var_dump($header->get('from'));

Output when running this via php test.php:

PHP Warning:  Trying to access array offset on value of type null in ***/vendor/webklex/php-imap/src/Header.php on line 457
PHP Warning:  Trying to access array offset on value of type null in ***/vendor/webklex/php-imap/src/Header.php on line 457
object(Webklex\PHPIMAP\Attribute)#18 (2) {
  ["name":protected]=>
  string(4) "from"
  ["values":protected]=>
  array(1) {
    [0]=>
    string(4) ""Foo"
  }
}

Expected behavior

When we remove the semicolon from the from-header, we get the expected result:

PHP Warning:  Trying to access array offset on value of type null in ***/vendor/webklex/php-imap/src/Header.php on line 457
PHP Warning:  Trying to access array offset on value of type null in ***/vendor/webklex/php-imap/src/Header.php on line 457
object(Webklex\PHPIMAP\Attribute)#6 (2) {
  ["name":protected]=>
  string(4) "from"
  ["values":protected]=>
  array(1) {
    [0]=>
    object(Webklex\PHPIMAP\Address)#7 (5) {
      ["personal"]=>
      string(9) ""Foo Bar""
      ["mailbox"]=>
      string(6) "foobar"
      ["host"]=>
      string(10) "domain.tld"
      ["mail"]=>
      string(17) "foobar@domain.tld"
      ["full"]=>
      string(29) ""Foo Bar" <foobar@domain.tld>"
    }
  }
}


EDIT: I am not actually sure that a semicolon in the header fields conforms to the spec, but Gmail, Thunderbird and also phpmailer do handle these mails correctly.
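For what it's worth, RFC 5322 allows a semicolon inside a quoted-string, so a display name like "Foo; Bar" should be valid. One possible way to handle it is to split on semicolons only when they are outside double quotes. The following is a minimal sketch of that idea, not the library's actual code: `splitOutsideQuotes` is a hypothetical helper name, and it assumes the header value has already been unfolded into a single line.

```php
<?php

/**
 * Hypothetical helper (not part of webklex/php-imap): split a header value
 * on a delimiter, ignoring delimiters that appear inside double quotes.
 */
function splitOutsideQuotes(string $value, string $delimiter = ';'): array
{
    $parts = [];
    $current = '';
    $inQuotes = false;
    $length = strlen($value);

    for ($i = 0; $i < $length; $i++) {
        $char = $value[$i];

        if ($char === '\\' && $inQuotes && $i + 1 < $length) {
            // A backslash escapes the next character inside a quoted string.
            $current .= $char . $value[++$i];
            continue;
        }
        if ($char === '"') {
            $inQuotes = !$inQuotes;
        }
        if ($char === $delimiter && !$inQuotes) {
            // Delimiter outside quotes: close the current part.
            $parts[] = trim($current);
            $current = '';
            continue;
        }
        $current .= $char;
    }
    $parts[] = trim($current);

    return $parts;
}
```

With this approach, `splitOutsideQuotes('"Foo; Bar" <foobar@domain.tld>')` keeps the address in one piece, while `splitOutsideQuotes('text/plain; charset=UTF-8')` still separates the media type from its parameter.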

ojgarciab commented 1 year ago

I have the same problem with subjects containing a semicolon. The subject content is removed after the semicolon.

Sample:

Subject: This is an example; for example

Results in:

echo $message->subject;
This is an example

nilshellerhoff commented 1 year ago

The problem originates here:

https://github.com/Webklex/php-imap/blob/45843e1554cc280c738278b9e3b6af35a91f8b1f/src/Header.php#L654-L686

I'm not very versed in email handling, but from reading this (although it is not an authoritative source, of course), it seems to me that certain fields, including subject ..., should probably be excluded from extension parsing. In addition to checking for semicolons, the parser should only treat a field as having extensions if it finds a key=value pair after the semicolon.
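The two ideas above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in Header.php: the function name `parseHeaderParameters`, the constant `FIELDS_WITHOUT_PARAMETERS`, and the list of excluded fields are all hypothetical, and the key pattern is a simplified version of a MIME token.

```php
<?php

// Assumption: fields whose values are free text or address lists, so a
// semicolon in them is never a parameter separator. Illustrative list only.
const FIELDS_WITHOUT_PARAMETERS = ['subject', 'comments', 'from', 'to', 'cc', 'bcc', 'reply-to', 'sender'];

/**
 * Hypothetical helper: return the main value and any key=value parameters.
 * The value is left untouched unless every segment after a semicolon
 * looks like key=value.
 */
function parseHeaderParameters(string $name, string $value): array
{
    $untouched = ['value' => trim($value), 'parameters' => []];

    if (in_array(strtolower($name), FIELDS_WITHOUT_PARAMETERS, true)) {
        return $untouched;
    }

    // Naive split; a quoted parameter value containing ";" would need
    // quote-aware splitting instead.
    $segments = explode(';', $value);
    $main = trim(array_shift($segments));
    $parameters = [];

    foreach ($segments as $segment) {
        if (trim($segment) === '') {
            continue; // tolerate a trailing semicolon
        }
        // Simplified token pattern for the key (assumption, not the full grammar).
        if (!preg_match('/^\s*([A-Za-z0-9_.*-]+)\s*=\s*(.*?)\s*$/s', $segment, $m)) {
            // Not a key=value pair: treat the semicolon as ordinary text.
            return $untouched;
        }
        $parameters[strtolower($m[1])] = trim($m[2], '"');
    }

    return ['value' => $main, 'parameters' => $parameters];
}
```

Under these assumptions, a subject such as `This is an example; for example` would be returned whole, while `text/plain; charset=UTF-8` would still yield a `charset` parameter.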

Can you maybe comment on this, @Webklex? Otherwise I can also open a PR over the weekend.

abhilashpa39 commented 3 months ago

I have the same problem with subjects containing a semicolon. The subject content is removed after the semicolon.

Subject : Test; ticket <>