anymail / django-anymail

Django email backends and webhooks for Amazon SES, Brevo (Sendinblue), MailerSend, Mailgun, Mailjet, Postmark, Postal, Resend, SendGrid, SparkPost, Unisender Go and more
https://anymail.dev
BSD 3-Clause "New" or "Revised" License

parse_address_list: Inability to Parse Email Addresses with Special Characters in Display Name Without Double-Quote Encapsulation #340

Closed: zyangchye closed this issue 10 months ago

zyangchye commented 11 months ago


Issue Description

We have run into a problem when handling email addresses whose display names contain special characters. Senders do not always put double quotes around such a display name, and when the quotes are missing, the getaddresses parse fails and an error is raised.
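A minimal sketch of the kind of input involved, using only the standard library's `email.utils.getaddresses`. The address and the choice of a comma as the special character are assumptions for illustration; the report does not say which character triggered the error.

```python
from email.utils import getaddresses

# Hypothetical header values (not taken from the report).
unquoted = 'Doe, John <john@example.com>'
quoted = '"Doe, John" <john@example.com>'

# Without quotes, the comma is read as an address separator, so the
# result is no longer a single (display name, address) pair.
unquoted_result = getaddresses([unquoted])

# With quotes, the whole phrase is kept as one display name.
quoted_result = getaddresses([quoted])

print(unquoted_result)
print(quoted_result)
```

The exact result for the unquoted form can vary between Python versions, which is why no expected output is shown for it.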

Related code:
https://github.com/anymail/django-anymail/blob/0ac248254e3d16e3bad839ba147152bd59acbb6a/anymail/inbound.py#L134
https://github.com/anymail/django-anymail/blob/b4e22c63b38452386746fed19d5defe0797d76a0/anymail/utils.py#L119

Proposed Solution

We would love to propose the following fix:

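    import re
    from django.utils.encoding import force_str

    # Wrap an unquoted display name in double quotes; strings that already
    # start with a quote, or that have no <...> part, pass through unchanged.
    address_list_strings = [
        f'"{match.group(1).strip()}"<{match.group(2)}>' if (
            match := re.match(r'^(?![\s"]*")([^<]+)<([^>]+)>',
                              force_str(address))) else address
        for address in address_list
    ]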

This wraps the display name in double quotes whenever it is not already quoted.

medmunds commented 11 months ago

Could I confirm that this report is about receiving email with Anymail's inbound webhooks? (Not about sending email from your app.)

It sounds like you are receiving inbound email with a malformed[^1] From or To address header field, and getting an AnymailInvalidAddress error when trying to access the inbound message from_email or to.

I appreciate that you included a proposed workaround, but parse_address_list is shared with Anymail's sending code, so we can't try to fix things there. (Anymail avoids working around bugs in the caller's code, because that would cause problems for anyone who wants to switch from Anymail to some other Django email backend.)

Unfortunately, I don't think there is a reliable workaround, even if we could find a way to isolate it to Anymail's inbound handling. Parsing email address headers with regular expressions is notoriously difficult. (For example, how would your proposed code handle a To header with multiple comma-separated email addresses? Or a display name with non-ASCII characters, which is encoded using RFC-2047 and breaks if you try to enclose it in quotes? Or backslash escapes like \" which are permitted inside quoted strings? Etc.)
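To make the first two questions concrete, here is a small sketch that applies the proposed pattern to two header values. The helper name `quote_display_name` and both sample addresses are hypothetical. The helper takes a plain `str`, so it skips the `force_str` call from the proposal.

```python
import re

# The pattern from the proposed fix, copied unchanged.
UNQUOTED_NAME = re.compile(r'^(?![\s"]*")([^<]+)<([^>]+)>')


def quote_display_name(address: str) -> str:
    """Hypothetical helper: apply the proposed rewrite to one header string."""
    match = UNQUOTED_NAME.match(address)
    if match:
        return f'"{match.group(1).strip()}"<{match.group(2)}>'
    return address


# A header with two comma-separated addresses: everything before the
# first "<" is treated as one display name, so the first address is
# folded into the second address's name.
multi = 'alice@example.com, Bob <bob@example.com>'
rewritten_multi = quote_display_name(multi)
print(rewritten_multi)  # "alice@example.com, Bob"<bob@example.com>

# An RFC 2047 encoded-word (here the base64 of "Jürgen") ends up
# inside a quoted string, where encoded-words are not allowed.
encoded = '=?utf-8?b?SsO8cmdlbg==?= <juergen@example.com>'
rewritten_encoded = quote_display_name(encoded)
print(rewritten_encoded)  # "=?utf-8?b?SsO8cmdlbg==?="<juergen@example.com>
```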

The malformed header is not the responsibility of the end user who sent the mail; it is a bug in whatever app or web service they used to send it. The standard for quoting display names in email addresses is 40+ years old,[^2] and nearly all popular email apps do it correctly, at least for the basic example you reported. (But many spam senders don't. Does the problem email seem like spam?)
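For comparison, this is roughly how a well-behaved sender written in Python could build the header: the standard library's `email.utils.formataddr` adds the required double quotes when the display name contains special characters. The name and address are made-up examples, and this sketch says nothing about how any particular mail app is implemented.

```python
from email.utils import formataddr, getaddresses

# formataddr quotes the display name because it contains a comma.
header_value = formataddr(('Doe, John', 'john@example.com'))
print(header_value)  # "Doe, John" <john@example.com>

# The correctly quoted header parses back to the original pair.
round_trip = getaddresses([header_value])
print(round_trip)  # [('Doe, John', 'john@example.com')]
```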

If this email was sent from a commonly used email app or web service, please provide more details so we can take a closer look (and/or report the problem to that app). Otherwise, I'm inclined to close this as "not planned."

[^1]: RFC-5322 requires double quotes around an address display-name with special characters: see section 3.4 "Address Specification," and sections 3.2.3–3.2.5 for the rules that mandate double quotes around a phrase that contains any characters other than atext.

[^2]: RFC-5322's predecessor RFC-822 included the requirements for double quotes, back in 1982.