ThorstenEngel closed this issue 1 year ago.
Flanker (https://github.com/mailgun/flanker) can do that, and you could combine it with this library. (We link to flanker at the top of our README.)
I think parsing display names could be a useful addition.
Thanks, this helped!
Because flanker is no longer maintained (and depends on unmaintained packages), it is beginning to break on modern versions of Python (3.13 specifically).
To extract the email address from a string with a display name, you can use Python's built-in email.utils.parseaddr.
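A minimal sketch of that approach: split off the display name with email.utils.parseaddr, then hand only the address to the validator. The helper name extract_addr_spec is hypothetical, and it adds one guard of its own: parseaddr signals failure by returning an empty address instead of raising, so the sketch turns that into a ValueError.

```python
from email.utils import parseaddr


def extract_addr_spec(text: str) -> tuple[str, str]:
    """Return (display_name, addr_spec) using the stdlib parser.

    Hypothetical helper for illustration. parseaddr() never raises; when it
    cannot find an address it returns an empty one, so treat that as an error.
    """
    display_name, addr_spec = parseaddr(text)
    if not addr_spec:
        raise ValueError(f"no address found in {text!r}")
    return display_name, addr_spec


name, addr = extract_addr_spec("John Doe <john@example.com>")
print(name, addr)
# Only `addr` would then be passed on, e.g. to email_validator.validate_email(addr).
```

This keeps the validator strict while the lenient stdlib parser handles the display name; the trade-off is that parseaddr's own quirks (discussed below) still apply.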
Great suggestion. 😀
I was thinking of replacing flanker with parseaddr in the README's recommendation, but I see that parseaddr is a little flaky with edge cases. After just a minute of experimenting, I see that it drops parts of the input it doesn't like:
>>> import email.utils
>>> email.utils.parseaddr("Test <@x>")
('Test', '')
>>> email.utils.parseaddr("Test <a@xx>, X <b@b>")
('Test', 'a@xx')
So it's not something I would necessarily recommend using with a strict validation tool like this library.
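If you do use the stdlib parser, one way to catch the second problem above (a second address being dropped without any error) is sketched below, assuming you only ever expect a single address: parse with email.utils.getaddresses, which returns every address it finds, and reject the input unless exactly one non-empty address comes back. parse_single_address is a hypothetical name, and this only guards against multiple or empty addresses, not every edge case.

```python
from email.utils import getaddresses


def parse_single_address(text: str) -> tuple[str, str]:
    """Hypothetical strict wrapper: exactly one (name, address) pair or ValueError."""
    pairs = getaddresses([text])
    # getaddresses() keeps every address it finds, so "A <a@x>, B <b@y>"
    # yields two pairs instead of silently keeping only the first one.
    if len(pairs) != 1:
        raise ValueError(f"expected exactly one address, got {len(pairs)}")
    name, addr = pairs[0]
    # An empty address means the parser could not extract one (e.g. "Test <@x>").
    if not addr:
        raise ValueError(f"could not find an address in {text!r}")
    return name, addr
```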
Flanker doesn't accept Test <@x>, and from Test <a@xx>, X <b@b> it extracts b@b.
For my specific use, I don't care much about those cases, so I think it's a good enough solution :)
Fair point!
Hi, just wanted to let you know that I moved from v2.1.2 to the current git branch to test the allow_display_name option, and I ran into a difference between it and the previously mentioned workarounds.
I've been using email.utils.parseaddr and then sending just the email portion to email_validator. email.utils.parseaddr works with email addresses like sigma@pair.com (Kevin Martin), whereas email_validator raises the exception EmailSyntaxError: The part after the @-sign contains invalid characters: '(', ')', SPACE.
I know you may not want to handle emails in this format, but I thought the difference should be documented somewhere.
Thanks for all you do!
>>> import email.utils
>>> import email_validator
>>> from flanker.addresslib import address
>>>
>>> #parseaddr
>>> s = "sigma@pair.com (Kevin Martin)"
>>> email.utils.parseaddr(s)
('Kevin Martin', 'sigma@pair.com')
>>>
>>> #flanker
>>> address.parse(s).address
'sigma@pair.com'
>>>
>>> #email_validator
>>> email_validator.validate_email(s, allow_display_name=True, check_deliverability=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python311\Lib\site-packages\email_validator\validate_email.py", line 124, in validate_email
    domain_name_info = validate_email_domain_name(domain_part, test_environment=test_environment, globally_deliverable=globally_deliverable)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\email_validator\syntax.py", line 441, in validate_email_domain_name
    raise EmailSyntaxError("The part after the @-sign contains invalid characters: " + ", ".join(sorted(bad_chars)) + ".")
email_validator.exceptions_types.EmailSyntaxError: The part after the @-sign contains invalid characters: '(', ')', SPACE.
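One possible workaround for input in that comment form, sketched here as an assumption rather than something email_validator does itself: let parseaddr pull the address and the parenthesized comment apart (as the session above shows, it reports the comment in the display-name slot), then rebuild the string in the name <email> form with email.headerregistry.Address. to_name_addr is a hypothetical helper.

```python
from email.headerregistry import Address
from email.utils import parseaddr


def to_name_addr(text: str) -> str:
    """Rewrite 'addr (Comment)' as 'Comment <addr>' (hypothetical helper).

    parseaddr() returns the parenthesized comment as the display name, and
    Address() formats the pair as an RFC 5322 name-addr, quoting the name
    if it contains special characters. With no name, just the address is returned.
    """
    display_name, addr_spec = parseaddr(text)
    if not addr_spec:
        raise ValueError(f"no address found in {text!r}")
    return str(Address(display_name=display_name, addr_spec=addr_spec))


print(to_name_addr("sigma@pair.com (Kevin Martin)"))  # prints: Kevin Martin <sigma@pair.com>
```

The rewritten string could then be given to validate_email with allow_display_name=True; note this treats the comment as a name, which is exactly the interpretation questioned later in the thread.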
Huh. What I implemented follows RFC 2822's name <email> format:
name-addr = [display-name] angle-addr
angle-addr = [CFWS] "<" addr-spec ">" [CFWS] / obs-angle-addr
display-name = phrase
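To make that grammar concrete, here is a deliberately simplified Python sketch that splits a string of the form display-name <addr-spec> into its two parts. split_name_addr is a hypothetical helper, not email_validator's implementation; it handles only plain whitespace (no comments or other CFWS), no obs-angle-addr, and a phrase that is either one quoted-string or a run of atoms, which is why the sigma@pair.com (Kevin Martin) form is rejected.

```python
import re

# Characters that may not appear unquoted in the display-name phrase.
# "." is deliberately allowed because RFC 5322's obs-phrase permits it
# (as in "ACME Corp.").
_PHRASE_SPECIALS = set('()<>[]:;@\\,"')


def split_name_addr(text: str) -> tuple[str, str]:
    """Split a simplified name-addr into (display_name, addr_spec).

    The addr-spec is returned unvalidated; pass it to a real validator.
    """
    text = text.strip()
    # angle-addr = "<" addr-spec ">", and it must come last.
    if not text.endswith(">"):
        raise ValueError("expected an angle-addr ending in '>'")
    lt = text.rfind("<")
    if lt == -1:
        raise ValueError("expected '<' before the address")
    addr_spec = text[lt + 1:-1].strip()
    phrase = text[:lt].strip()

    # display-name = phrase (the whole display-name is optional).
    if len(phrase) >= 2 and phrase[0] == '"' and phrase[-1] == '"':
        # quoted-string: undo quoted-pair escapes such as \" and \\
        display_name = re.sub(r'\\(.)', r'\1', phrase[1:-1])
    elif _PHRASE_SPECIALS.isdisjoint(phrase):
        # Run of atoms: collapse whitespace to single spaces.
        display_name = " ".join(phrase.split())
    else:
        raise ValueError(f"invalid character in display name {phrase!r}")
    return display_name, addr_spec
```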
I'm not sure what the source of the email (name) format is. Is it commonly used?
> I'm not sure what the source of the email (name) format is. Is it commonly used?
It might just be a qmail or older postfix or freebsd thing.
I don't see it often, but when I do, and if it came from a message, it normally also has a Received header from either qmail or postfix.
Here is one I happened to have handy: Received: by six.pairlist.net (Postfix, from userid 0) id CC6D26ED5C
I also used to rent a FreeBSD server, and whenever I used their mailing list functions, my outgoing emails looked like that as well (though I don't know whether they used Postfix or qmail to send them).
It's not super common, but it's common enough that I would have to work around the exceptions, so I'm probably going to stick with email.utils.parseaddr. (I know it also has weird edge-case behavior, but its mishandling of edge cases hasn't affected my dataset in a meaningful way yet.)
Hmm, I just stumbled across this (from 2014): https://wordtothewise.com/2014/12/friendly-email-addresses/ "parentheses isn't really a display name at all, rather it's a human readable comment."
I thought I saw some other mention around here about ignoring comments in parentheses. I've never seen anything other than a name or a mailing list name in the parens.
We recently had an email with the display name "TIERE (gemeinnütziger Verein) Max Müller". Because it contains umlauts and parentheses, it did not work with email.utils.formataddr (it did not add the necessary parentheses). So I successfully rewrote my code, replacing formataddr((friendlyname, r_mail)) with:
from email.headerregistry import Address
fullmail = str(Address(display_name=friendlyname, addr_spec=r_mail))
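A self-contained version of that snippet, assuming a placeholder address max@example.com (the real address was not given) and a hypothetical wrapper name format_address. Address puts the display name in double quotes when it contains RFC 5322 special characters such as parentheses or a period; str() leaves the non-ASCII letters as they are.

```python
from email.headerregistry import Address


def format_address(display_name: str, addr_spec: str) -> str:
    """Format 'display-name <addr-spec>' via the stdlib (hypothetical helper)."""
    return str(Address(display_name=display_name, addr_spec=addr_spec))


print(format_address("TIERE (gemeinnütziger Verein) Max Müller", "max@example.com"))
# prints: "TIERE (gemeinnütziger Verein) Max Müller" <max@example.com>
```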
It looked to me as if email.headerregistry is better maintained than email.utils. The getaddresses function worked in my cases.
> "parentheses isn't really a display name at all, rather it's a human readable comment."
Aha! That makes sense. Comments came up in #77. As fun as it has been to implement display names, I'm probably not going to get motivated to support comments.
> It looked to me as if email.headerregistry is better maintained than email.utils.
Good to know!
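For reference, a minimal example of email.utils.getaddresses, which was mentioned above as working; the input strings are illustrative, taken from earlier in this thread. It accepts a list of header values and returns every (display name, address) pair it finds across all of them, including the comment form.

```python
from email.utils import getaddresses

# Header values as they might appear in To:/Cc: fields.
header_values = [
    "John Doe <john@example.com>",
    "sigma@pair.com (Kevin Martin)",
]

# Each result is a (display_name, address) tuple.
for name, addr in getaddresses(header_values):
    print(repr(name), repr(addr))
```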
Hi,
In my use case I need to validate the syntax of email addresses with a display name. I think https://www.rfc-editor.org/rfc/rfc5322#section-3.4 fully allows addresses like "John Doe <john@example.com>" or, in my case, something like "ACME Corp. <no-reply@acme.com>". I did not find a way to verify these addresses with your library or any other. It would be great if your library could validate these too ;-).
Warm regards, thorsten