JoshData / python-email-validator

A robust email syntax and deliverability validation library for Python.
The Unlicense
1.11k stars 113 forks source link

Add stricter validations to the local part and the domain part #142

Closed rskvp93 closed 2 months ago

rskvp93 commented 3 months ago

While working on a project for my company, I found two potential validation issues on the email-validator module. If the application layer is unaware of these issues, it could result in security vulnerabilities.

1. After being validated by several regex patterns, the local part of an email address is normalized using the Unicode NFC algorithm

The local part is validated in the validate_email_local_part method. At the end of this method, if the local part is valid, it is normalized here. A valid email address might be transformed into an invalid one.

Here are a few specific examples:

Using an email address with specials characters like ; and ` could result in security issues:

Suggestions

The local part should be normalized at the beginning of the validate_email_local_part method. All validations should be performed after that.

2. After being decoded by the IDNA module, the domain part is not validated correctly

The problem is at this validation. We should validate the domain_i18n variable instead of the domain variable.

I don’t have specific examples to demonstrate the impact of this issue.

However, the fix is straightforward: simply change the variable name.

JoshData commented 2 months ago

@rskvp93 let me know by email about this before opening the issue and urged me to treat it as a security issue. We're back-filling this issue with our correspondence. Here was my reply:

Hi @rskvp93,

Thanks. I was afraid that this might be possible.

The backtick is a permitted character in local parts, though, so I don't think that's actually a problem (do you agree?). The semicolon case is definitely a concern. I tried to find other cases where NFC normalization changes valid characters to invalid ones (for local parts), or vice versa, with a simple script and couldn't find other cases, but I don't think the search was exhaustive so I'm not sure if other cases occur.

I think the solution is probably to validate both before and after normalization because some users likely don't use the normalized address and are just checking for an exception. I am not sure if the same issue can occur in reverse (if normalization is done first, can the original string be unsafe), but the string length may change so other validation could have a different result if the validation is done only after normalization.

I suspect that the idna package already ensures that the normalized domain is valid by failing inputs that would give invalid normalized forms, but I will think more about that.

For what it's worth, it would be fine with me if you opened an issue for this publicly, but I understand the concern.

Josh

JoshData commented 2 months ago

And hopefully everything is fixed in the commits tagged with this issue, so closing. :crossed_fingers: