Closed rskvp93 closed 5 months ago
@rskvp93 let me know by email about this before opening the issue and urged me to treat it as a security issue. We're back-filling this issue with our correspondence. Here was my reply:
Hi @rskvp93,
Thanks. I was afraid that this might be possible.
The backtick is a permitted character in local parts, though, so I don't think that's actually a problem (do you agree?). The semicolon case is definitely a concern. I tried to find other cases where NFC normalization changes valid characters to invalid ones (for local parts), or vice versa, with a simple script and couldn't find other cases, but I don't think the search was exhaustive so I'm not sure if other cases occur.
I think the solution is probably to validate both before and after normalization because some users likely don't use the normalized address and are just checking for an exception. I am not sure if the same issue can occur in reverse (if normalization is done first, can the original string be unsafe), but the string length may change so other validation could have a different result if the validation is done only after normalization.
I suspect that the idna package already ensures that the normalized domain is valid by failing inputs that would give invalid normalized forms, but I will think more about that.
For what it's worth, it would be fine with me if you opened an issue for this publicly, but I understand the concern.
Josh
And hopefully everything is fixed in the commits tagged with this issue, so closing. :crossed_fingers:
While working on a project for my company, I found two potential validation issues on the email-validator module. If the application layer is unaware of these issues, it could result in security vulnerabilities.
1. After being validated by several regex patterns, the local part of an email address is normalized using the Unicode NFC algorithm
The local part is validated in the validate_email_local_part method. At the end of this method, if the local part is valid, it is normalized here. A valid email address might be transformed into an invalid one.
Here are a few specific examples:
victim+test\u037eattacker@gmail.com
→victim+test;attacker@gmail.com
: The character \u037e is normalized to ;victim+test\u1fefattacker@gmail.com
→victim+test`attacker@gmail.com
: The character \u1fef is normalized to `Using an email address with specials characters like ; and ` could result in security issues:
victim+test;attacker@gmail.com
as a list of email addresses, causing them to send an email toattacker@gmail.com
Suggestions
The local part should be normalized at the beginning of the validate_email_local_part method. All validations should be performed after that.
2. After being decoded by the IDNA module, the domain part is not validated correctly
The problem is at this validation. We should validate the
domain_i18n
variable instead of thedomain
variable.I don’t have specific examples to demonstrate the impact of this issue.
However, the fix is straightforward: simply change the variable name.