Keats / validator

Simple validation for Rust structs
MIT License
2.03k stars 147 forks source link

Bad email validator #349

Open iUnstable0 opened 2 months ago

iUnstable0 commented 2 months ago

This email validator incorrectly accepts test@ok as a valid input. While it follows the HTML specification for email address validation, the regex provided on that website also matches emails without a top-level domain (TLD), such as test@gmailcom. This is problematic for several reasons:

  1. Public-Facing Applications: For public-facing applications, accepting emails without a TLD can lead to invalid user inputs, causing potential issues with email delivery and user verification processes.

  2. Rare Usage in Development: Even in development environments, it is rare to see local email addresses without a TLD. Most development setups use valid TLDs to simulate real-world scenarios. (if im being honest they js use their personal or burner mail lol)

To address these concerns, I suggest enforcing a stricter validation that requires a TLD. This can be achieved by modifying the regex to ensure the presence of a TLD. Here is an example of a stricter regex:

/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+\.[a-zA-Z]{2,4}$/
Keats commented 2 months ago

Honestly since this is allowed by both the mail spec (test@gmailcom is a valid email in theory) and the HTML5 spec I would prefer to leave it. However maybe we could have a strict mode that disable some esoteric addresses?

iUnstable0 commented 2 months ago

itd be nice to have it built in! since right now i'm using custom function and the crate mailchecker

szabgab commented 2 months ago

I just encountered this problem as one of my users mistakenly used blabla@gmail forgetting the .com. I'd also like to see a way to make this stricter.

GrantGryczan commented 2 months ago

Something like #[validate(email_strict)] would be nice. But with the way things typically are in Rust, I'd expect strict to be the default in order to avoid footguns, and I don't think most people expect the default to be allowing TLDs.

But regardless of what the default is, differing from the client-side validation for HTML5 email inputs arguably introduces a small problem: if the server is stricter than the client, then the user might see raw errors from the server rather than a more UX-friendly error (such as client-side validation). But I don't know that there's any way to mitigate that footgun here (other than by making a note of it in the documentation--perhaps via naming?).

Keats commented 2 months ago

strict might not be a good name. More something like allow_esoteric_mails?

szabgab commented 2 months ago

What other cases are there besides "domain name without any dot in it"?

As for name "require_one_dot_after_at" or a more ambitious one: "valid_hostname" or "valid_tld" that would check against the list of all the TLDs.

GrantGryczan commented 2 months ago

More something like allow_esoteric_mails?

The reason one would want to use it should be communicated by its name. I believe the purpose of allowing esoteric emails here is to act equivalently to how HTML5 validates emails, so I suspect something with html5 in the name would be best, if you are taking the strict-by-default route (which would be a breaking change, though not necessarily bad). Then in the docs, you'd explain what the exact differences in behavior are.

For example #[validate(email(like_html5))], but I know nothing about attribute syntax conventions, so that might not be exactly right.

Keats commented 2 months ago

The default doesn't need a name, it's the other one that is "simpler" that requires one

GrantGryczan commented 2 months ago

Oh, I thought you were suggesting a name for opting into the current default functionality, because this issue's suggested functionality is to disallow certain emails from the default, not to allow certain additional emails (which is what allow_esoteric_emails led me to believe).

In that case, perhaps something like #[validate(email(with_tld))]? Again I think it's important for the name to communicate the purpose (so, for example, I don't have to check the docs periodically due to forgetting why it's needed).

Actually, that might not be a great name because it should still allow IP addresses, right? So it's ambiguous exactly how that would behave with IP addresses. But I'd still go in the direction of making it clear from the syntax that domains in emails require a TLD.

Keats commented 2 months ago

My bad, I meant disallow_esoteric_mails, not allow_esoteric_mails 🤦

Actually, that might not be a great name because it should still allow IP addresses, right? So it's ambiguous exactly how that would behave with IP addresses. But I'd still go in the direction of making it clear from the syntax that domains in emails require a TLD.

exactly, ip addresses still work. The idea with "esoteric" is stuff that no regular user is going to encounter in reality

gianalarcon commented 1 month ago

Hi team, any updates on this issue? We are also expecting to use a validate(email_strict) or something similar. Thanks!

Keats commented 1 month ago

I would take a PR for it

GrantGryczan commented 1 month ago

Haha, I just found out that some people actually do use real, working emails with just a TLD, no dot. See this Reddit thread for examples. In light of this, I'm not sure making this a built-in validator check is the right solution. If this is implemented, it should be documented that this invalidates some real-world email addresses from top-level domain mail servers. But I think checking if the TLD is real would be a more ideal solution (albeit with the drawback that new TLDs are often created), and that would cover the user@gmailcom case.

(Though I personally ended up going with the more "Parse, don't validate" solution of validating using newtypes, so I don't have any personal investment in this anymore anyway.)