fabian-hiller / valibot

The modular and type safe schema library for validating structural data 🤖
https://valibot.dev
MIT License

email validation: provide spec validation #204

Open kurtextrem opened 11 months ago

kurtextrem commented 11 months ago

The currently used email regex does not match emails according to the spec, which means that emails browsers accept will be rejected by valibot (by design).

So we want to use this issue to find out if others would be interested in using a regexp that validates more emails: https://github.com/fabian-hiller/valibot/pull/180#issuecomment-1751630504

fabian-hiller commented 11 months ago

Here is some more context on this issue. Currently, our email validation is deliberately limited to "normal" email addresses. This has several advantages.

On the one hand, the validation is more secure, as it excludes various special characters like ` and | which can be used for SQL injection attacks. On the other hand, the validation is more accurate, allowing typos in common email addresses to be detected.

The downside is that this regex differs from the standard and does not allow email addresses that use an IP address at the end, for example. If this behavior is needed, our regex function can be used as a workaround with a regex that conforms to the RFC standard.

W3C Working Draft Regex from w3.org:

/^[a-zA-Z0-9.!#$%&’*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/

HTML Standard Regex from whatwg.org:

/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

As described above, this issue was created to gather feedback on whether there is a need to add, for example, a specEmail function that matches the validation of the <input type="email" /> element in the browser.
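To make the difference concrete, here is a minimal sketch of what the HTML Standard regex quoted above accepts and rejects. It is standalone TypeScript and does not use valibot; `isSpecEmail` is a hypothetical helper name chosen for illustration, not an existing valibot function.

```typescript
// HTML Standard email pattern, copied from the comment above.
const HTML_EMAIL_REGEX =
  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// Hypothetical helper: returns true if the input matches the HTML Standard pattern.
function isSpecEmail(input: string): boolean {
  return HTML_EMAIL_REGEX.test(input);
}

// Addresses the pattern accepts, including ones with rarely used characters.
const accepted = [
  "jane@example.com",
  "o'brien@example.com", // apostrophe in the local part
  "a`b|c@example.com",   // backtick and pipe in the local part
  "user@192.168.0.1",    // numeric labels, so an IP address at the end
  "jane.doe@example",    // no top-level domain required
];

// Inputs the pattern rejects.
const rejected = [
  "jane@@example.com",   // second @ is not a valid domain character
  "user@[192.168.0.1]",  // bracketed IP literals are not covered
  "user@-example.com",   // a domain label may not start with a hyphen
  "",
];
```

If this behavior is wanted today, the same pattern could presumably be passed to valibot's regex function, as in the workaround described above.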

kurtextrem commented 11 months ago

Plus, the spec allows emails like:

which are both emails that are valid, but contain characters that are rarely encountered.

kazizi55 commented 11 months ago

I don’t think there is much need to add the specEmail() function.

In other words, I think there should be only one email validation method, email(), which implements the HTML Standard regex.

That’s because Valibot users (I mean the programmers) would not be able to decide easily which email method to choose, since there is little difference between email() and specEmail(), and the two are based on different standards.

Who can predict the application users would type rarely encountered characters or not?

What do you guys think about this?

kurtextrem commented 11 months ago

Who can predict the application users would type rarely encountered characters or not?

I share the same opinion. However, I can also understand the argument of avoiding accidental security issues, as not everyone might be aware that ' or ` is an allowed character - although I'd say this is more of a teaching/docs issue, as prepared statements (or stored procedures, or at the very least escaping user input) should always be used in the first place.
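To illustrate the prepared-statement point, here is a rough sketch of a parameterized query next to the string concatenation it replaces. The table and column names and both function names are hypothetical, and the `$1` placeholder follows PostgreSQL's style; other databases use `?` or named parameters.

```typescript
// A query whose SQL text and user-supplied values travel separately.
interface ParameterizedQuery {
  text: string;
  values: string[];
}

// Unsafe: the email is spliced into the SQL text, so a quote character
// in a spec-valid address changes the structure of the statement.
function unsafeFindUserSql(email: string): string {
  return "SELECT id FROM users WHERE email = '" + email + "'";
}

// Safer: the SQL text is constant and the email is sent as a parameter,
// so characters like ' or ` are treated as data, not as SQL.
function findUserByEmail(email: string): ParameterizedQuery {
  return {
    text: "SELECT id FROM users WHERE email = $1",
    values: [email],
  };
}

const email = "o'brien@example.com";
const broken = unsafeFindUserSql(email); // contains an unbalanced quote
const query = findUserByEmail(email);    // text never contains the email
```

With this approach the email validation does not need to double as an injection filter.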

kazizi55 commented 11 months ago

I can also understand the argument of avoiding accidental security issues

Yes, I can understand the argument too. 😄

as not everyone might be aware that ' or ` is an allowed character - although I'd say this is more of a teaching/docs

I agree with you, so I think we also have to add an explanation to the email regex docs on the Valibot website, stating that ' and ` are allowed characters and could lead to accidental security issues.

In short, I think implementing email() with the HTML Standard regex and adding an explanation to the docs is the appropriate way to resolve this issue.

fabian-hiller commented 11 months ago

I decided against the RFC standard for the regex of email. The reasons can be found in PR #180. I think it would be a good idea to point this out in the docs as soon as we extend the API reference.

If there are any counter arguments against my decision, feel free to create a new issue for this topic in order to get feedback from more people. This issue is meant for people to submit use cases where email is too strict, to determine in the long run if the described workaround with regex is sufficient or if we should add specEmail.