colinhacks / zod

TypeScript-first schema validation with static type inference
https://zod.dev
MIT License

Non-latin email addresses aren’t supported #379

Closed kripod closed 3 years ago

kripod commented 3 years ago

Inspired by this article, I decided to check whether any of the following email addresses work with zod@next’s email string validator. Unfortunately, none of the addresses below are marked as valid, even though Gmail already supports non-latin addresses like:

"josé.arrañoça"@domain
"сайт"@domain
"💩"@domain
"🍺🕺🎉"@domain
poop@💩.la
"🌮"@i❤️tacos.ws
jschauma@شبكةمايستر..شبكة

colinhacks commented 3 years ago

PR welcome, god knows I'm not gonna try to write that regex

colinhacks commented 3 years ago

Okay, I dug into this, and the problem isn't the non-Latin characters. They're actually fully supported by the current regex. The problem is the dotless domain. I don't think dotless domains should be supported, even if they're technically valid.

But there are some other problems:

I've switched over to a new regex that solves these problems in alpha.39.


// from https://stackoverflow.com/a/46181/1550155
// old version: too slow, didn't support unicode
const emailRegexOld = /^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))$/i;
// new
const emailRegex = /^(([^<>()[\]\.,;:\s@\"]+(\.[^<>()[\]\.,;:\s@\"]+)*)|(\".+\"))@(([^<>()[\]\.,;:\s@\"]+\.)+[^<>()[\]\.,;:\s@\"]{2,})$/i;

It still doesn't match the address with Arabic characters, though, and I don't know why. If anyone finds a better regex, feel free to post it here. If this becomes an issue for people (seems doubtful), I can make it even more permissive:

/^\S+@\S+\.\S+$/;
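
For anyone who wants to compare the two patterns above, here is a minimal sketch that runs a few addresses through both. The helper `checkEmail`, the name `permissiveEmailRegex`, and the `domain.example` and single-dot variants are assumptions made for illustration; they are not part of zod. The Arabic addresses are assembled by concatenation so the number of dots is unambiguous.

```typescript
// Pattern adopted in alpha.39, copied verbatim from the comment above.
const emailRegex =
  /^(([^<>()[\]\.,;:\s@\"]+(\.[^<>()[\]\.,;:\s@\"]+)*)|(\".+\"))@(([^<>()[\]\.,;:\s@\"]+\.)+[^<>()[\]\.,;:\s@\"]{2,})$/i;

// The more permissive fallback proposed above.
const permissiveEmailRegex = /^\S+@\S+\.\S+$/;

// Hypothetical helper (not part of zod): test one address against both patterns.
function checkEmail(address: string): { strict: boolean; permissive: boolean } {
  return {
    strict: emailRegex.test(address),
    permissive: permissiveEmailRegex.test(address),
  };
}

const samples: string[] = [
  // From the original report: quoted local part, dotless domain.
  '"сайт"@domain',
  // From the original report: emoji domain label with a dotted TLD.
  "poop@💩.la",
  // Hypothetical variant of the first sample with a dotted domain.
  '"сайт"@domain.example',
  // The Arabic address as rendered in this thread: two consecutive dots.
  "jschauma@" + "شبكةمايستر" + ".." + "شبكة",
  // Hypothetical variant with a single dot between the labels.
  "jschauma@" + "شبكةمايستر" + "." + "شبكة",
];

for (const address of samples) {
  console.log(address, checkEmail(address));
}
```

In this sketch both patterns reject the dotless `"сайт"@domain`, and both accept `poop@💩.la` and the dotted variant. The stricter pattern rejects the Arabic address when the domain contains two consecutive dots, as it is rendered here, but accepts the single-dot variant; the permissive pattern accepts both. Whether the double dot was present in the address that was originally tested is not something this thread confirms, so treat that as one possible explanation only. If a project needs a different rule, a custom pattern can be supplied to zod through `z.string().regex(...)`.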