andreydelpozo2 / language-detection

Automatically exported from code.google.com/p/language-detection

Email pattern matching doesn't match every email #28

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Real email addresses don't all fit the pattern which is currently used:

    private static final Pattern MAIL_REGEX = Pattern.compile("[-_.0-9A-Za-z]{1,64}@([-_0-9A-Za-z]){1,63}(.([-_.0-9A-Za-z]{1,63}))");

In fact, coming up with a very accurate mail regex requires using a very long 
one:

  http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html

That regular expression is somewhat ridiculous, but there are some things 
which could be improved without going that far:

1. Addresses permit a lot more in the local part, for instance "+".
2. Addresses can use a quoted local part like: "any string \"here\""@example.com
3. Hostnames can't actually contain an underscore.
4. International hostnames are possible although rare.  International usernames 
have not made it through standardization yet, but are coming soon to a server near you. ;)
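
For illustration, points 1 to 3 could be sketched roughly as below. This is only a sketch, not a proposed patch: the class name `MailPatternSketch`, the constant `MAIL_SKETCH` and the helper `looksLikeMail` are made up for this example and are not part of the library. It assumes ASCII-only input, does not enforce the 64-character local-part limit, and requires at least one dot in the hostname. International hostnames (point 4) are only covered when they appear in their encoded `xn--` form.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names exist in language-detection.
final class MailPatternSketch {
    // Unquoted local-part characters, which include '+' (point 1).
    private static final String ATOM = "[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+";
    // Atoms separated by single dots, so no leading, trailing or doubled dots.
    private static final String DOT_ATOM = ATOM + "(?:\\." + ATOM + ")*";
    // Quoted local part with backslash escapes (point 2).
    private static final String QUOTED = "\"(?:[^\"\\\\]|\\\\.)*\"";
    // Hostname label: letters, digits and inner hyphens only, no underscore (point 3).
    private static final String LABEL =
            "[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?";
    // Assumption: at least two labels, e.g. "example.com".
    private static final String HOST = LABEL + "(?:\\." + LABEL + ")+";

    static final Pattern MAIL_SKETCH =
            Pattern.compile("(?:" + DOT_ATOM + "|" + QUOTED + ")@" + HOST);

    // True if the whole string has the shape of an address under this sketch.
    static boolean looksLikeMail(String candidate) {
        return MAIL_SKETCH.matcher(candidate).matches();
    }
}
```

With this sketch, `looksLikeMail` accepts `user+tag@example.com` and the quoted example from point 2, and rejects `user@host_name.example.com`. It is still far looser and far simpler than the full RFC 822 expression linked above.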

Original issue reported on code.google.com by trejkaz on 19 Oct 2011 at 9:55

GoogleCodeExporter commented 9 years ago
I have the same policy as mentioned in Issue 27.

Original comment by nakatani.shuyo on 20 Oct 2011 at 6:38