Cretezy / linkify

Low-level link (text, URLs, emails) parsing library in Dart
https://pub.dartlang.org/packages/linkify
MIT License

Email RegExp should be simpler. #6

Closed komapeb closed 3 years ago

komapeb commented 5 years ago

Currently, the only way to know if an email address is valid is to send an email to that address (and potentially wait for an action, such as the recipient clicking a validation link).

tl;dr

Please use this RegExp:

r'.+@.+'

(No need for mailto: either)
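
To make the trade-off concrete, here is a minimal Dart sketch of the permissive check. The helper name `looksLikeEmail` is made up for illustration and is not part of linkify. Note that the pattern is greedy, so on its own it suits validating a whole field rather than pulling an address out of running text.

```dart
// Hypothetical helper, not part of the linkify package.
// Accepts anything with at least one character on each side of an '@'.
final RegExp permissiveEmail = RegExp(r'.+@.+');

bool looksLikeEmail(String input) => permissiveEmail.hasMatch(input);

void main() {
  print(looksLikeEmail('user@localserver')); // true
  print(looksLikeEmail('user@[2001:DB8::1]')); // true
  print(looksLikeEmail('no-at-sign')); // false
  print(looksLikeEmail('@example.com')); // false: nothing before '@'

  // Greedy matching swallows the surrounding text, so this pattern alone
  // cannot locate an address inside a sentence.
  final match = permissiveEmail.firstMatch('Write to x@example.com today');
  print(match?.group(0)); // Write to x@example.com today
}
```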

Boring stuff below

Some examples for perfectly valid email addresses:

criscrisaaaa@gmail.com.es
mminighin@alpenite.com
!mminighin@alpenite.com
#mminighin@alpenite.com
$mminighin@alpenite.com
%mminighin@alpenite.com
&mminighin@alpenite.com
'mminighin@alpenite.com
*mminighin@alpenite.com
+mminighin@alpenite.com
-mminighin@alpenite.com
/mminighin@alpenite.com
=mminighin@alpenite.com
?mminighin@alpenite.com
^mminighin@alpenite.com
_mminighin@alpenite.com
`mminighin@alpenite.com
{mminighin@alpenite.com
|mminighin@alpenite.com
}mminighin@alpenite.com
~mminighin@alpenite.com
0mminighin@alpenite.com
1mminighin@alpenite.com
2mminighin@alpenite.com
3mminighin@alpenite.com
4mminighin@alpenite.com
5mminighin@alpenite.com
6mminighin@alpenite.com
7mminighin@alpenite.com
8mminighin@alpenite.com
9mminighin@alpenite.com
10mminighin@alpenite.com
prettyandsimple@example.com
very.common@example.com
someuser@ai.
disposable.style.email.with+symbol@example.com
other.email-with-dash@example.com
fully-qualified-domain@example.com
user.name+tag+sorting@example.com
x@example.com
"very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual"@strange.example.com
example-indeed@strange-example.com
admin@mailserver1
#!$%&'*+-/=?^_`{}|~@example.org
"()<>[]:,;@\\\"!#$%&'-/=?^_`{}| ~.a"@example.org
example@s.solutions
user@localserver
user@[2001:DB8::1]

These addresses above are all valid!

If you really, really want to be roughly compatible with some of the RFCs (this covers 99.99% of cases), you can use this RegExp (I use it in production, though lately I have been considering dropping it):

r'^([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x22([^\x0d\x22\x5c\x80-\xff]|\x5c[\x00-\x7f])*\x22)(\x2e([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x22([^\x0d\x22\x5c\x80-\xff]|\x5c[\x00-\x7f])*\x22))*\x40([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x5b([^\x0d\x5b-\x5d\x80-\xff]|\x5c[\x00-\x7f])*\x5d)(\x2e([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x5b([^\x0d\x5b-\x5d\x80-\xff]|\x5c[\x00-\x7f])*\x5d))*\.?$'


Cretezy commented 5 years ago

I agree with you. I will simplify the regex in the next release

komapeb commented 5 years ago

Cool, thank you! Just a quick note: on second thought, mailto: should be included to prevent accidental link conversions. Or better yet, maybe let users of the package specify their own RegExp and just provide defaults for convenience. Something like:

List<LinkifyElement> linkify(
  String text, {
  bool humanize,
  List<LinkType> linkTypes,
  RegExp urlRegex,
  RegExp emailRegex,
}) {
  ...
}

Then just check if the arguments are non-null and assign where applicable.
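
A minimal sketch of that null-check-and-default idea, written with Dart null safety. `findEmails` and `defaultEmailRegex` are hypothetical names used only for illustration; this is not linkify's actual API, and the default pattern here is an assumption, not the library's.

```dart
// Hypothetical default; linkify's real pattern may differ.
final RegExp defaultEmailRegex = RegExp(r'[^\s@]+@[^\s@]+');

// Returns every substring of [text] matched by the caller's pattern,
// falling back to the default when no pattern is supplied.
List<String> findEmails(String text, {RegExp? emailRegex}) {
  final pattern = emailRegex ?? defaultEmailRegex;
  return pattern.allMatches(text).map((m) => m.group(0)!).toList();
}

void main() {
  const text = 'Ping admin@mailserver1 or mailto:x@example.com';

  // Default pattern: picks up both candidates.
  print(findEmails(text)); // [admin@mailserver1, mailto:x@example.com]

  // Caller-supplied pattern: only explicit mailto: links qualify.
  final strict = RegExp(r'mailto:[^\s@]+@[^\s@]+');
  print(findEmails(text, emailRegex: strict)); // [mailto:x@example.com]
}
```

Keeping the pattern as an optional named parameter means existing callers are unaffected, while anyone with stricter or looser needs can pass their own RegExp.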

Cretezy commented 5 years ago

Yeah, I'm slowly working on implementing custom regexes. Haven't had much time to dedicate to this project recently, I'll try to get to it soon!

Cretezy commented 4 years ago

Custom linkifiers are out now! You can replace the whole email parser if you'd like.

rayliverified commented 4 years ago

I ran into an issue with the email regex included in this library a while ago so I had to run my own regex. I don't remember exactly what the issue was but I think it was interfering with the URL regex. There were some cases where the email was ignored and only the domain portion after the "@" was parsed.

Here's the regex I'm using that works well for me:

const emailPattern = r"\b[\w\.-]+@[\w\.-]+\.\w{2,4}\b";
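
For reference, a small Dart sketch of how this pattern behaves. One thing worth noting: the `\w{2,4}` cap combined with the trailing `\b` means that addresses whose top-level domain is longer than four characters are not matched at all. The sample addresses come from the list at the top of this thread; the variable name `emailRegex` is just for illustration.

```dart
const emailPattern = r"\b[\w\.-]+@[\w\.-]+\.\w{2,4}\b";
final RegExp emailRegex = RegExp(emailPattern);

void main() {
  // A typical address inside a sentence is picked out cleanly.
  print(emailRegex.firstMatch('Mail very.common@example.com now')?.group(0));
  // very.common@example.com

  // No dot in the domain, so nothing matches.
  print(emailRegex.hasMatch('admin@mailserver1')); // false

  // The top-level domain has more than four characters, so nothing matches.
  print(emailRegex.hasMatch('example@s.solutions')); // false
}
```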

devxpy commented 3 years ago

Just ran into an edge case too: for us@dara.network, only us@dara.netw gets linkified.
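
This kind of truncation is what one would expect from a pattern that caps the top-level domain at four characters without anchoring its end. The sketch below uses two hypothetical patterns to illustrate the effect; neither is claimed to be linkify's actual regex, before or after any fix.

```dart
// Hypothetical patterns for illustration only.
final RegExp cappedTld = RegExp(r'[\w.-]+@[\w.-]+\.\w{2,4}');
final RegExp uncappedTld = RegExp(r'[\w.-]+@[\w.-]+\.\w{2,}');

void main() {
  const input = 'us@dara.network';

  // The cap stops the match after four TLD characters.
  print(cappedTld.firstMatch(input)?.group(0)); // us@dara.netw

  // Without the upper bound, the whole TLD is consumed.
  print(uncappedTld.firstMatch(input)?.group(0)); // us@dara.network
}
```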

Cretezy commented 3 years ago

@devxpy https://github.com/Cretezy/linkify/pull/36 was just merged with a fix for this. Will be included in the next release.

If anyone wants more flexible email parsing, please open a PR (I'm quite limited on time).