Cretezy / linkify

Low-level link (text, URLs, emails) parsing library in Dart
https://pub.dartlang.org/packages/linkify
MIT License

looseUrl option identifies text with multiple periods as a url #59

Open rutvik110 opened 9 months ago

rutvik110 commented 9 months ago

Issue:

Currently, when the looseUrl option is true, linkify identifies the following text patterns as URLs:

pattern1 -> 'awdaw....aw'
pattern2 -> 'awdaw...wad...wadw'
and so on...

Expected behaviour:

Technically, these shouldn't be identified as URLs: they contain multiple consecutive periods, which makes them invalid URL patterns.

rutvik110 commented 9 months ago

I traced this issue to looseUrlRegex. The cause is the . in the character class shown below, which lets the pattern match multiple consecutive periods. Removing . from this character class resolves the issue.

[-a-zA-Z0-9@:%._\+~#=]{2,256}

Complete looseUrlRegex

r'''^(.*?)((https?:\/\/)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_\+.~#?&//="'`]*))'''
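To make the effect of the proposed change easier to check, here is a minimal sketch that runs the quoted regex and a variant with . removed from the first character class against the patterns above. The names originalLooseUrl, withoutDot and looseMatch are hypothetical, and the caseSensitive/dotAll flags are an assumption about how the package builds the RegExp, not something confirmed in this thread.

```dart
// Regex quoted in this issue. The flags are assumed, not confirmed here.
final originalLooseUrl = RegExp(
  r'''^(.*?)((https?:\/\/)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_\+.~#?&//="'`]*))''',
  caseSensitive: false,
  dotAll: true,
);

// Same regex with `.` removed from the first character class (the proposed fix).
final withoutDot = RegExp(
  r'''^(.*?)((https?:\/\/)?(www\.)?[-a-zA-Z0-9@:%_\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_\+.~#?&//="'`]*))''',
  caseSensitive: false,
  dotAll: true,
);

// Hypothetical helper: returns the URL-like part (group 2), or null if no match.
String? looseMatch(RegExp re, String text) => re.firstMatch(text)?.group(2);

void main() {
  const samples = [
    'awdaw....aw',
    'awdaw...wad...wadw',
    'example.com',
    'sub.example.com',
  ];
  for (final text in samples) {
    print('$text -> original: ${looseMatch(originalLooseUrl, text)}, '
        'withoutDot: ${looseMatch(withoutDot, text)}');
  }
  // awdaw....aw        -> original: awdaw....aw,        withoutDot: null
  // awdaw...wad...wadw -> original: awdaw...wad...wadw, withoutDot: null
  // example.com        -> original: example.com,        withoutDot: example.com
  // sub.example.com    -> original: sub.example.com,    withoutDot: example.com
}
```

One thing the last sample suggests: with . removed entirely, a host with a subdomain seems to be matched only from its last label pair (sub. falls into the leading text group), so a fix may need to allow single periods between labels rather than drop the period altogether.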