bbottema / email-rfc2822-validator

The world's only Java-based rfc2822-compliant email address validator and parser

[question] Parsing problem with '=?UTF-8?Q?Gesellschaft_f=C3=BCr_Freiheitsrechte_e=2EV=2E?= <info@freiheitsrechte.org>' #18

Open aanno opened 4 years ago

aanno commented 4 years ago

I tried the library on many mail addresses without problems, but =?UTF-8?Q?Gesellschaft_f=C3=BCr_Freiheitsrechte_e=2EV=2E?= <info@freiheitsrechte.org> seems to confuse EmailAddressParser. I have no idea whether it is valid w.r.t. rfc2822, however.

chconnor commented 4 years ago

That address looks valid to me... maybe @bbottema will have a comment about why it isn't working.
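
One quick way to sanity-check that impression: the display name is an RFC 2047 encoded-word, and every character in it appears to fall inside the unquoted-atom character class used by the generated regex quoted further down this thread. The sketch below tests only that character class in isolation. The class name `AtextCheck` and the helper `isAtom` are hypothetical and not part of the library, and this does not reproduce the full parser or any of its criteria flags.

```java
import java.util.regex.Pattern;

final class AtextCheck {
    // Character class copied from the dot-atom part of the regex quoted in this thread
    // (the variant without '.' or square brackets inside the class).
    private static final Pattern ATOM =
            Pattern.compile("[a-zA-Z0-9!#-'*+\\-/=?^-`{-~]+");

    // Hypothetical helper: true when the whole input consists of atext characters only.
    static boolean isAtom(String s) {
        return ATOM.matcher(s).matches();
    }

    public static void main(String[] args) {
        String name = "=?UTF-8?Q?Gesellschaft_f=C3=BCr_Freiheitsrechte_e=2EV=2E?=";
        System.out.println(isAtom(name));        // true: '=', '?', '-', '_' are all atext
        System.out.println(isAtom("two words")); // false: a space is not atext
    }
}
```

If this holds, the display name on its own is a plain atom, which suggests the failure lies in how the surrounding pattern or the group extraction is assembled rather than in the characters of the name.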

bbottema commented 4 years ago

I double-checked and this address indeed doesn't pass the test. I have no clue why; somewhere the following awesomeballz regex fails to recognize this address as valid:

(((?:(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09
\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x
0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?[a-zA-Z0
-9!#-'*+\-/=?^-`{-~.\[]]+(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~
]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)
?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[
 \t]+)))?)|(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x0
1-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08
\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?"(
?>(?:(?:[ \t]*\r\n)?[ \t]+)?(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!#-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F])))*(?:(?:[ \t]*\r\n)?[
\t]+)?"(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\
x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0
C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?))(?:(?:(
?:[ \t]*\r\n)?[ \t]+)(?:(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]
-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]
+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)
?[ \t]+)))?[a-zA-Z0-9!#-'*+\-/=?^-`{-~.\[]]+(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-
\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:
[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|
(?:(?:[ \t]*\r\n)?[ \t]+)))?)|(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'
*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)
?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]
*\r\n)?[ \t]+)))?"(?>(?:(?:[ \t]*\r\n)?[ \t]+)?(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!#-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F])))*(
?:(?:[ \t]*\r\n)?[ \t]+)?"(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-
~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+
)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?
[ \t]+)))?)))*)??((?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\
[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-
\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+))
)?<((?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B
\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x
0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?([a-zA-Z0-9!
#-'*+\-/=?^-`{-~]+(?:\.[a-zA-Z0-9!#-'*+\-/=?^-`{-~]+)*)(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x
0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?
\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[
 \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?|(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F
\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t
]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(
?:[ \t]*\r\n)?[ \t]+)))?("(?:(?:(?:[ \t]*\r\n)?[ \t]+)?(?>[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!#-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x
7F])))*(?:(?:[ \t]*\r\n)?[ \t]+)?")(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!
-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\
n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \
t]*\r\n)?[ \t]+)))?)@(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]
|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?
[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[
\t]+)))?([a-zA-Z0-9!#-'*+\-/=?^-`{-~]+(?:\.[a-zA-Z0-9!#-'*+\-/=?^-`{-~]+)*)(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?
[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:
[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))
*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?|(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\
x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \
t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r
\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?(\[(?:(?:(?:[ \t]*\r\n)?[ \t]+)?(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-Z^-~]|(?:\\[\x01-\
x09\x0B\x0C\x0E-\x7F]))+)*(?:(?:[ \t]*\r\n)?[ \t]+)?])(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0
B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\
((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[
\t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?)>((?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x
7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*
\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:
[ \t]*\r\n)?[ \t]+)))?))|(((?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]
-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]
+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)
?[ \t]+)))?([a-zA-Z0-9!#-'*+\-/=?^-`{-~]+(?:\.[a-zA-Z0-9!#-'*+\-/=?^-`{-~]+)*)(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\
n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:
(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F
]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?|(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x0
1-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?
[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]
*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?("(?:(?:(?:[ \t]*\r\n)?[ \t]+)?(?>[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!#-\[\]-~]|(?:\\[\
x01-\x09\x0B\x0C\x0E-\x7F])))*(?:(?:[ \t]*\r\n)?[ \t]+)?")(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x0
8\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]
+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n
)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?)@(?:(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x
0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:
(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\
))|(?:(?:[ \t]*\r\n)?[ \t]+)))?([a-zA-Z0-9!#-'*+\-/=?^-`{-~]+(?:\.[a-zA-Z0-9!#-'*+\-/=?^-`{-~]+)*)((?:(?:(?:[ \t]*\r\n)?[ \t]+)?\(
(?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \
t]+)?\))*(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x0
9\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?|(?:(?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*
\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:
(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\
x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?(\[(?:(?:(?:[ \t]*\r\n)?[ \t]+)?(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x7
F!-Z^-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))+)*(?:(?:[ \t]*\r\n)?[ \t]+)?])((?:(?:(?:[ \t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[
\t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(?:(?:[ \t]*\r\n)?[ \t]+)?\))*(?:(?:(?:(?:[
\t]*\r\n)?[ \t]+)?\((?:(?:(?:[ \t]*\r\n)?[ \t]+)?[\x01-\x08\x0B\x0C\x0E-\x1F\x7F!-'*-\[\]-~]|(?:\\[\x01-\x09\x0B\x0C\x0E-\x7F]))*(
?:(?:[ \t]*\r\n)?[ \t]+)?\))|(?:(?:[ \t]*\r\n)?[ \t]+)))?))

:smiley:

bbottema commented 4 years ago

It validates with the default isValid() though, which is not strictly RFC-compliant.

So one of the following flags makes it fail:

bbottema commented 4 years ago

Ok, so ALLOW_SQUARE_BRACKETS_IN_A_TEXT breaks the validation. However, the intricate details of these regexes are beyond my knowledge. I don't understand what brackets have to do with this particular email address.

chconnor commented 4 years ago

This line has \\[], should that be \\[\\] ?

It's been ages since I was in this stuff, so just taking shots here. :-)

bbottema commented 4 years ago

Yeah, I was just testing that myself, but it doesn't seem to matter.

chconnor commented 4 years ago

Hmmm... seems like an unescaped ] there would really throw a wrench in... I'm surprised it didn't matter...

bbottema commented 4 years ago

Yeah, maybe the regex engine is smart enough to look for an outer bracket of character ranges.

Anyway I escaped it now and committed it as well as this particular case as a junit test, which as of now breaks the build. Perhaps you can have a look, it's ninny time for me!

Oh and I'm pulling the parent card again; a month ago I became a father for the second time 😄😄

chconnor commented 4 years ago

Congrats!

For when you return -- how about the line before -- shouldn't that dot be escaped? \\.

bbottema commented 4 years ago

No, as it is already inside a character class (i.e. [a-z1-9.,+*{}()]), the dot is regarded as a literal, as are most other operators. The only exception might be the bracket itself, which I've now escaped just to be sure (I tested it, and indeed it doesn't change the faulty outcome).
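
A minimal sketch of that point using `java.util.regex`; the class and method names (`DotInClass`, `matchesClass`, `matchesWildcard`) are made up for illustration and the patterns are deliberately much smaller than the library's:

```java
import java.util.regex.Pattern;

final class DotInClass {
    // Inside [...] the dot matches only a literal '.', not "any character".
    static boolean matchesClass(String input) {
        return Pattern.matches("[a.]+", input);
    }

    // Outside a character class the dot is a wildcard.
    static boolean matchesWildcard(String input) {
        return Pattern.matches("a.a", input);
    }

    public static void main(String[] args) {
        System.out.println(matchesClass("a.a"));    // true
        System.out.println(matchesClass("aba"));    // false: 'b' is not in the class
        System.out.println(matchesWildcard("aba")); // true: '.' matches 'b' here
    }
}
```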

bbottema commented 4 years ago

Ok, I just found out it's not so much that the presence of ALLOW_SQUARE_BRACKETS_IN_A_TEXT causes it, but that the absence of the others causes it. Using empty criteria also causes it.

After zooming in on this, I found that leaving ALLOW_QUOTED_IDENTIFIERS out causes the problem. As soon as that is included in the criteria (in any combination), the problem goes away.

Unfortunately, the else-branch for ALLOW_QUOTED_IDENTIFIERS being in the criteria covers a swath of code, so that will be more difficult to analyse right now.

if (criteria.contains(EmailAddressCriteria.ALLOW_QUOTED_IDENTIFIERS)) {
    (..) // no problems in this case
} else {
    // somewhere the following code breaks our use case

    // no quoted identifiers, yes|no domain literals
    local_part_da = m.group(3);
    if (local_part_da == null) {
        local_part_qs = m.group(4);
    }
    domain_part_da = m.group(5);
    if (domain_part_da == null && allowDomainLiterals) {
        domain_part_dl = m.group(6);
    }
    current_localpart = local_part_da == null ? local_part_qs : local_part_da;
    current_domainpart = domain_part_da == null ? domain_part_dl : domain_part_da;
    if (extractCfwsPersonalNames) {
        personal_string = m.group((allowDomainLiterals ? 1 : 0) + 6);
        personal_string = removeAnyBounding('(', ')', getFirstComment(personal_string, criteria));
    }
}

chconnor commented 4 years ago

Ok last one from me... on this line, should the ^ be \\ escaped? Given that it's used for negation inside ranges? Or does it get interpreted as a literal because it's followed by a -?

Anyway sounds like you've traced the issue elsewhere, just thought I'd check.

bbottema commented 4 years ago

From regular-expressions.info on Character Classes or Character Sets:

To include an unescaped caret as a literal, place it anywhere except right after the opening bracket. [x^] matches an x or a caret. This works with all flavors discussed in this tutorial.

So that's fine.
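
For reference, a small Java sketch of the same rule. The names `CaretInClass`, `literalCaret`, `negated` and `inCaretRange` are hypothetical; the last pattern borrows the `?^-`` fragment from the atext class above to show the caret acting as the start of a range:

```java
import java.util.regex.Pattern;

final class CaretInClass {
    // Caret is not the first character, so it is a literal member of the class.
    static boolean literalCaret(String s) {
        return Pattern.matches("[x^]", s);
    }

    // Caret directly after '[' negates the class.
    static boolean negated(String s) {
        return Pattern.matches("[^x]", s);
    }

    // Here '^' is a literal that starts the range '^' to '`' (i.e. ^, _ and `), plus '?'.
    static boolean inCaretRange(String s) {
        return Pattern.matches("[?^-`]+", s);
    }

    public static void main(String[] args) {
        System.out.println(literalCaret("^"));  // true
        System.out.println(negated("x"));       // false
        System.out.println(inCaretRange("_"));  // true: '_' sits between '^' and '`'
        System.out.println(inCaretRange("a"));  // false
    }
}
```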

chconnor commented 4 years ago

Righto. Good luck with the bug!