Open gnosygnu opened 4 years ago
I have been doing some experimentation on the website https://regex101.com/
This contains a version (not the latest) of PCRE
my test regex is
^[a-z][a-z]?[a-z]?[a-z]?[a-z]?[a-z]?[a-z]?[a-z]?[a-z]?[a-z]?[a-z]?[a-z]?[a-z]?[a-z]?[a-z]?[a-z]?[a-z]?$
That's the equivalent of ^[a-z]{1,17}$
If I start with a test string that is less than 18 characters, everything is fine
However on the 18th character, the regex engine reports an error catastrophic backtracking
Another way to demonstrate this is to start with the regex ^[a-z][a-z]?$
and a test string of 18 characters
With this starting example, the number of step shown is 5
Adding [a-z]?
to the regex so it looks like ^[a-z][a-z]?[a-z]?$
gives 9 steps
Progressively adding [a-z]?
to the regex, the step count shows as
17, 33, 65, 129, 257, 513, 1025, 2049, 4097, 8193, 16385, 32769, 65537
Note a doubling of steps for each addition.
This regex engine stops (or gives an error) above 65537
I think that the PHP implementation of PCRE follows in this vain - in other words it counts the steps and when it reaches too many, fails (so no everlasting loop)
Perhaps some sort of counting could be introduced to the java implementation?
Interesting! Thanks for the site link!
However on the 18th character, the regex engine reports an error catastrophic backtracking
I gave it a try now, and see the catastrophic backtracking
error as well
Note a doubling of steps for each addition.
Yup. Nice way to confirm the theory
Perhaps some sort of counting could be introduced to the java implementation?
Let me think about it. The change would have to occur in the LuaJ pattern matching. There may be a way to handle simple cases like ^[a-z]{1,17}$
but any variation would be difficult (like ^[a-z]{1,10}[a-m][a-p][a-z]{1,5}$
Thanks!
This issue is related to #737.
en.wikipedia.org/wiki/Lausanne_Conservatory
never loads b/c of a very expensive regex operation.SPAD_-_Section_professionnelle_d'art_dramatique_du_Conservatoire_de_Lausanne,_Lausanne_VD
is passed toen.wikipedia.org/wiki/Module:Authority_control
which will calltlsLink
on something like this regex:\p{Lu}[\p{L}\d_',\.\-\(\)\*/–]{3,59}
At the moment, I can only see 3 ways to "fix" this. Listing from most ideal, to least ideal:
The last option looks like the
best"least worst" way to go. Unfortunately, my JNI experience is limited and my C++ even more so. This could take a while...