eteran / nedit-ng

a Qt5 port of the NEdit using modern C++14
GNU General Public License v2.0
95 stars 26 forks source link

X Resource highlight patterns have "catastrophic backtracking" #151

Open eteran opened 4 years ago

eteran commented 4 years ago

The "Resource Continued" pattern of:

^(\s*[^:\s]+\s*:)(?:(\\.)|.)*(\\)\n

Effectively hangs NEdit-ng (and nedit) for some inputs. Something as simple as this causes things to hang indefinitely:

Ada:Default\n\tAwk:Default\n\tC++:Default\n\tC:Default\n\tCSS:Default\n\tCsh:Default\n\tFortran:Default\n\tJava:Default\n\tJavaScript:Default\n\tLaTeX:Default\n\tLex:Default\n\tMakefile:Default\n\tMatlab:Default\n\tNEdit Macro:Default\n\tPascal:Default\n\tPerl:Default\n\tPostScript:Default\n\tPython:Default\n\tRegex:Default\n\tSGML HTML:Default\n\tSQL:Default\n\tSh Ksh Bash:Default\n\tTcl:Default\n\tVHDL:Default\n\tVerilog:Default\n\tXML:Default\n\tX Resources:Default\n\tYacc:Default

After debugging the issue, it does technically to "make progress", but that is only a technicality. The speed down is exponential so it quickly goes from seconds to parse, to minutes, to days, to years, etc...

This seems to be a function of how the regex engine works, perhaps the best solution is to say "don't do that" and replace the regex with one that doesn't have this issue if possible.

grege2 commented 3 years ago

Interesting. I have an old regex demo that I carry around ..

time perl -e 'print "Starting runaway regex:\n"; if ( "1,2,3,4,5,6,7,8,9,10,11,12,13,15,16,17,18,19,20,21,22,23,24,25,zzz" =~ /^(.*?,){27}z/ ) { print "match" } else { print "no match" }'

As shown (1 to 25) it takes about 4 seconds on my Mac, but more numbers (26, 27 ...) blow it up pretty quickly. I forget what it's doing ! but it's a classic regex backtrack explosion.

I mostly use Perl when I need some serious regex to work, it seems to have the best regex, and the doco is great -

https://perldoc.perl.org/perlre

Ciao.