Closed LankWang17 closed 1 year ago
Sorry, the bug string is: Pattern pattern( "Head\\b|" "\\[System\\.Net\\.WebRequest\\]::" , "i");
The html editor makes "\\" to "\".
Tried the pattern on MacOS with reflex 3.3.1 and I didn't see anything unusual. Not hanging in the pattern construction. Did you use the pattern to match anything? What OS are you using and what compiler?
Note that the default POSIX matcher does not support \b
in the middle of a regex pattern. Only at the begin and/or end of a (sub) pattern. Use a Perl matcher to match \b
anywhere. It's OK in your case, since it's at the end of the sub pattern, just before the |
. (Just to let you know.)
My env: Windows 10, VS2017, Win32-Debug mode. No defined HAVE_AVX2 and HAVE_AVX512BW.
I used reflex project to test. It hung also. Just like this.
This helps to I can test with MSVC++ in x86 mode tomorrow. Strange if there is somehow a difference with other OS. Or perhaps it's a 32 bit build problem.
OK. Thank you.
I see the same thing. It is the combination of pattern option "i"
with a word boundary marker \b
. This exponentially grows the DFA to 2,622,046 nodes and 3,146,356 edges (!) I will further investigate the cause of this. This may take a bit of time to work on. A POSIX DFA can exponentially grow in the worst case for certain regex patterns that are unlikely to be used in practice and that should not happen in this case.
This works OK as a temporary workaround, placing a \b
after the second pattern:
Pattern pattern(
"Head\\b|"
"\\[System\\.Net\\.WebRequest\\]::\\b"
, "i");
This works OK as a temporary workaround, placing a
\b
after the second pattern:Pattern pattern( "Head\\b|" "\\[System\\.Net\\.WebRequest\\]::\\b" , "i");
OK, I'll try it.
I found a subtle bug in the DFA constructor. The bug is that a lot of unused DFA fragments get generated. This specific regex pattern triggers this bug. There is nothing intrinsically wrong with the pattern. It should not explode the DFA in size. Will fix it and release an update after testing.
The bug got introduced in 3.7.0 as an optimization to speed up DFA construction, but the optimization did not work correctly with option "I"
for case-insensitive matching in combination with other patterns.
Fixed in release 3.3.2.
Thanks for reporting this performance issue. It is very useful to receive feedback. We test the software a lot. Nevertheless, it can happen that something does not work perfectly.
Such a pattern string will cause Reflex to hung. Pattern pattern( "Head\b|" "\[System\.Net\.WebRequest\]::" , "i");
If changed "Head\b|" to "Head|", Reflex works well. it seems strange.