Open dlclark opened 4 years ago
Tagging subscribers to this area: @eerhardt, @pgovind, @jeffhandley See info in area-owners.md if you want to be subscribed.
Reproes on 3.1 and on .NET Framework 4.8: not a regression.
Repros in 5.0 too, this program does not return:
using System;
using System.Text.RegularExpressions;
static class Program
{
static void Main() => Console.WriteLine(
new Regex("(?:(?:0?)+?(?:a?)+?)?").Matches("0a").Count);
}
@am11 correct - I should have been clearer - was mainly checking this wasn't urgent, as it might be if this was a break relative to 3.1.
Here's another scenario that was reported by @paulomorgado that leads to an infinite loop (ostensibly):
Regex.Match(
"RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(System.Runtime.CompilerServices.RuntimeHelpers.TryCode,System.Runtime.CompilerServices.RuntimeHelpers.CleanupCode,System.Object)",
@"(?<=\()((?<a>\w+(\.\w+)*)(,(?<a>\w+(\.\w+)*)*)?)(?=\))",
RegexOptions.IgnoreCase)
This was repro'd using:
Since it's possible to write regexes that take arbitrary amounts of time (at least with our NFA based engine) in my mind a "hang" is interesting if it's a regression from a previous release, and/or the pattern "should" not involve pathological backtracking.
In the case immediately above, I expect there's substantial backtracking going on, because of the nested *
. The first one may just be a bug.
@veanes thoughts about the performance of the regex above in your engine?
An infinite loop is one thing but there also is a case in the opening post that generates infinite matches. That must be a bug.
In DFA mode in feature/regexsrm:
pattern: ""((?:0)+?(?:.)+?)?"" input: "0a" returns 2 matches: index 0, length 2; index 2, length 0.
pattern: "(?:(?:0?)+?(?:a?)+?)?" input: "0a" returns 2 matches: index 0, length 2; index 2, length 0.
regarding the regex @"(?<=\()((?<a>\w+(\.\w+)*)(,(?<a>\w+(\.\w+)*)*)?)(?=\))"
DFA mode does not support positive lookbehind (?<= ...)
and it does not support positive lookahead (?= ...) (as of now)
With the rewritten RegexCompiler in .NET 7, this no longer repros for either RegexOptions.Compiled or RegexGenerator. It still repros for the interpreter.
Description
Example pattern:
(?:(?:0?)+?(?:a?)+?)?
Example Input:0a
Should return 2 matches: index 0, length 2; index 2, length 0. Actually returns: infinite matches: index 0, length 2; index 1, length 1; index 1, length 1...etcOther information
The first match seems to work fine. The second match doesn't.
I did some digging--it may not be helpful, but figured I'd drop my notes in here. It appears that the
Lazybranchmark
operation pushes some data to regex stack assuming that a backtrace will happen. However, in this example the backtrace doesn't happen and the stack is left in a bad state with the extra item. So by the timeCapturemark
tries to build the match for Group 0 from the stack it has the wrong value and builds a match based on bad data. This causes our match to appear to move backwards in the string and get stuck with infinite matches.