dlclark / regexp2

A full-featured regex engine in pure Go based on the .NET engine
MIT License
997 stars 84 forks source link

Infinite match loop #34

Open dop251 opened 4 years ago

dop251 commented 4 years ago

The following test results in an infinite loop.

func TestOverlappingMatch(t *testing.T) {
    re := MustCompile(`((?:0*)+?(?:.*)+?)?`, 0)
    match, err := re.FindStringMatch("0\xfd")
    if err != nil {
        t.Fatal(err)
    }
    for match != nil {
        t.Logf("start: %d, length: %d", match.Index, match.Length)
        match, err = re.FindNextMatch(match)
        if err != nil {
            t.Fatal(err)
        }
    }
}

$ go test -v -run TestOverlappingMatch === RUN TestOverlappingMatch TestOverlappingMatch: regexp_test.go:802: start: 0, length: 2 TestOverlappingMatch: regexp_test.go:802: start: 1, length: 1 TestOverlappingMatch: regexp_test.go:802: start: 1, length: 1 TestOverlappingMatch: regexp_test.go:802: start: 1, length: 1 ....

dlclark commented 4 years ago

This one is a head-scratcher. I've confirmed it happens in the .NET regex engine that regexp2 is based on.

During the second match it seems like the regex stack is getting out of sync, so when the Capturemark operation tries to pop the start of the capture it it should get a 2 but there's an extra 1 left over from a previous Lazybranchmark operation. It seems the Lazybranchmark operation is setting up the stack for a backtrack that isn't guaranteed to happen... and if it doesn't happen then the stack has extra items on it.

So, more digging is needed to figure out the best way to resolve the issue. Can Lazybranchmark detect if there's a backtrack ahead? How often does the stack get out of sync? Can Capturemark detect the problem and pop extra items from the stack?

I'll need to do more research and digging, but wanted to capture my research and thoughts so far.

dop251 commented 4 years ago

Maybe report this to Microsoft and let them fix it 😄

dlclark commented 4 years ago

Worth a shot https://github.com/dotnet/runtime/issues/43314