Open dop251 opened 4 years ago
This one is a head-scratcher. I've confirmed it happens in the .NET regex engine that regexp2
is based on.
During the second match it seems like the regex stack is getting out of sync, so when the Capturemark
operation tries to pop the start of the capture it it should get a 2
but there's an extra 1
left over from a previous Lazybranchmark
operation. It seems the Lazybranchmark
operation is setting up the stack for a backtrack that isn't guaranteed to happen... and if it doesn't happen then the stack has extra items on it.
So, more digging is needed to figure out the best way to resolve the issue. Can Lazybranchmark
detect if there's a backtrack ahead? How often does the stack get out of sync? Can Capturemark
detect the problem and pop extra items from the stack?
I'll need to do more research and digging, but wanted to capture my research and thoughts so far.
Maybe report this to Microsoft and let them fix it 😄
Worth a shot https://github.com/dotnet/runtime/issues/43314
The following test results in an infinite loop.
$ go test -v -run TestOverlappingMatch === RUN TestOverlappingMatch TestOverlappingMatch: regexp_test.go:802: start: 0, length: 2 TestOverlappingMatch: regexp_test.go:802: start: 1, length: 1 TestOverlappingMatch: regexp_test.go:802: start: 1, length: 1 TestOverlappingMatch: regexp_test.go:802: start: 1, length: 1 ....