Closed stephentoub closed 2 years ago
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions See info in area-owners.md if you want to be subscribed.
Author: | stephentoub |
---|---|
Assignees: | - |
Labels: | `area-System.Text.RegularExpressions`, `tenet-performance` |
Milestone: | 7.0.0 |
We have an optimization that inserts an "UpdateBumpalong" node into the graph after a starting char loop: https://github.com/dotnet/runtime/blob/12a8819eee9865eb38bca6c05fdece1053102854/src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs#L373-L384 The code handling this node currently sets base.runtextpos to the current position, the idea being that the normal bumpalong of +1 would otherwise end up just processing the same position we've already tried. However, it looks like our current logic for handling this is flawed, in particular for non-atomic greedy loops. The current logic (in all engines) does the equivalent of:
but when backtracking kicks in for a greedy loop, we'll end up setting base.runtextpos to the last match position, and then again to that position -1, and then again to that position -2, all the way back to the beginning, which means it's not actually buying us anything in this case (in contrast, for atomic loops where there isn't backtracking, it will successfully track the last matching position). It seems like the logic should actually be more like: