Closed stephentoub closed 2 years ago
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions See info in area-owners.md if you want to be subscribed.
Author: | stephentoub |
---|---|
Assignees: | - |
Labels: | `area-System.Text.RegularExpressions` |
Milestone: | 7.0.0 |
Does this show up in a profile such that it's worth the special casing?
Does this show up in a profile such that it's worth the special casing?
Yes. And we also already special-case out of necessity, e.g. the analysis we do to automatically apply atomicity are disabled because atomic groups aren't supported in the DFA-based implementation today.
After discussions with @veanes and @olsaarik, it seems like in general these optimizations for backtracking are still valuable for non-backtracking, just in order to reduce the lazy construction time. So, we'll leave it all and can just disable ones on a case-by-case basis as need proves.
As part of parsing a regex, we perform various transformations on the tree, many of which are focused on creating a tree that will yield more efficient matching with a backtracking engine, e.g. we'll transform "this|that" into "th(?:is|at)". But many of these transformations don't affect a finite automata built from this tree, e.g. "this|that" and "th(?:is|at)" yield identical DFAs. Thus, in NonBacktracking mode, we should avoid spending any cycles doing these transformations that won't make a difference.