[Perf] Regression in Perf_Regex.Uri_IsNotMatch(IgnoreCase, Compiled)

performanceautofiler[bot] commented 2 years ago

Run Information

Architecture	arm64
OS	Windows 10.0.19041
Baseline	10e107debae28de4bd1e710cfe448be3c293e841
Compare	9962c10813de695782baa20f1bf0bb2e9810971d
Diff	Diff

Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Common

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
[Uri_IsNotMatch - Duration of single invocation](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_Windows 10.0.19041/System.Text.RegularExpressions.Tests.Perf_Regex_Common.Uri_IsNotMatch(Options%3a%20IgnoreCase%2c%20Compiled).html>)	315.14 ns	557.11 ns	1.77	0.20	False

graph Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Text.RegularExpressions.Tests.Perf_Regex_Common*'

### Payloads [Baseline]() [Compare]() ### Histogram #### System.Text.RegularExpressions.Tests.Perf_Regex_Common.Uri_IsNotMatch(Options: IgnoreCase, Compiled) ```log ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

dotnet-issue-labeler[bot] commented 2 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

DrewScoggins commented 2 years ago

Seems related to https://github.com/dotnet/runtime/pull/61490. This is the only regression we detected, but it is fairly large (80%)

DrewScoggins commented 2 years ago

@stephentoub

stephentoub commented 2 years ago

This is the only regression we detected, but it is fairly large (80%)

I'm actually pleased it's the only one detected.

I expected and accepted this one in exchange for gains elsewhere. Though looking at it again there's probably somethings we can do to improve it.

stephentoub commented 2 years ago

The issue here actually isn't what I thought it was.

The core of the issue is: https://github.com/dotnet/runtime/blob/6ff57f1935d1cb3396674bec26268a301e5c0628/src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs#L1828-L1832 This test has this sequence:

[\w]+://

Previously we'd use this CanBeMadeAtomic to check whether : can match \w, find it can't, and thus turn the \w+ loop into an atomic one. However, the use of IgnoreCase in this test is causing an issue. The cited PR optimizes the "://" to be case-sensitive, even in the presence of IgnoreCase, because none of the characters there participate in case conversion. And as a result, CanBeMadeAtomic bails early when it sees that the case-sensitivity of the two nodes differs. So the regression is because the \w+ stays non-atomic and thus causes us to backtrack more than we did previously.

The good news is the problem will just go away with https://github.com/dotnet/runtime/issues/61048. There are workarounds we could put in place in the interim, but I'm tempted to say we should just prioritize fully addressing that issue asap rather than trying to patch things like this.

@joperezr, thoughts?

joperezr commented 2 years ago

Let me make sure I understand correctly. The reason why fixing issue #61048 would fix this is because now [\w]+ would now also become case-sensitive even in IgnoreCase because at construction time we would have already translated [\w] into it's upper-case lower-case equivalent (like transforming [A] => [Aa]) so in theory CanBeMadeAtomic should now return true and reduce again the number of backtracking we do, is that right? If so, I agree it is better to just wait for that to be fixed than trying to undo, or patch the optimizations made with your PR.

stephentoub commented 2 years ago

@joperezr, exactly. (For this one specific case, it's further interesting because we recognize \w as being case-sensitive even under IgnoreCase, because the categories it maps to already includes both upper and lower. But because of how the parser works, even though [\w] and \w are 100% identical sets, the case-sensitive \w gets added to the set and [\w] ends up as case-insensitive.)

ghost commented 2 years ago

Tagging subscribers to this area: @eerhardt, @dotnet/area-system-text-regularexpressions See info in area-owners.md if you want to be subscribed.

Issue Details

### Run Information Architecture | arm64 -- | -- OS | Windows 10.0.19041 Baseline | [10e107debae28de4bd1e710cfe448be3c293e841](https://github.com/dotnet/runtime/commit/10e107debae28de4bd1e710cfe448be3c293e841) Compare | [9962c10813de695782baa20f1bf0bb2e9810971d](https://github.com/dotnet/runtime/commit/9962c10813de695782baa20f1bf0bb2e9810971d) Diff | [Diff](https://github.com/dotnet/runtime/compare/10e107debae28de4bd1e710cfe448be3c293e841...9962c10813de695782baa20f1bf0bb2e9810971d) ### Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Common Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- [Uri_IsNotMatch - Duration of single invocation]() | 315.14 ns | 557.11 ns | 1.77 | 0.20 | False | | | ![graph]() [Test Report]() ### Repro ```cmd git clone https://github.com/dotnet/performance.git py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Text.RegularExpressions.Tests.Perf_Regex_Common*' ```

### Payloads [Baseline]() [Compare]() ### Histogram #### System.Text.RegularExpressions.Tests.Perf_Regex_Common.Uri_IsNotMatch(Options: IgnoreCase, Compiled) ```log ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Author:	performanceautofiler[bot]
Assignees:	-
Labels:	`arch-arm64`, `area-System.Text.RegularExpressions`, `os-windows`, `tenet-performance`, `tenet-performance-benchmarks`, `untriaged`
Milestone:	-

dotnet / runtime