dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.3k stars 4.74k forks source link

Regex generator with large pattern scalability, RyuJIT vs. NativeAOT #72730

Open am11 opened 2 years ago

am11 commented 2 years ago

Using RyuJIT, when expression count exceeds certain threshold, the atomic comparisons tend to get slower with generated regex objects compared to interpreted ones. NativeAOT, however, maintains the "better than interpreted" characteristic of generated regex.

Here is a benchmark project using a huge x86 assembly keywords regex (taken from sharplap frontend app) and comparing with a list of all keywords multiple times: https://gist.github.com/am11/1bc5abf9560dfa3e3e9e401cbc5fe59d.

runtime version: aafa91036e

NativeAOT (expected)

Method Mean Error StdDev
Generated 31.82 ms 0.599 ms 0.914 ms
Interpreter 43.88 ms 0.823 ms 0.845 ms

RyuJIT (unexpected?)

Method Mean Error StdDev
Generated 98.93 ms 1.677 ms 1.487 ms
Interpreter 37.21 ms 0.566 ms 0.530 ms

category:implementation theme:ready-to-run skill-level:intermediate cost:medium impact:medium

dotnet-issue-labeler[bot] commented 2 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions See info in area-owners.md if you want to be subscribed.

Issue Details
Using RyuJIT, when expression count exceeds certain threshold, the atomic comparisons tend to get slower with generated regex objects compared to interpreted ones. NativeAOT, however, maintains the "better than interpreted" characteristic of generated regex. Here is a benchmark project using a huge x86 assembly keywords regex (taken from [sharplap frontend app](https://github.com/ashmind/SharpLab/blob/78bf776cce366b9b69470141922cd0a6becea8ac/source/WebApp/app/shared/codemirror/mode-asm-instructions.ts)) and comparing with a list of all keywords multiple times: https://gist.github.com/am11/1bc5abf9560dfa3e3e9e401cbc5fe59d. runtime version: aafa91036e #### NativeAOT (expected) | Method | Mean | Error | StdDev | |------------ |---------:|---------:|---------:| | Generated | 31.82 ms | 0.599 ms | 0.914 ms | | Interpreter | 43.88 ms | 0.823 ms | 0.845 ms | #### RyuJIT (unexpected?) | Method | Mean | Error | StdDev | |------------ |---------:|---------:|---------:| | Generated | 98.93 ms | 1.677 ms | 1.487 ms | | Interpreter | 37.21 ms | 0.566 ms | 0.530 ms |
Author: am11
Assignees: -
Labels: `area-System.Text.RegularExpressions`, `untriaged`
Milestone: -
stephentoub commented 2 years ago

The regex generator and RegexOptions.Compiled emit one large matching method; the larger the pattern, the larger the method. This is an obscenely large pattern 😉 . I'd guess the resulting method is so large the JIT ends up disabling important optimizations like inlining.

ghost commented 2 years ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details
Using RyuJIT, when expression count exceeds certain threshold, the atomic comparisons tend to get slower with generated regex objects compared to interpreted ones. NativeAOT, however, maintains the "better than interpreted" characteristic of generated regex. Here is a benchmark project using a huge x86 assembly keywords regex (taken from [sharplap frontend app](https://github.com/ashmind/SharpLab/blob/78bf776cce366b9b69470141922cd0a6becea8ac/source/WebApp/app/shared/codemirror/mode-asm-instructions.ts)) and comparing with a list of all keywords multiple times: https://gist.github.com/am11/1bc5abf9560dfa3e3e9e401cbc5fe59d. runtime version: aafa91036e #### NativeAOT (expected) | Method | Mean | Error | StdDev | |------------ |---------:|---------:|---------:| | Generated | 31.82 ms | 0.599 ms | 0.914 ms | | Interpreter | 43.88 ms | 0.823 ms | 0.845 ms | #### RyuJIT (unexpected?) | Method | Mean | Error | StdDev | |------------ |---------:|---------:|---------:| | Generated | 98.93 ms | 1.677 ms | 1.487 ms | | Interpreter | 37.21 ms | 0.566 ms | 0.530 ms |
Author: am11
Assignees: -
Labels: `area-CodeGen-coreclr`, `untriaged`
Milestone: -
am11 commented 2 years ago

Thanks. Yes this is an unusual pattern. :sweat_smile:

Perhaps there are different large method thresholds for JIT and R2R/AOT modes, which can be relaxed for methods with System.CodeDom.Compiler.GeneratedCodeAttribute? Alternatively, generator can split the matching method based on current runtime's large method limit. :thought_balloon:

AndyAyersMS commented 2 years ago

I'd guess the resulting method is so large the JIT ends up disabling important optimizations like inlining.

Indeed, the jit doesn't even try optimizing.

Compiling 1054 Runner::TryMatchAtCurrentPosition, IL size = 100905, hash=0xb193db74 Tier-0 switched MinOpts

There are different AOT/R2R limits, because time isn't quite so precious.

AndyAyersMS commented 2 years ago

Not sure there's anything we can do on the codegen side to mitigate this.

AndyAyersMS commented 2 years ago

cc @dotnet/jit-contrib

tannergooding commented 2 years ago

Would it be overly difficult for the generator to do some method splitting to help avoid such large sequences of IL?

EgorBo commented 2 years ago

Just curious - does it happen with RFC822 regex to verify email addresses? http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

stephentoub commented 2 years ago

Would it be overly difficult for the generator to do some method splitting to help avoid such large sequences of IL?

It'd be fairly challenging I expect, and certainly won't happen for .NET 7. For RegexOptions.Compiled, there's an added complexity of dealing with DynamicMethods. But the biggest challenge, for both RegexOptions.Compiled and the source generator, beyond coming up with the right heuristic, would be dealing with all of the gotos used to implement backtracking. Those that remain within a sub-method would remain gotos, but those that needed to jump to outside of the new method boundary would need an alternate mechanism.