dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Regex performance compare with other programming languages #23683

Closed doggy8088 closed 4 years ago

doggy8088 commented 7 years ago

There is a mariomka/regex-benchmark repo that runs regex benchmarks for different programming languages. I'm wondering why C# Regex performance comes in last and is way slower than every other programming language. Is there any way to speed up Regex performance in .NET, or is there a reason why .NET Regex is that slow?

danmoseley commented 4 years ago

@iSazonov see https://github.com/mariomka/regex-benchmark/pull/26

danmoseley commented 4 years ago

Note that https://github.com/dotnet/runtime/pull/35824 should take this down substantially further, but it is still a fair bit off Rust.

EgorBo commented 4 years ago

I've tested recent Mono (llvm-jit mode) vs. daily .NET 5 on this benchmark: Mono is 30% slower than CoreCLR.

danmoseley commented 4 years ago

@EgorBo that would be recent Mono with all of @stephentoub's improvements to the library, I assume. How does it compare with/without RegexOptions.Compiled? Are both regressed, or is it specific to ref-emit?

EgorBo commented 4 years ago

@EgorBo that would be recent Mono with all of @stephentoub's improvements to the library, I assume. How does it compare with/without RegexOptions.Compiled? Are both regressed, or is it specific to ref-emit?

Yes, it's a Mono build from dotnet/runtime master, so it uses all the latest Regex changes. Numbers for "Compiled" mode:

coreclr:
7.1213 - 5
51.5561 - 92
54.7056 - 5301

mono-llvm-jit:
13.63 - 5
86.0223 - 92
87.768 - 5301

mono-jit:
13.1713 - 5
101.0694 - 92
103.0671 - 5301

For non-Compiled mode the difference between coreclr and mono-llvm-jit is smaller (~30%).

NOTE: Mono can be faster in AOT-LLVM mode (because it runs more LLVM optimizations than the LLVM in JIT mode), but it looks like it's broken at the moment. I will try again once I fix it.
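
For readers who want to repeat the Compiled vs. non-Compiled comparison on their own runtime, here is a minimal sketch. It is not the mariomka benchmark itself: the `Measure` helper, the pattern, and the synthetic input are all assumptions made up for illustration, and the output format only mimics the "time - match count" shape that the numbers above appear to follow.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the benchmark): builds the regex with the
// given options, runs one warm-up match, then times a full scan of the input.
static (int Matches, double Ms) Measure(string pattern, string input, RegexOptions options)
{
    var regex = new Regex(pattern, options);
    regex.IsMatch(input);                      // warm-up so setup cost is not timed
    var sw = Stopwatch.StartNew();
    int count = regex.Matches(input).Count;    // Count forces all matches to be found
    sw.Stop();
    return (count, sw.Elapsed.TotalMilliseconds);
}

// Assumed test data: a simple email-like pattern over repeated synthetic lines.
string pattern = @"[\w.+-]+@[\w-]+\.[\w.-]+";
string input = string.Concat(Enumerable.Repeat("contact: user@example.com, 192.168.0.1\n", 10_000));

var interpreted = Measure(pattern, input, RegexOptions.None);
var compiled = Measure(pattern, input, RegexOptions.Compiled);

Console.WriteLine($"interpreted: {interpreted.Ms:F4} - {interpreted.Matches}");
Console.WriteLine($"compiled:    {compiled.Ms:F4} - {compiled.Matches}");
```

A single timed pass like this is noisy; for numbers worth comparing across runtimes, a harness that repeats the measurement many times would be the safer choice.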

danmoseley commented 4 years ago

Thanks. What's your interpretation of the numbers - is this hitting an area in mono that is known to have a perf gap?

EgorBo commented 4 years ago

Thanks. What's your interpretation of the numbers - is this hitting an area in mono that is known to have a perf gap?

Not sure. I had a quick profiler run (via Xcode Instruments) and it looks like a lot of time is spent in GC.

eerhardt commented 4 years ago

We did a lot of perf work in this area in 5.0. Can this issue be closed now?

danmoseley commented 4 years ago

I think so - we're doing very substantially better on both regex-redux and the mariomka benchmark. If there's more work, it should probably be driven by a new benchmark or insights. Also, @EgorBo, if there's a Mono-specific problem, it should have its own issue, not least because it would be different engineers working on it. (Perhaps it is important for Blazor? @marek-safar)

eerhardt commented 4 years ago

Perhaps it is important for Blazor?

One of the 3 Blazor test apps we are working with uses Regex:

https://github.com/SteveSandersonMS/CarChecker/blob/e9c7e7114e7936e38389074f1b5005e4b27a0e8e/Shared/VehiclePart.cs#L43-L50

Note that on WASM, RuntimeFeature.IsDynamicCodeCompiled always returns false, which means the Regex will never use the IL Compiler and instead will always be interpreted.

https://github.com/dotnet/runtime/blob/54a09d24142864767d50c83c2a64c0e1e37a34c2/src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Regex.cs#L76-L81
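
To see which path a given runtime would take, the flag can be read directly. The sketch below is illustrative only: `ChooseOptions` is a hypothetical helper and the pattern is made up. Per the note above, Regex already falls back to the interpreter on its own when dynamic code is not compiled, so the helper merely makes that decision visible.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Text.RegularExpressions;

// Hypothetical helper: ask for RegexOptions.Compiled only when the runtime
// reports that dynamically generated code is actually compiled.
static RegexOptions ChooseOptions(bool dynamicCodeCompiled) =>
    dynamicCodeCompiled ? RegexOptions.Compiled : RegexOptions.None;

RegexOptions options = ChooseOptions(RuntimeFeature.IsDynamicCodeCompiled);

// Assumed example pattern: two capital letters followed by two digits.
var regex = new Regex(@"^[A-Z]{2}\d{2}", options);

Console.WriteLine($"IsDynamicCodeCompiled = {RuntimeFeature.IsDynamicCodeCompiled}, options = {options}");
Console.WriteLine(regex.IsMatch("AB12 CDE"));
```

On a runtime where the flag is false, the same pattern still matches; only the execution strategy differs.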

marek-safar commented 4 years ago

I'm not aware of any regex perf problems/asks for the browser.