[Perf] Linux/x64: 2 Regressions on 1/2/2024 10:16:29 PM

performanceautofiler[bot] commented 10 months ago

Run Information

Name	Value
Architecture	x64
OS	ubuntu 22.04
Queue	TigerUbuntu
Baseline	623cf77a58f7a233b94dcc1c3ef8eb8d67e8d948
Compare	653739cc0e4f6c7a44e4cda6c9ac8bb2a265505a
Diff	Diff
Configs	CompilationMode:tiered, LLVM:true, MonoAOT:true, MonoInterpreter:false, RunKind:micro_mono

Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Common

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
[MatchesWord - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_x64_ubuntu 22.04_LLVM=true_MonoAOT=true_MonoInterpreter=false_RunKind=micro_mono/System.Text.RegularExpressions.Tests.Perf_Regex_Common.MatchesWord(Options%3a%20IgnoreCase%2c%20Compiled).html>) 📝 - Benchmark Source 📈 - ADX Test Multi Config Graph	1.83 μs	2.65 μs	1.45	0.03	True
[MatchesWord - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_x64_ubuntu 22.04_LLVM=true_MonoAOT=true_MonoInterpreter=false_RunKind=micro_mono/System.Text.RegularExpressions.Tests.Perf_Regex_Common.MatchesWord(Options%3a%20Compiled).html>) 📝 - Benchmark Source 📈 - ADX Test Multi Config Graph	1.52 μs	2.21 μs	1.46	0.03	True


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

Repro Steps

#### Prerequisites (Files either built locally (with build.(sh/cmd) or downloaded from payload above (if same system setup) (in this order))

- Libraries build extracted to `runtime/artifacts` or build instructions: [Libraries README](https://github.com/dotnet/runtime/blob/main/docs/workflow/building/libraries/README.md) args: `-subset libs+libs.tests -rc release -configuration Release -arch $RunArch -framework net8.0`
- CoreCLR product build extracted to `runtime/artifacts/bin/coreclr/$RunOS.$RunArch.Release`, build instructions: [CoreCLR README](https://github.com/dotnet/runtime/blob/main/docs/workflow/building/coreclr/README.md) args: `-subset clr+libs -rc release -configuration Release -arch $RunArch -framework net8.0`
- AOT MONO build extracted to `runtime/artifacts/bin/mono/$RunOS.$RunArch.Release`, build instructions: [MONO README](https://github.com/dotnet/runtime/blob/main/docs/workflow/building/mono/README.md) args: `-arch $RunArch -os $RunOS -s mono+libs+host+packs -c Release /p:CrossBuild=false /p:MonoLLVMUseCxx11Abi=false`
- Dotnet SDK installed for dotnet commands
- Running commands from the runtime folder

Linux

```cmd
# Set $RunDir to the runtime directory
RunDir=`pwd`
# Set the OS, arch, and OSId
RunOS='linux'
RunOSId='linux'
RunArch='x64'
# Create aot directory
mkdir -p $RunDir/artifacts/bin/aot/sgen
mkdir -p $RunDir/artifacts/bin/aot/pack
cp -r $RunDir/artifacts/obj/mono/$RunOS.$RunArch.Release/mono/* $RunDir/artifacts/bin/aot/sgen
cp -r $RunDir/artifacts/bin/microsoft.netcore.app.runtime.$RunOS-$RunArch/Release/* $RunDir/artifacts/bin/aot/pack
# Create Core Root
$RunDir/src/tests/build.sh release $RunArch generatelayoutonly /p:LibrariesConfiguration=Release
# Clone performance
git clone --branch main --depth 1 --quiet https://github.com/dotnet/performance.git $RunDir/performance
# One line run:
python3 $RunDir/performance/scripts/benchmarks_ci.py --csproj $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --incremental no --architecture $RunArch -f net8.0 --filter 'System.Text.RegularExpressions.Tests.Perf_Regex_Common*' --bdn-artifacts $RunDir/artifacts/BenchmarkDotNet.Artifacts --bdn-arguments="--anyCategories Libraries Runtime --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir/artifacts/bin/aot/sgen/mini/mono-sgen --customruntimepack $RunDir/artifacts/bin/aot/pack --aotcompilermode llvm --logBuildOutput --generateBinLog"
# Individual Commands:
# Restore
dotnet restore $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --packages $RunDir/performance/artifacts/packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1
# Build
dotnet build $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore /p:NuGetPackageRoot=$RunDir/performance/artifacts/packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1
# Run
dotnet run --project $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore --no-build -- --filter System.Text.RegularExpressions.Tests.Perf_Regex_Common* --anyCategories Libraries Runtime " --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir/artifacts/bin/aot/sgen/mini/mono-sgen --customruntimepack $RunDir/artifacts/bin/aot/pack --aotcompilermode llvm --logBuildOutput --generateBinLog " --artifacts $RunDir/artifacts/BenchmarkDotNet.Artifacts --packages $RunDir/performance/artifacts/packages --buildTimeout 1200
```

Windows

```cmd
# Set $RunDir to the runtime directory
$RunDir="FullPathHere"
# Set the OS, arch, and OSId
RunOS='windows'
RunOSId='win'
RunArch='x64'
# Create aot directory
mkdir $RunDir\artifacts\bin\aot\sgen
mkdir $RunDir\artifacts\bin\aot\pack
xcopy $RunDir\artifacts\obj\mono\$RunOS.$RunArch.Release\mono $RunDir\artifacts\bin\aot\sgen\ /e /y
xcopy $RunDir\artifacts\bin\microsoft.netcore.app.runtime.$RunOSId-$RunArch\Release $RunDir\artifacts\bin\aot\pack\ /e /y
# Create Core Root
$RunDir\src\tests\build.cmd release $RunArch generatelayoutonly /p:LibrariesConfiguration=Release
# Clone performance
git clone --branch main --depth 1 --quiet https://github.com/dotnet/performance.git $RunDir\performance
# One line run:
python3 $RunDir\performance\scripts\benchmarks_ci.py --csproj $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --incremental no --architecture $RunArch -f net8.0 --filter 'System.Text.RegularExpressions.Tests.Perf_Regex_Common*' --bdn-artifacts $RunDir\artifacts\BenchmarkDotNet.Artifacts --bdn-arguments="--anyCategories Libraries Runtime --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir\artifacts\bin\aot\sgen\mini\mono-sgen.exe --customruntimepack $RunDir\artifacts\bin\aot\pack --aotcompilermode llvm --logBuildOutput --generateBinLog"
# Individual Commands:
# Restore
dotnet restore $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --packages $RunDir\performance\artifacts\packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1
# Build
dotnet build $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore /p:NuGetPackageRoot=$RunDir\performance\artifacts\packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1
# Run
dotnet run --project $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore --no-build -- --filter System.Text.RegularExpressions.Tests.Perf_Regex_Common* --anyCategories Libraries Runtime " --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir\artifacts\bin\aot\sgen\mini\mono-sgen.exe --customruntimepack $RunDir\artifacts\bin\aot\pack -aotcompilermode llvm --logBuildOutput --generateBinLog " --artifacts $RunDir\artifacts\BenchmarkDotNet.Artifacts --packages $RunDir\performance\artifacts\packages --buildTimeout 1200
```

### Payloads

[Baseline]() [Compare]()

### System.Text.RegularExpressions.Tests.Perf_Regex_Common.MatchesWord(Options: IgnoreCase, Compiled)

#### ETL Files

#### Histogram

#### JIT Disasms

### System.Text.RegularExpressions.Tests.Perf_Regex_Common.MatchesWord(Options: Compiled)

#### ETL Files

#### Histogram

#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)
[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

matouskozak commented 10 months ago

The range of regression is https://github.com/dotnet/runtime/compare/42e35f9a778a5aec389bf74d552a9f53b2dd6e80...4b19d67aaa0c9c286b7cb575399445828e975245. Could it be related to https://github.com/dotnet/runtime/commit/ae051b76a63539cfba2e33337580c4696cbec858 @stephentoub?

stephentoub commented 10 months ago

> The range of regression is dotnet/runtime@42e35f9...4b19d67. Could it be related to dotnet/runtime@ae051b7 @stephentoub?

Yes, it very likely is.

cc: @MihaZupan

MihaZupan commented 10 months ago

> CompilationMode:tiered, LLVM:true, MonoAOT:true, MonoInterpreter:false, RunKind:micro_mono

Could you please remind me what sort of hardware intrinsics support this configuration would have?

matouskozak commented 10 months ago

Since this is both an arm64 and an x64 regression, the intersection of supported intrinsics is quite small, I believe.

If any generics are used, then only System.Runtime.Intrinsics.Vector128 is definitely supported, together with partial support for System.Numerics.Vector. Without generics, Vector4 should also be supported. Did I forget something, @fanyang-mono?

fanyang-mono commented 10 months ago

@MihaZupan Which intrinsics does this microbenchmark use? For Vector128, Mono has full support across all platforms and all runtime flavors. For the others, I will need to check. This regression is in microbenchmarks running with Mono LLVM AOT with a fallback to the Mono JIT. Usually, when generic types are passed in to methods, those methods won't be AOT'ed and will fall back to the JIT. Mono LLVM AOT has the most intrinsics coverage, while the JIT has less. I would need to know what changed on the library side to determine the cause of this regression.

MihaZupan commented 10 months ago

The library change here is essentially replacing

span.IndexOf("tempus", StringComparison.Ordinal);

with

SearchValues<string> s_tempus = SearchValues.Create(["tempus"], StringComparison.Ordinal); // Cached

span.IndexOfAny(s_tempus);

Where the type returned by SearchValues.Create is some variation of SingleStringSearchValuesThreeChars<TValueLength, TCaseSensitivity> with different generic type params. For the affected benchmarks it would be <ValueLength4To7, CaseSensitive> or <ValueLength4To7, CaseInsensitiveAsciiLetters>.

The implementation of the previously used helper is here - a non-generic static method on a non-generic static class. The implementation of the new helper is here - an instance method on a generic class.

Both implementations only use Vector128/256/512<ushort> APIs, nothing platform-specific explicitly.

(If it matters, note that the calling code in question here is generated by reflection emit at runtime)
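For readers unfamiliar with the two call shapes, here is a minimal sketch of the before/after described in this comment. It assumes .NET 9 or later (where `SearchValues.Create` accepts strings) and C# 12 collection expressions; the input text and the variable names are made up for illustration and are not taken from the benchmark or the Regex source generator.

```csharp
using System;
using System.Buffers;

// Created once and reused, matching the "// Cached" note in the comment above.
SearchValues<string> tempus = SearchValues.Create(["tempus"], StringComparison.Ordinal);

// Hypothetical input; the real benchmark searches a different, much longer text.
ReadOnlySpan<char> span = "Lorem ipsum tempus fugit";

// Old form: ordinal substring search via the non-generic span helper.
int viaIndexOf = span.IndexOf("tempus", StringComparison.Ordinal);

// New form: the same search routed through SearchValues<string>.
int viaSearchValues = span.IndexOfAny(tempus);

// Both calls should report the same match position.
Console.WriteLine(viaIndexOf == viaSearchValues);
```

The two calls are functionally equivalent for a single ordinal value; what differs is which implementation runs underneath, which is the point of the discussion that follows.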

fanyang-mono commented 10 months ago

Compared to the old helper, the new helper has two function calls, which I suspect are the cause of the perf regression. The two function calls are:

ValidateReadPosition
GetComparisonResult

MihaZupan commented 10 months ago

ValidateReadPosition is [Conditional("DEBUG")].

GetComparisonResult should always be inlined into the search loop, is that not happening here?

fanyang-mono commented 10 months ago

Usually, when methods are marked with AggressiveInlining, they should be inlined. Since there is a perf regression when switching to this new helper, I suspect that the inlining might have failed here. I will check the generated code and report back.

fanyang-mono commented 10 months ago

I took a look at the AOT output of System.Private.CoreLib.dll, which is System.Buffers.dll.dylib on macOS. I could see that SpanHelpers.IndexOf was AOT compiled, while SingleStringSearchValuesThreeChars.IndexOf wasn't. Additionally, I noticed that none of the methods belonging to the class SingleStringSearchValuesThreeChars were AOT compiled. This is probably because SingleStringSearchValuesThreeChars is a generic class.

In conclusion, the perf regression here is caused by transitioning from AOT-compiled code to JIT'ed code, and there isn't much that we could do to change this from the Mono runtime perspective.
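To make the structural difference concrete, the following is a rough sketch of the two shapes being contrasted: a non-generic static method on a non-generic static class versus an instance method on a class that is generic over a struct type parameter. All type and member names here (`OldStyleHelper`, `NewStyleSearcher`, `ICaseSensitivity`, `CharsEqual`) are hypothetical stand-ins, not the real CoreLib types, and the search loop is a naive scalar one rather than the vectorized implementation. It only illustrates the shape; it does not reproduce the regression.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical input, for illustration only.
ReadOnlySpan<char> text = "lorem tempus ipsum";

int oldShape = OldStyleHelper.IndexOf(text, "tempus");
int newShape = new NewStyleSearcher<CaseSensitive>("tempus").IndexOf(text);
Console.WriteLine(oldShape == newShape);

// Old shape: a non-generic static method on a non-generic static class.
static class OldStyleHelper
{
    public static int IndexOf(ReadOnlySpan<char> span, string value) =>
        span.IndexOf(value, StringComparison.Ordinal);
}

// Stand-in for a case-sensitivity policy supplied as a generic type argument.
interface ICaseSensitivity
{
    static abstract bool CharsEqual(char a, char b);
}

readonly struct CaseSensitive : ICaseSensitivity
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool CharsEqual(char a, char b) => a == b;
}

// New shape: an instance method on a generic class. Each distinct struct type
// argument is a separate instantiation that the compiler has to generate code for.
sealed class NewStyleSearcher<TCaseSensitivity>
    where TCaseSensitivity : struct, ICaseSensitivity
{
    private readonly string _value;

    public NewStyleSearcher(string value) => _value = value;

    public int IndexOf(ReadOnlySpan<char> span)
    {
        // Naive scan: try every start position and compare character by character.
        for (int i = 0; i + _value.Length <= span.Length; i++)
        {
            int j = 0;
            while (j < _value.Length && TCaseSensitivity.CharsEqual(span[i + j], _value[j]))
            {
                j++;
            }

            if (j == _value.Length)
            {
                return i;
            }
        }

        return -1;
    }
}
```

The sketch needs C# 11 or later for static abstract interface members. Per the finding above, it is the second shape whose methods were missing from the AOT output in this configuration.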

matouskozak commented 9 months ago

> I took a look at the AOT output of System.Private.CoreLib.dll, which is System.Buffers.dll.dylib on macOS. I could see that SpanHelpers.IndexOf was AOT compiled, while SingleStringSearchValuesThreeChars.IndexOf wasn't. Additionally, I noticed that none of the methods belonging to the class SingleStringSearchValuesThreeChars were AOT compiled. This is probably because SingleStringSearchValuesThreeChars is a generic class.
>
> In conclusion, the perf regression here is caused by transitioning from AOT-compiled code to JIT'ed code, and there isn't much that we could do to change this from the Mono runtime perspective.

@fanyang-mono can we close this and the related issues or do we want to keep them for reference?

fanyang-mono commented 9 months ago

@matouskozak Feel free to close this issue, as there isn't any action that we could take at the moment, from Mono's perspective.
