performanceautofiler[bot] closed this 9 months ago.
The range of regression is https://github.com/dotnet/runtime/compare/42e35f9a778a5aec389bf74d552a9f53b2dd6e80...4b19d67aaa0c9c286b7cb575399445828e975245. Could it be related to https://github.com/dotnet/runtime/commit/ae051b76a63539cfba2e33337580c4696cbec858 @stephentoub?
Yes, it very likely is.
cc: @MihaZupan
CompilationMode:tiered, LLVM:true, MonoAOT:true, MonoInterpreter:false, RunKind:micro_mono
Could you please remind me what sort of hardware intrinsics support this configuration would have?
Since this is both an arm64 and an x64 regression, the intersection of supported intrinsics is quite small, I believe.
If there are any generics used, then only `System.Runtime.Intrinsics.Vector128` is definitely supported, together with partial support of `System.Numerics.Vector`. Without generics, `Vector4` should also be supported. Did I forget something @fanyang-mono?
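To see what a particular runtime configuration actually accelerates, a small probe like the one below can be run under it. This is a sketch assuming .NET 8 or later (for `Vector512`). The lane counts are fixed by the vector width, but the `IsHardwareAccelerated` answers come from whichever code generator compiled the probe, so on a mixed Mono AOT/JIT setup they may not describe the library code path in question.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;

// Lane counts are fixed by the vector width: 16, 32 and 64 bytes / 2 bytes per ushort.
int lanes128 = Vector128<ushort>.Count; // 8
int lanes256 = Vector256<ushort>.Count; // 16
int lanes512 = Vector512<ushort>.Count; // 32

// Acceleration, unlike the lane counts, depends on the hardware and on the
// code generator (JIT or AOT) that compiled this method.
Console.WriteLine($"Vector128<ushort>: {lanes128} lanes, accelerated = {Vector128.IsHardwareAccelerated}");
Console.WriteLine($"Vector256<ushort>: {lanes256} lanes, accelerated = {Vector256.IsHardwareAccelerated}");
Console.WriteLine($"Vector512<ushort>: {lanes512} lanes, accelerated = {Vector512.IsHardwareAccelerated}");
Console.WriteLine($"Vector<ushort>:    {Vector<ushort>.Count} lanes, accelerated = {Vector.IsHardwareAccelerated}");
```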
@MihaZupan What intrinsics does this microbenchmark use? For `Vector128`, Mono has full support across all platforms and all runtime flavors. For others, I will need to check. This regression is related to microbenchmarks running with Mono LLVM AOT with a fallback to the Mono JIT. Usually, when generic types are passed in to methods, those methods won't be AOT'ed and will fall back to the JIT. Mono LLVM AOT has the most intrinsics coverage, while the JIT has less. I would need to know what changed on the library side, before vs. after, to determine the cause of this regression.
The library change here is essentially replacing
span.IndexOf("tempus", StringComparison.Ordinal);
with
SearchValues<string> s_tempus = SearchValues.Create(["tempus"], StringComparison.Ordinal); // Cached
span.IndexOfAny(s_tempus);
Here, the type returned by `SearchValues.Create` is some variation of `SingleStringSearchValuesThreeChars<TValueLength, TCaseSensitivity>` with different generic type parameters. For the affected benchmarks, it would be `<ValueLength4To7, CaseSensitive>` or `<ValueLength4To7, CaseInsensitiveAsciiLetters>`.
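For reference, the before/after call patterns quoted above can be tried standalone with a sketch like this. It assumes .NET 9 or later (where `SearchValues.Create` accepts strings) and C# 12 collection expressions; the input text is made up, and the printed runtime type is an internal implementation detail that may differ between versions and platforms.

```csharp
using System;
using System.Buffers;

// Made-up input; the benchmarks use their own text.
string text = "Lorem ipsum dolor sit amet, tempus fugit.";

// Before: ordinal substring search.
int before = text.AsSpan().IndexOf("tempus", StringComparison.Ordinal);

// After: a SearchValues<string> created once and reused
// (the real code caches it in a static field).
SearchValues<string> tempus = SearchValues.Create(["tempus"], StringComparison.Ordinal);
int after = text.AsSpan().IndexOfAny(tempus);

Console.WriteLine($"before = {before}, after = {after}"); // before = 28, after = 28

// The concrete (internal, possibly generic) type picked by SearchValues.Create.
Console.WriteLine(tempus.GetType());
```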
The implementation of the previously used helper is here - a non-generic static method on a non-generic static class. The implementation of the new helper is here - an instance method on a generic class.
Both implementations only use `Vector128/256/512<ushort>` APIs, nothing explicitly platform-specific.
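As an illustration of what "only `Vector128<ushort>` APIs" means in practice, here is a hypothetical single-character search written purely against the cross-platform `Vector128` surface, with no `Sse2` or `AdvSimd` calls, so the same source can be vectorized on both x64 and arm64. It is a simplified sketch assuming .NET 8 or later (for `Vector128.Create(ReadOnlySpan<T>)`), not the `SearchValues` implementation itself.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical helper: index of the first `value` in `text`, or -1.
static int IndexOfChar(ReadOnlySpan<char> text, char value)
{
    ReadOnlySpan<ushort> data = MemoryMarshal.Cast<char, ushort>(text);
    Vector128<ushort> target = Vector128.Create((ushort)value);
    int i = 0;

    if (Vector128.IsHardwareAccelerated)
    {
        // Compare 8 UTF-16 code units per iteration.
        for (; i <= data.Length - Vector128<ushort>.Count; i += Vector128<ushort>.Count)
        {
            Vector128<ushort> chunk = Vector128.Create(data.Slice(i, Vector128<ushort>.Count));
            uint mask = Vector128.Equals(chunk, target).ExtractMostSignificantBits();
            if (mask != 0)
            {
                // The lowest set bit marks the first matching lane.
                return i + BitOperations.TrailingZeroCount(mask);
            }
        }
    }

    // Scalar tail (or the whole input when Vector128 is not accelerated).
    for (; i < data.Length; i++)
    {
        if (data[i] == value) return i;
    }
    return -1;
}

Console.WriteLine(IndexOfChar("the quick brown fox", 'f')); // prints 16
```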
(If it matters, note that the calling code in question here is generated by reflection emit at runtime)
Compared to the old helper, the new helper has two function calls, which I suspect is the cause of the perf regression. The two function calls are
`ValidateReadPosition` is `[Conditional("DEBUG")]`. `GetComparisonResult` should always be inlined into the search loop; is that not happening here?
Usually, when methods are marked with `AggressiveInlining`, they should be inlined. Since there is a perf regression when switching to this new helper, I suspect that the inlining might have failed here. I will check the generated code and report back.
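The two attributes discussed above behave roughly as in this sketch. The helper names are hypothetical stand-ins, not the library's code, and `VALIDATE_READS` is an assumed, never-defined symbol used in place of `DEBUG` so the demo behaves the same in Debug and Release builds. `[Conditional]` is honored by the C# compiler, so the call never reaches IL; `AggressiveInlining` is only a hint to whichever JIT or AOT compiler generates the code. Attributes on local functions require C# 9 or later.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

// Hypothetical search loop using both kinds of helpers.
static int FindFirst(ReadOnlySpan<char> text, char value)
{
    for (int i = 0; i < text.Length; i++)
    {
        CheckReadPosition(i, text.Length); // removed at compile time
        if (IsMatch(text[i], value))       // inlining is requested, not guaranteed
        {
            return i;
        }
    }
    return -1;
}

// Calls are omitted unless VALIDATE_READS is defined at the call site.
[Conditional("VALIDATE_READS")]
static void CheckReadPosition(int position, int length)
{
    if ((uint)position >= (uint)length)
    {
        throw new ArgumentOutOfRangeException(nameof(position));
    }
}

// Asks the code generator to inline this method into its callers.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
static bool IsMatch(char c, char value) => c == value;

Console.WriteLine(FindFirst("abcdef", 'd')); // prints 3
CheckReadPosition(-1, 0); // would throw if compiled in; here the call is removed
```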
I took a look at the AOT output of `System.Private.CoreLib.dll`, which is `System.Buffers.dll.dylib` on macOS. I could see that `SpanHelpers.IndexOf` was AOT compiled, while `SingleStringSearchValuesThreeChars.IndexOf` wasn't. Additionally, I noticed that none of the methods belonging to class `SingleStringSearchValuesThreeChars` were AOT compiled. This is probably because `SingleStringSearchValuesThreeChars` is a generic class.
In conclusion, the perf regression here is caused by transitioning from AOT compiled code to JIT'ed code. And there isn't much that we could do to change this from the Mono runtime perspective.
@fanyang-mono can we close this and the related issues or do we want to keep them for reference?
@matouskozak Feel free to close this issue, as there isn't any action that we could take at the moment, from Mono's perspective.
Run Information
Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Common
Test Report
Repro
General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md
Repro Steps
#### Prerequisites

Files either built locally (with `build.sh`/`build.cmd`) or downloaded from the payload above (if the system setup is the same), in this order:

- Libraries build extracted to `runtime/artifacts`; build instructions: [Libraries README](https://github.com/dotnet/runtime/blob/main/docs/workflow/building/libraries/README.md), args: `-subset libs+libs.tests -rc release -configuration Release -arch $RunArch -framework net8.0`
- CoreCLR product build extracted to `runtime/artifacts/bin/coreclr/$RunOS.$RunArch.Release`; build instructions: [CoreCLR README](https://github.com/dotnet/runtime/blob/main/docs/workflow/building/coreclr/README.md), args: `-subset clr+libs -rc release -configuration Release -arch $RunArch -framework net8.0`
- AOT Mono build extracted to `runtime/artifacts/bin/mono/$RunOS.$RunArch.Release`; build instructions: [MONO README](https://github.com/dotnet/runtime/blob/main/docs/workflow/building/mono/README.md), args: `-arch $RunArch -os $RunOS -s mono+libs+host+packs -c Release /p:CrossBuild=false /p:MonoLLVMUseCxx11Abi=false`
- .NET SDK installed for `dotnet` commands
- Commands run from the runtime folder

Linux

```bash
# Set $RunDir to the runtime directory
RunDir=`pwd`
# Set the OS, arch, and OSId
RunOS='linux'
RunOSId='linux'
RunArch='x64'
# Create aot directory
mkdir -p $RunDir/artifacts/bin/aot/sgen
mkdir -p $RunDir/artifacts/bin/aot/pack
cp -r $RunDir/artifacts/obj/mono/$RunOS.$RunArch.Release/mono/* $RunDir/artifacts/bin/aot/sgen
cp -r $RunDir/artifacts/bin/microsoft.netcore.app.runtime.$RunOS-$RunArch/Release/* $RunDir/artifacts/bin/aot/pack
# Create Core Root
$RunDir/src/tests/build.sh release $RunArch generatelayoutonly /p:LibrariesConfiguration=Release
# Clone performance
git clone --branch main --depth 1 --quiet https://github.com/dotnet/performance.git $RunDir/performance
# One line run:
python3 $RunDir/performance/scripts/benchmarks_ci.py --csproj $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --incremental no --architecture $RunArch -f net8.0 --filter 'System.Text.RegularExpressions.Tests.Perf_Regex_Common*' --bdn-artifacts $RunDir/artifacts/BenchmarkDotNet.Artifacts --bdn-arguments="--anyCategories Libraries Runtime --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir/artifacts/bin/aot/sgen/mini/mono-sgen --customruntimepack $RunDir/artifacts/bin/aot/pack --aotcompilermode llvm --logBuildOutput --generateBinLog"

# Individual Commands:
# Restore
dotnet restore $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --packages $RunDir/performance/artifacts/packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1
# Build
dotnet build $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore /p:NuGetPackageRoot=$RunDir/performance/artifacts/packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1
# Run
dotnet run --project $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore --no-build -- --filter 'System.Text.RegularExpressions.Tests.Perf_Regex_Common*' --anyCategories Libraries Runtime --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir/artifacts/bin/aot/sgen/mini/mono-sgen --customruntimepack $RunDir/artifacts/bin/aot/pack --aotcompilermode llvm --logBuildOutput --generateBinLog --artifacts $RunDir/artifacts/BenchmarkDotNet.Artifacts --packages $RunDir/performance/artifacts/packages --buildTimeout 1200
```

Windows

```powershell
# Set $RunDir to the runtime directory
$RunDir="FullPathHere"
# Set the OS, arch, and OSId
$RunOS='windows'
$RunOSId='win'
$RunArch='x64'
# Create aot directory
mkdir $RunDir\artifacts\bin\aot\sgen
mkdir $RunDir\artifacts\bin\aot\pack
xcopy $RunDir\artifacts\obj\mono\$RunOS.$RunArch.Release\mono $RunDir\artifacts\bin\aot\sgen\ /e /y
xcopy $RunDir\artifacts\bin\microsoft.netcore.app.runtime.$RunOSId-$RunArch\Release $RunDir\artifacts\bin\aot\pack\ /e /y
# Create Core Root
& "$RunDir\src\tests\build.cmd" release $RunArch generatelayoutonly /p:LibrariesConfiguration=Release
# Clone performance
git clone --branch main --depth 1 --quiet https://github.com/dotnet/performance.git $RunDir\performance
# One line run:
python3 $RunDir\performance\scripts\benchmarks_ci.py --csproj $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --incremental no --architecture $RunArch -f net8.0 --filter 'System.Text.RegularExpressions.Tests.Perf_Regex_Common*' --bdn-artifacts $RunDir\artifacts\BenchmarkDotNet.Artifacts --bdn-arguments="--anyCategories Libraries Runtime --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir\artifacts\bin\aot\sgen\mini\mono-sgen.exe --customruntimepack $RunDir\artifacts\bin\aot\pack --aotcompilermode llvm --logBuildOutput --generateBinLog"

# Individual Commands:
# Restore
dotnet restore $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --packages $RunDir\performance\artifacts\packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1
# Build
dotnet build $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore /p:NuGetPackageRoot=$RunDir\performance\artifacts\packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1
# Run
dotnet run --project $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore --no-build -- --filter 'System.Text.RegularExpressions.Tests.Perf_Regex_Common*' --anyCategories Libraries Runtime --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir\artifacts\bin\aot\sgen\mini\mono-sgen.exe --customruntimepack $RunDir\artifacts\bin\aot\pack --aotcompilermode llvm --logBuildOutput --generateBinLog --artifacts $RunDir\artifacts\BenchmarkDotNet.Artifacts --packages $RunDir\performance\artifacts\packages --buildTimeout 1200
```