Closed performanceautofiler[bot] closed 2 years ago
Architecture | x64 |
---|---|
OS | ubuntu 18.04 |
Baseline | 543bcc5ee7d6a2b9471b016770227421c43a756e |
Compare | f142128e89b63577a9bbba7e2b760ec82102a7a9 |
Diff | Diff |
Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
---|---|---|---|---|---|---|---|---|---|---|
[ReadSingleSegmentSequenceByN - Duration of single invocation](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_x64_ubuntu 18.04_AOT=true_CompliationMode=wasm_RunKind=micro/System.Text.Json.Tests.Perf_Segment.ReadSingleSegmentSequenceByN(numberOfBytes%3a%208192%2c%20TestCase%3a%20Json40KB).html>) | 117.72 μs | 124.68 μs | 1.06 | 0.05 | False | |||||
[ReadSingleSegmentSequenceByN - Duration of single invocation](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_x64_ubuntu 18.04_AOT=true_CompliationMode=wasm_RunKind=micro/System.Text.Json.Tests.Perf_Segment.ReadSingleSegmentSequenceByN(numberOfBytes%3a%204096%2c%20TestCase%3a%20Json4KB).html>) | 11.91 μs | 12.64 μs | 1.06 | 0.04 | False | |||||
[ReadSingleSegmentSequenceByN - Duration of single invocation](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_x64_ubuntu 18.04_AOT=true_CompliationMode=wasm_RunKind=micro/System.Text.Json.Tests.Perf_Segment.ReadSingleSegmentSequenceByN(numberOfBytes%3a%208192%2c%20TestCase%3a%20Json4KB).html>) | 11.86 μs | 13.09 μs | 1.10 | 0.04 | False |
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Text.Json.Tests.Perf_Segment*'
Architecture | x64 |
---|---|
OS | ubuntu 18.04 |
Baseline | 543bcc5ee7d6a2b9471b016770227421c43a756e |
Compare | f142128e89b63577a9bbba7e2b760ec82102a7a9 |
Diff | Diff |
Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
---|---|---|---|---|---|---|---|---|---|---|
[IndexOfValue - Duration of single invocation](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_x64_ubuntu 18.04_AOT=true_CompliationMode=wasm_RunKind=micro/System.Memory.Span(Byte).IndexOfValue(Size%3a%20512).html>) | 138.67 ns | 2.42 μs | 17.43 | 0.01 | False | |||||
[IndexOfAnyTwoValues - Duration of single invocation](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_x64_ubuntu 18.04_AOT=true_CompliationMode=wasm_RunKind=micro/System.Memory.Span(Byte).IndexOfAnyTwoValues(Size%3a%20512).html>) | 200.08 ns | 3.04 μs | 15.18 | 0.01 | False | |||||
[IndexOfAnyThreeValues - Duration of single invocation](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_x64_ubuntu 18.04_AOT=true_CompliationMode=wasm_RunKind=micro/System.Memory.Span(Byte).IndexOfAnyThreeValues(Size%3a%20512).html>) | 276.67 ns | 3.75 μs | 13.54 | 0.01 | False |
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Memory.Span<Byte>*'
Architecture | x64 |
---|---|
OS | ubuntu 18.04 |
Baseline | 543bcc5ee7d6a2b9471b016770227421c43a756e |
Compare | f142128e89b63577a9bbba7e2b760ec82102a7a9 |
Diff | Diff |
Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
---|---|---|---|---|---|---|---|---|---|---|
[GetFileNameWithoutExtension - Duration of single invocation](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_x64_ubuntu 18.04_AOT=true_CompliationMode=wasm_RunKind=micro/System.IO.Tests.Perf_Path.GetFileNameWithoutExtension.html>) | 109.03 ns | 153.38 ns | 1.41 | 0.25 | False |
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.IO.Tests.Perf_Path*'
Architecture | x64 |
---|---|
OS | ubuntu 18.04 |
Baseline | 543bcc5ee7d6a2b9471b016770227421c43a756e |
Compare | f142128e89b63577a9bbba7e2b760ec82102a7a9 |
Diff | Diff |
Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
---|---|---|---|---|---|---|---|---|---|---|
[ImmutableArray - Duration of single invocation](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_x64_ubuntu 18.04_AOT=true_CompliationMode=wasm_RunKind=micro/System.Collections.ContainsTrue(Int32).ImmutableArray(Size%3a%20512).html>) | 125.83 μs | 1.25 ms | 9.90 | 0.07 | False |
git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Collections.ContainsTrue<Int32>*'
Tagging subscribers to 'arch-wasm': @lewing See info in area-owners.md if you want to be subscribed.
Author: | performanceautofiler[bot] |
---|---|
Assignees: | naricc |
Labels: | `arch-wasm`, `untriaged`, `area-Codegen-AOT-mono`, `refs/heads/main`, `ubuntu 18.04`, `RunKind=micro`, `PGO`, `Regression`, `x64`, `AOT=true`, `WASM`, `CompliationMode=wasm` |
Milestone: | - |
The Span IndexOf regressions are pretty brutal.
cc @marek-safar @vargaz
@adamsitnik we think this might be related to the Vector128 changes
This also regressed the Mono interpreter pretty badly: https://github.com/dotnet/perf-autofiling-issues/issues/7976
cc @BrzVlad @vargaz
We can see the impact of https://github.com/dotnet/runtime/pull/73768 very clearly in other AOT JSON tests, but the commit range here is large in the perf pipeline, and the other vectorization changes might have caused issues as well.
The usage of generic structs like:
private static int IndexOfAnyValueType<TValue, TNegator>(ref TValue searchSpace, TValue value0, TValue value1, TValue value2, TValue value3, int length)
where TValue : struct, INumber<TValue>
where TNegator : struct, INegator<TValue>
probably causes wasm to fall back to the interpreter for these methods, since the AOT compiler cannot determine the instantiations that will be used at runtime.
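As a rough illustration of the pattern being discussed, here is a minimal scalar sketch of a generic helper parameterized by a "negator" struct. The names `INegatorSketch`, `DontNegateSketch`, `NegateSketch`, and `SearchSketch` are hypothetical stand-ins, not the BCL types, and the real helpers are vectorized and may shape the interface differently.

```csharp
using System;

// Concrete, non-generic callers: the element type is fully known here.
byte[] data = { 1, 2, 3, 4 };
Console.WriteLine(SearchSketch.IndexOf(data, 3));          // prints 2
Console.WriteLine(SearchSketch.IndexOfAnyExcept(data, 1)); // prints 1

// Hypothetical stand-in for the negator abstraction in the signature above.
interface INegatorSketch
{
    bool NegateIfNeeded(bool equals);
}

// Leaves the comparison result unchanged ("IndexOf" semantics).
readonly struct DontNegateSketch : INegatorSketch
{
    public bool NegateIfNeeded(bool equals) => equals;
}

// Inverts the comparison result ("IndexOfAnyExcept" semantics).
readonly struct NegateSketch : INegatorSketch
{
    public bool NegateIfNeeded(bool equals) => !equals;
}

static class SearchSketch
{
    public static int IndexOf(ReadOnlySpan<byte> span, byte value)
        => IndexOfCore<byte, DontNegateSketch>(span, value);

    public static int IndexOfAnyExcept(ReadOnlySpan<byte> span, byte value)
        => IndexOfCore<byte, NegateSketch>(span, value);

    // One shared generic body; both type arguments are value types, so a
    // compiler can in principle emit a dedicated copy per instantiation.
    private static int IndexOfCore<TValue, TNegator>(ReadOnlySpan<TValue> span, TValue value)
        where TValue : struct, IEquatable<TValue>
        where TNegator : struct, INegatorSketch
    {
        for (int i = 0; i < span.Length; i++)
        {
            // default(TNegator) is a zero-sized struct value; the call is
            // trivially inlinable once TNegator is known.
            if (default(TNegator).NegateIfNeeded(span[i].Equals(value)))
                return i;
        }
        return -1;
    }
}
```

The performance of this shape depends on the compiler specializing `IndexOfCore` per instantiation and inlining `NegateIfNeeded`; if either step is missed, every element comparison pays for a real call.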
IMHO such code should be avoided in low-level BCL code; it's better to have some code duplication.
> Imho such code should be avoided in low level BCL code, its better to have some code duplication.
This isn't really practical, IMO, for us or our customers who are going to do the same. We have many features that are built on and around generics and generic specialization.
Much like with `Vector<T>`, `Vector64<T>`, `Vector128<T>`, and `Vector256<T>`, this really seems like something that WASM and AOT will have to handle, as they are such core concepts to .NET as a whole (especially in perf-oriented code).
Many of these algorithms are very long and very complex. It's not just a small amount of duplication, but large amounts of duplication in some of our most critical code (where having one place to review/update is often core/critical due to its complexity).
Notably, in the case of `IndexOfAnyValueType`, these are all "internal" generic methods that are called from other methods where the type is concrete and well-known.
WASM/AOT could recognize the pattern of `if (typeof(T) == typeof(...))` for value types and/or the cases where we are calling with concrete type information and either specialize or inline the methods to ensure it doesn't need to interpret (e.g. https://source.dot.net/#System.Private.CoreLib/src/libraries/System.Private.CoreLib/src/System/MemoryExtensions.cs,1162 has a size check and then always calls the API as `IndexOfAnyValueType<byte>`).
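The `typeof(T) == typeof(...)` pattern mentioned above can be sketched roughly as follows. `TypeofSketch` and its methods are hypothetical names chosen for illustration, and the scalar loops stand in for the vectorized code in the real library.

```csharp
using System;
using System.Runtime.InteropServices;

int[] ints = { 5, 6, 7 };
byte[] bytes = { 5, 6, 7 };
Console.WriteLine(TypeofSketch.IndexOf<int>(ints, 7));   // prints 2 (generic path)
Console.WriteLine(TypeofSketch.IndexOf<byte>(bytes, 7)); // prints 2 (byte path)

static class TypeofSketch
{
    public static int IndexOf<T>(ReadOnlySpan<T> span, T value)
        where T : struct, IEquatable<T>
    {
        // For value-type T this check is a constant per instantiation, so a
        // specializing compiler can drop the branch that does not apply.
        if (typeof(T) == typeof(byte))
        {
            // Reinterpret the span as bytes; T is known to be byte here.
            ReadOnlySpan<byte> asBytes = MemoryMarshal.Cast<T, byte>(span);
            return IndexOfByte(asBytes, (byte)(object)value);
        }

        // Generic fallback for every other element type.
        for (int i = 0; i < span.Length; i++)
        {
            if (span[i].Equals(value))
                return i;
        }
        return -1;
    }

    // Non-generic helper: nothing here depends on generic instantiation.
    private static int IndexOfByte(ReadOnlySpan<byte> span, byte value)
    {
        for (int i = 0; i < span.Length; i++)
        {
            if (span[i] == value)
                return i;
        }
        return -1;
    }
}
```

When the caller passes a concrete type argument, as in `IndexOf<byte>`, an ahead-of-time compiler has everything it needs at the call site to pick the byte branch.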
In the meantime, string handling is now slower than .NET 6 in many cases, and it happened after the RC1 branch.
cc @danmoseley
Notably in WASM. Most other targets, including Arm64, have gotten significantly faster (often 2x or more).
The “best” fix for .NET 7 given that and the short timeframe is likely to ifdef for WASM or similar. I don’t think reverting it is the right choice given the gains we see on almost all our other targets.
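A short-term conditional-compilation workaround of this kind might look roughly like the sketch below. The symbol `SKETCH_TARGET_WASM` and the class `IfdefSketch` are hypothetical; this thread does not show which symbol or code layout the actual fix used.

```csharp
using System;

Console.WriteLine(IfdefSketch.IndexOf(new byte[] { 9, 8, 7 }, 7)); // prints 2

static class IfdefSketch
{
    public static int IndexOf(ReadOnlySpan<byte> span, byte value)
    {
#if SKETCH_TARGET_WASM // hypothetical symbol, defined only for the affected target
        // Simple non-generic loop that avoids the generic helper chain.
        for (int i = 0; i < span.Length; i++)
        {
            if (span[i] == value)
                return i;
        }
        return -1;
#else
        // Every other target keeps the library's optimized implementation.
        return span.IndexOf(value);
#endif
    }
}
```

Both branches return the same result, so the choice only affects which code path is compiled for a given target.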
For .NET 8 I think this really needs a more in-depth look from the WASM side on how generics can be better supported. That could be one of the suggestions above or something as “simple” as some attribute that tells WASM/AOT what to specialize for in an opt-in manner. I don’t believe duplicating code or maintaining the status quo is viable, especially given the prevalence of generics in .NET and all the newer features and existing perf patterns that encourage users to write generic and generic-specialized algorithms.
Historically when doing this kind of vectorization work we have assumed it's reasonable to measure only on CoreCLR x64 and Arm64 and I assume that's what @adamsitnik did. Maybe it's fine to rely on the perf lab to catch anything Mono specific - it's just that timeframes are tighter at the moment.
If `#if` is feasible, that seems reasonable to me in the short term?
> the AOT compiler cannot determine the instantiations that will be used at runtime.
@vargaz Why is the AOT compiler not able to determine the instantiations that will be used at runtime in this case?
The instantiation chain for these generic helper methods always uses a small set of concrete types. There is no ambiguity about what this will be instantiated with.
So what happens is that the AOT compiler creates a shared instantiation for byte and enums whose base type is byte, and that ends up being called, and then it cannot inline the call to `DontNegate`.
This appears fixable, but not in the net7 timeframe.
@jkotas @vargaz should I port the fix to just the `release/7.0-rc2` branch or both `main` and `release/7.0-rc2`?
I think both if possible.
We should start with a fix in main and then backport it to rc2.
Resolved for .NET 7 in https://github.com/dotnet/runtime/pull/75996
There are some large WASM regressions here, as much as a 10x slowdown. Unfortunately, there seems to be some kind of glitch in the reporting, and the narrowest I can get the change range is this: https://github.com/dotnet/runtime/compare/543bcc5ee7d6a2b9471b016770227421c43a756e...a250bfe05688352da06c11cbfce33550b388a868.
@lewing @radical @pavelsavara Any likely candidates in that date range?
Automated issue reporting below.
Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Common
Regressions in System.Text.Json.Tests.Perf_Reader
Regressions in System.Tests.Perf_String
Regressions in System.Tests.Perf_Environment
Regressions in System.Memory.ReadOnlySpan
Regressions in System.Collections.ContainsFalse<Int32>
Regressions in Microsoft.Extensions.Configuration.Xml.XmlConfigurationProviderBenchmarks