dotnet / perf-autofiling-issues

A landing place for auto-filed performance issues before they receive triage

[Perf] Windows/x64: 11 Regressions on 1/11/2024 10:37:40 PM #27331

Open performanceautofiler[bot] opened 8 months ago

performanceautofiler[bot] commented 8 months ago

Run Information

| Name | Value |
| --- | --- |
| Architecture | x64 |
| OS | Windows 10.0.18362 |
| Queue | TigerWindows |
| Baseline | 8f79b66e76081559f71969988037790d3e53367e |
| Compare | 22ba7d607bb1d9caa0db9afcdc47eb5cef641fcb |
| Diff | Diff |
| Configs | CompilationMode:tiered, RunKind:micro |
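
The Diff entry lost its link target in this copy. As a minimal sketch, assuming the Baseline and Compare hashes are commits in `dotnet/runtime` (the report does not name the repository, so the `repo` default is an assumption), the commit range can be rebuilt with GitHub's standard compare URL form:

```python
def compare_url(baseline: str, compare: str, repo: str = "dotnet/runtime") -> str:
    """Build a GitHub compare URL for the commit range baseline...compare.

    The default repo is an assumption; the report only lists the hashes.
    """
    return f"https://github.com/{repo}/compare/{baseline}...{compare}"


BASELINE = "8f79b66e76081559f71969988037790d3e53367e"
COMPARE = "22ba7d607bb1d9caa0db9afcdc47eb5cef641fcb"

print(compare_url(BASELINE, COMPARE))
```

Opening the printed URL should list the commits between the two builds, which is usually the first place to look for the change behind a regression.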

Regressions in System.Tests.Perf_HashCode

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Add | 1.97 μs | 2.62 μs | 1.33 | 0.23 | False | | | |
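
The Test/Base column appears to be the test time divided by the baseline time, so values above 1 mean the compare build is slower. A minimal sketch of that arithmetic (the function name `slowdown` is hypothetical, not part of the reporting tool):

```python
def slowdown(baseline: float, test: float) -> float:
    """Return test/baseline rounded to two decimals.

    Both values must be in the same unit (for example, both in ns).
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return round(test / baseline, 2)


# Perf_HashCode.Add from the table above: 1.97 us -> 2.62 us
print(slowdown(1.97, 2.62))  # 1.33
```

Recomputing from the rounded times shown in this report can differ from the listed ratio in the last digit: `slowdown(15.34, 17.72)` gives 1.16 where the Perf_Frozen row lists 1.15, presumably because the report divides unrounded measurements.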

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Tests.Perf_HashCode*'

### Payloads

[Baseline]() [Compare]()

### System.Tests.Perf_HashCode.Add

#### ETL Files

#### Histogram

#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)

[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Regressions in System.Collections.Perf_Frozen<Int16>

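| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TryGetValue_True(Count: 4) | 15.34 ns | 17.72 ns | 1.15 | 0.10 | False | | | |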

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.Perf_Frozen<Int16>*'

### Payloads

[Baseline]() [Compare]()

### System.Collections.Perf_Frozen<Int16>.TryGetValue_True(Count: 4)

#### ETL Files

#### Histogram

#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)

[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Regressions in System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GetHashCodeBenchmark | 48.79 ns | 59.33 ns | 1.22 | 0.29 | False | | | |

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>*'

### Payloads

[Baseline]() [Compare]()

### System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.GetHashCodeBenchmark

#### ETL Files

#### Histogram

#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)

[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Regressions in System.Collections.IndexerSet<String>

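| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SortedList(Size: 512) | 224.13 μs | 243.37 μs | 1.09 | 0.09 | False | | | |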

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.IndexerSet<String>*'

### Payloads

[Baseline]() [Compare]()

### System.Collections.IndexerSet<String>.SortedList(Size: 512)

#### ETL Files

#### Histogram

#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)

[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Regressions in System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GetHashCodeBenchmark | 48.82 ns | 52.98 ns | 1.09 | 0.39 | False | | | |

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>*'

### Payloads

[Baseline]() [Compare]()

### System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.GetHashCodeBenchmark

#### ETL Files

#### Histogram

#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)

[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Regressions in System.Runtime.Intrinsics.Tests.Perf_Vector128Int

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GetHashCodeBenchmark | 10.36 ns | 12.54 ns | 1.21 | 0.47 | False | | | |

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Runtime.Intrinsics.Tests.Perf_Vector128Int*'

### Payloads

[Baseline]() [Compare]()

### System.Runtime.Intrinsics.Tests.Perf_Vector128Int.GetHashCodeBenchmark

#### ETL Files

#### Histogram

#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)

[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Regressions in System.Collections.IterateForEach<String>

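| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ConcurrentStack(Size: 512) | 2.91 μs | 3.23 μs | 1.11 | 0.04 | False | | | |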

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.IterateForEach<String>*'

### Payloads

[Baseline]() [Compare]()

### System.Collections.IterateForEach<String>.ConcurrentStack(Size: 512)

#### ETL Files

#### Histogram

#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)

[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Regressions in System.Memory.Span<Char>

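| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BinarySearch(Size: 33) | 5.35 ns | 7.01 ns | 1.31 | 0.10 | False | | | |
| SequenceEqual(Size: 33) | 3.97 ns | 4.98 ns | 1.26 | 0.16 | False | | | |
| Fill(Size: 512) | 9.36 ns | 21.47 ns | 2.29 | 0.34 | False | | | |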

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Memory.Span<Char>*'

### Payloads

[Baseline]() [Compare]()

### System.Memory.Span<Char>.BinarySearch(Size: 33)

#### ETL Files

#### Histogram

#### JIT Disasms

### System.Memory.Span<Char>.SequenceEqual(Size: 33)

#### ETL Files

#### Histogram

#### JIT Disasms

### System.Memory.Span<Char>.Fill(Size: 512)

#### ETL Files

#### Histogram

#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)

[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
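
The repro commands pass a trailing-`*` pattern to `--filter`, so one pattern can select every benchmark in a class. A rough sketch of that selection, assuming glob-style matching and using Python's `fnmatchcase` as a stand-in (this is not the matcher the benchmark harness actually uses):

```python
from fnmatch import fnmatchcase

# Pattern taken from the repro command above.
PATTERN = "System.Memory.Span<Char>*"

# Benchmark names as they appear in this report.
NAMES = [
    "System.Memory.Span<Char>.BinarySearch(Size: 33)",
    "System.Memory.Span<Char>.SequenceEqual(Size: 33)",
    "System.Memory.Span<Char>.Fill(Size: 512)",
    "System.Tests.Perf_String.IndexerCheckPathLength",
]

# Keep only the names the glob selects (case-sensitive).
matched = [name for name in NAMES if fnmatchcase(name, PATTERN)]

for name in matched:
    print(name)
```

Under this assumption the pattern selects the three `Span<Char>` benchmarks and skips the `Perf_String` one. To rerun a single regression, the pattern can be narrowed to the full benchmark name.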

Regressions in System.Tests.Perf_String

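| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IndexerCheckPathLength | 116.39 ns | 159.35 ns | 1.37 | 0.02 | False | | | |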

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Tests.Perf_String*'

### Payloads

[Baseline]() [Compare]()

### System.Tests.Perf_String.IndexerCheckPathLength

#### ETL Files

#### Histogram

#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)

[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

jakobbotsch commented 8 months ago

System.Tests.Perf_HashCode.Add

Hot functions:

System.Collections.Perf_Frozen(Int16).TryGetValue_True(Count: 4)

Error occurred

System.Runtime.Intrinsics.Tests.Perf_Vector128Of(Byte).GetHashCodeBenchmark

Hot functions:

System.Collections.IndexerSet(String).SortedList(Size: 512)

Hot functions:

Diffs

### ``[System.Private.CoreLib]System.Collections.Generic.GenericArraySortHelper`1[System.__Canon].BinarySearch(!0[],int32,int32,!0,class System.Collections.Generic.IComparer`1)``

```diff
 ; optimized using Dynamic PGO
 ; rbp based frame
 ; fully interruptible
-; with Dynamic PGO: edge weights are valid, and fgCalledCount is 28264
+; with Dynamic PGO: edge weights are valid, and fgCalledCount is 29500
 ; 1 inlinees with PGO data; 0 single block inlinees; 0 inlinees without PGO data
 ; Final local variable assignments
 ;
-; V00 this [V00,T09] ( 5, 10.89) ref -> [rbp+0x10] this class-hnd EH-live single-def
-; V01 arg1 [V01,T08] ( 5, 17.87) ref -> rbx class-hnd single-def
+; V00 this [V00,T09] ( 5, 10.92) ref -> [rbp+0x10] this class-hnd EH-live single-def
+; V01 arg1 [V01,T08] ( 5, 17.85) ref -> rbx class-hnd single-def
 ; V02 arg2 [V02,T12] ( 4, 3 ) int -> rdi single-def
 ; V03 arg3 [V03,T13] ( 4, 3 ) int -> r14 single-def
 ; V04 arg4 [V04,T10] ( 3, 7.93) ref -> rsi class-hnd single-def
-; V05 arg5 [V05,T16] ( 3, 2.00) ref -> r15 class-hnd single-def
-; V06 loc0 [V06,T17] ( 4, 2 ) int -> [rbp-0x44] do-not-enreg[Z] EH-live
+; V05 arg5 [V05,T18] ( 3, 2.00) ref -> r15 class-hnd single-def
+; V06 loc0 [V06,T16] ( 4, 2 ) int -> [rbp-0x44] do-not-enreg[Z] EH-live
 ;* V07 loc1 [V07 ] ( 0, 0 ) ref -> zero-ref class-hnd <>
 ; V08 OutArgs [V08 ] ( 1, 1 ) struct (48) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
 ;* V09 tmp1 [V09 ] ( 0, 0 ) long -> zero-ref "spilling helperCall"
 ;* V10 tmp2 [V10 ] ( 0, 0 ) long -> zero-ref "spilling helperCall"
 ; V11 tmp3 [V11,T24] ( 2, 0 ) ref -> rdx class-hnd single-def "impSpillSpecialSideEff" <>
-; V12 tmp4 [V12,T18] ( 3, 2.00) int -> rcx "Inline return value spill temp"
-; V13 tmp5 [V13,T04] ( 8, 30.29) int -> rdi "Inline stloc first use temp"
-; V14 tmp6 [V14,T07] ( 5, 21.31) int -> r15 "Inline stloc first use temp"
-; V15 tmp7 [V15,T03] ( 6, 31.73) int -> r14 "Inline stloc first use temp"
+; V12 tmp4 [V12,T17] ( 3, 2 ) int -> registers "Inline return value spill temp"
+; V13 tmp5 [V13,T04] ( 8, 30.18) int -> rdi "Inline stloc first use temp"
+; V14 tmp6 [V14,T07] ( 5, 21.24) int -> r15 "Inline stloc first use temp"
+; V15 tmp7 [V15,T03] ( 6, 31.66) int -> r14 "Inline stloc first use temp"
 ;* V16 tmp8 [V16 ] ( 0, 0 ) long -> zero-ref "spilling helperCall"
-; V17 tmp9 [V17,T00] ( 4, 47.34) long -> r11 "VirtualCall with runtime lookup"
-; V18 tmp10 [V18,T06] ( 4, 22.80) int -> r12 "Inline stloc first use temp"
-; V19 tmp11 [V19,T20] ( 3, 0.09) int -> r12
+; V17 tmp9 [V17,T00] ( 4, 47.51) long -> r11 "VirtualCall with runtime lookup"
+; V18 tmp10 [V18,T06] ( 4, 22.69) int -> r12 "Inline stloc first use temp"
+; V19 tmp11 [V19,T20] ( 3, 0.02) int -> r12
 ; V20 tmp12 [V20,T25] ( 2, 0 ) long -> rax "argument with side effect"
 ; V21 PSPSym [V21,T19] ( 1, 1 ) long -> [rbp-0x50] do-not-enreg[V] "PSPSym"
-; V22 cse0 [V22,T05] ( 3, 23.76) ref -> r13 "CSE - aggressive"
-; V23 rat0 [V23,T14] ( 3, 4.42) long -> rdx "Spilling to split statement for tree"
-; V24 rat1 [V24,T15] ( 3, 4.01) long -> rax "runtime lookup"
-; V25 rat2 [V25,T11] ( 3, 5.62) long -> rax "fgMakeTemp is creating a new local variable"
-; V26 rat3 [V26,T02] ( 3, 34.72) long -> rdx "Spilling to split statement for tree"
-; V27 rat4 [V27,T01] ( 3, 44.18) long -> r11 "fgMakeTemp is creating a new local variable"
+; V22 cse0 [V22,T05] ( 3, 23.77) ref -> r13 "CSE - aggressive"
+; V23 rat0 [V23,T14] ( 3, 4.39) long -> rdx "Spilling to split statement for tree"
+; V24 rat1 [V24,T15] ( 3, 3.99) long -> rax "runtime lookup"
+; V25 rat2 [V25,T11] ( 3, 5.59) long -> rax "fgMakeTemp is creating a new local variable"
+; V26 rat3 [V26,T02] ( 3, 34.84) long -> rdx "Spilling to split statement for tree"
+; V27 rat4 [V27,T01] ( 3, 44.34) long -> r11 "fgMakeTemp is creating a new local variable"
 ; V28 rat5 [V28,T21] ( 3, 0 ) long -> rdx "Spilling to split statement for tree"
 ; V29 rat6 [V29,T22] ( 3, 0 ) long -> rax "runtime lookup"
 ; V30 rat7 [V30,T23] ( 3, 0 ) long -> rax "fgMakeTemp is creating a new local variable"
@@ -402,10 +402,10 @@ G_M47291_IG03:
        mov rax, qword ptr [rax+0x28]
        test rax, rax
        je SHORT G_M47291_IG05
-       ;; size=19 bbWeight=1.00 PerfScore 9.28
+       ;; size=19 bbWeight=1.00 PerfScore 9.23
 G_M47291_IG04:
        jmp SHORT G_M47291_IG06
-       ;; size=2 bbWeight=0.80 PerfScore 1.61
+       ;; size=2 bbWeight=0.80 PerfScore 1.60
 G_M47291_IG05:
        mov rcx, rdx
        mov rdx, 0xD1FFAB1E ; global ptr
@@ -415,13 +415,14 @@ G_M47291_IG06:
        mov rcx, rax
        call [System.Collections.Generic.Comparer`1[System.__Canon]:get_Default():System.Collections.Generic.Comparer`1[System.__Canon]]
        cmp rax, r15
+       mov rcx, gword ptr [rbp+0x10]
        jne G_M47291_IG25
-       ;; size=18 bbWeight=1.00 PerfScore 4.52
+       ;; size=22 bbWeight=1.00 PerfScore 5.49
 G_M47291_IG07:
        lea r15d, [rdi+r14-0x01]
        cmp edi, r15d
        jg SHORT G_M47291_IG11
-       ;; size=10 bbWeight=1.00 PerfScore 2.25
+       ;; size=10 bbWeight=1 PerfScore 2.25
 G_M47291_IG08:
        mov r14d, r15d
        sub r14d, edi
@@ -429,73 +430,74 @@ G_M47291_IG08:
        add r14d, edi
        cmp r14d, dword ptr [rbx+0x08]
        jae G_M47291_IG24
-       mov ecx, r14d
-       mov r13, gword ptr [rbx+8*rcx+0x10]
+       mov edx, r14d
+       mov r13, gword ptr [rbx+8*rdx+0x10]
        test r13, r13
        je SHORT G_M47291_IG19
-       ;; size=35 bbWeight=7.93 PerfScore 69.42
+       ;; size=35 bbWeight=7.93 PerfScore 69.36
 G_M47291_IG09:
-       mov rcx, gword ptr [rbp+0x10]
        mov rdx, qword ptr [rcx]
        mov rax, qword ptr [rdx+0x30]
        mov rax, qword ptr [rax]
        mov r11, qword ptr [rax+0x38]
        test r11, r11
        je SHORT G_M47291_IG12
-       ;; size=23 bbWeight=7.89 PerfScore 80.87
+       ;; size=19 bbWeight=7.92 PerfScore 73.25
 G_M47291_IG10:
        jmp SHORT G_M47291_IG13
-       ;; size=2 bbWeight=6.31 PerfScore 12.62
+       ;; size=2 bbWeight=6.33 PerfScore 12.67
 G_M47291_IG11:
-       mov ecx, edi
-       not ecx
+       mov edx, edi
+       not edx
+       mov ecx, edx
        jmp SHORT G_M47291_IG23
-       ;; size=6 bbWeight=0 PerfScore 0.00
+       ;; size=8 bbWeight=0 PerfScore 0.00
 G_M47291_IG12:
        mov rcx, rdx
        mov rdx, 0xD1FFAB1E ; global ptr
        call CORINFO_HELP_RUNTIMEHANDLE_CLASS
        mov r11, rax
-       ;; size=21 bbWeight=1.58 PerfScore 2.76
+       ;; size=21 bbWeight=1.58 PerfScore 2.77
 G_M47291_IG13:
        mov rcx, r13
        mov rdx, rsi
        call [r11]
        mov r12d, eax
-       ;; size=12 bbWeight=7.89 PerfScore 29.59
+       ;; size=12 bbWeight=7.92 PerfScore 29.69
 G_M47291_IG14:
        test r12d, r12d
        je SHORT G_M47291_IG22
-       ;; size=5 bbWeight=7.93 PerfScore 9.92
+       ;; size=5 bbWeight=7.88 PerfScore 9.85
 G_M47291_IG15:
        test r12d, r12d
        jge SHORT G_M47291_IG21
-       ;; size=5 bbWeight=6.93 PerfScore 8.67
+       ;; size=5 bbWeight=6.88 PerfScore 8.60
 G_M47291_IG16:
        lea edi, [r14+0x01]
-       ;; size=4 bbWeight=3.49 PerfScore 1.74
+       ;; size=4 bbWeight=3.45 PerfScore 1.72
 G_M47291_IG17:
        cmp edi, r15d
        jg SHORT G_M47291_IG11
-       ;; size=5 bbWeight=7.93 PerfScore 9.92
+       ;; size=5 bbWeight=7.88 PerfScore 9.85
 G_M47291_IG18:
+       mov rcx, gword ptr [rbp+0x10]
        jmp SHORT G_M47291_IG08
-       ;; size=2 bbWeight=0.50 PerfScore 1.00
+       ;; size=6 bbWeight=0.50 PerfScore 1.50
 G_M47291_IG19:
        test rsi, rsi
        je SHORT G_M47291_IG26
        mov r12d, -1
-       ;; size=11 bbWeight=0.04 PerfScore 0.07
+       ;; size=11 bbWeight=0.01 PerfScore 0.01
 G_M47291_IG20:
        jmp SHORT G_M47291_IG14
-       ;; size=2 bbWeight=0.04 PerfScore 0.09
+       ;; size=2 bbWeight=0.01 PerfScore 0.02
 G_M47291_IG21:
        lea r15d, [r14-0x01]
        jmp SHORT G_M47291_IG17
-       ;; size=6 bbWeight=3.45 PerfScore 8.62
+       ;; size=6 bbWeight=3.43 PerfScore 8.59
 G_M47291_IG22:
        mov ecx, r14d
-       ;; size=3 bbWeight=1.00 PerfScore 0.25
+       ;; size=3 bbWeight=1 PerfScore 0.25
 G_M47291_IG23:
        mov dword ptr [rbp-0x44], ecx
        jmp SHORT G_M47291_IG29
@@ -504,7 +506,6 @@ G_M47291_IG24:
        call CORINFO_HELP_RNGCHKFAIL
        ;; size=5 bbWeight=0 PerfScore 0.00
 G_M47291_IG25:
-       mov rcx, gword ptr [rbp+0x10]
        mov rdx, qword ptr [rcx]
        mov rax, qword ptr [rdx+0x30]
        mov rax, qword ptr [rax]
@@ -512,7 +513,7 @@ G_M47291_IG25:
        test rax, rax
        je SHORT G_M47291_IG27
        jmp SHORT G_M47291_IG28
-       ;; size=25 bbWeight=0 PerfScore 0.00
+       ;; size=21 bbWeight=0 PerfScore 0.00
 G_M47291_IG26:
        xor r12d, r12d
        jmp SHORT G_M47291_IG20
@@ -581,7 +582,7 @@ G_M47291_IG33:
        ret
        ;; size=17 bbWeight=0 PerfScore 0.00

-; Total bytes of code 448, prolog size 53, PerfScore 323.28, instruction count 143, allocated bytes for code 448 (MethodHash=ce594744) for method System.Collections.Generic.GenericArraySortHelper`1[System.__Canon]:BinarySearch(System.__Canon[],int,int,System.__Canon,System.Collections.Generic.IComparer`1[System.__Canon]):int:this (Tier1)
+; Total bytes of code 450, prolog size 53, PerfScore 316.98, instruction count 144, allocated bytes for code 450 (MethodHash=ce594744) for method System.Collections.Generic.GenericArraySortHelper`1[System.__Canon]:BinarySearch(System.__Canon[],int,int,System.__Canon,System.Collections.Generic.IComparer`1[System.__Canon]):int:this (Tier1)

 ; ============================================================
 ; Assembly listing for method System.Collections.Generic.GenericArraySortHelper`1[System.__Canon]:BinarySearch(System.__Canon[],int,int,System.__Canon):int (Tier1)
@@ -591,24 +592,24 @@ G_M47291_IG33:
 ; optimized using Dynamic PGO
 ; rsp based frame
 ; fully interruptible
-; with Dynamic PGO: edge weights are valid, and fgCalledCount is 31552
+; with Dynamic PGO: edge weights are valid, and fgCalledCount is 33688
 ; Final local variable assignments
 ;
-; V00 TypeCtx [V00,T08] ( 5, 12.50) long -> rsi single-def
+; V00 TypeCtx [V00,T08] ( 5, 12.48) long -> rsi single-def
 ; V01 arg0 [V01,T07] ( 4, 17.84) ref -> rbx class-hnd single-def
 ; V02 arg1 [V02,T10] ( 3, 3 ) int -> r8 single-def
 ; V03 arg2 [V03,T11] ( 3, 3 ) int -> r9 single-def
 ; V04 arg3 [V04,T09] ( 2, 7.92) ref -> rdi class-hnd single-def
-; V05 loc0 [V05,T03] ( 8, 30.21) int -> rbp
-; V06 loc1 [V06,T06] ( 5, 21.32) int -> r14
-; V07 loc2 [V07,T02] ( 6, 31.69) int -> r15
-; V08 loc3 [V08,T05] ( 4, 22.77) int -> r12
+; V05 loc0 [V05,T03] ( 8, 30.22) int -> rbp
+; V06 loc1 [V06,T06] ( 5, 21.29) int -> r14
+; V07 loc2 [V07,T02] ( 6, 31.67) int -> r15
+; V08 loc3 [V08,T05] ( 4, 22.75) int -> r12
 ; V09 OutArgs [V09 ] ( 1, 1 ) struct (32) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
 ;* V10 tmp1 [V10 ] ( 0, 0 ) long -> zero-ref "spilling helperCall"
-; V11 tmp2 [V11,T00] ( 4, 47.49) long -> r11 "VirtualCall with runtime lookup"
-; V12 tmp3 [V12,T12] ( 3, 0.01) int -> r12
-; V13 cse0 [V13,T04] ( 3, 23.76) ref -> r13 "CSE - aggressive"
-; V14 rat0 [V14,T01] ( 3, 44.33) long -> r11 "fgMakeTemp is creating a new local variable"
+; V11 tmp2 [V11,T00] ( 4, 47.42) long -> r11 "VirtualCall with runtime lookup"
+; V12 tmp3 [V12,T12] ( 3, 0.03) int -> r12
+; V13 cse0 [V13,T04] ( 3, 23.74) ref -> r13 "CSE - aggressive"
+; V14 rat0 [V14,T01] ( 3, 44.26) long -> r11 "fgMakeTemp is creating a new local variable"
 ;
 ; Lcl frame size = 40

@@ -631,7 +632,7 @@ G_M6585_IG02:
        mov ebp, r8d
        lea r14d, [rbp+r9-0x01]
        cmp ebp, r14d
-       jg G_M6585_IG20
+       jg G_M6585_IG19
        ;; size=17 bbWeight=1 PerfScore 3.50
 G_M6585_IG03:
        mov r15d, r14d
@@ -643,18 +644,18 @@ G_M6585_IG03:
        mov ecx, r15d
        mov r13, gword ptr [rbx+8*rcx+0x10]
        test r13, r13
-       je SHORT G_M6585_IG11
-       ;; size=35 bbWeight=7.92 PerfScore 69.32
+       je SHORT G_M6585_IG13
+       ;; size=35 bbWeight=7.92 PerfScore 69.28
 G_M6585_IG04:
        mov rcx, qword ptr [rsi+0x30]
        mov rcx, qword ptr [rcx]
        mov r11, qword ptr [rcx+0x38]
        test r11, r11
        je SHORT G_M6585_IG06
-       ;; size=16 bbWeight=7.92 PerfScore 57.39
+       ;; size=16 bbWeight=7.90 PerfScore 57.30
 G_M6585_IG05:
        jmp SHORT G_M6585_IG07
-       ;; size=2 bbWeight=6.33 PerfScore 12.67
+       ;; size=2 bbWeight=6.32 PerfScore 12.64
 G_M6585_IG06:
        mov rcx, rsi
        mov rdx, 0xD1FFAB1E ; global ptr
@@ -666,37 +667,37 @@ G_M6585_IG07:
        mov rdx, rdi
        call [r11]
        mov r12d, eax
-       ;; size=12 bbWeight=7.92 PerfScore 29.68
+       ;; size=12 bbWeight=7.90 PerfScore 29.64
 G_M6585_IG08:
        test r12d, r12d
        je SHORT G_M6585_IG16
        ;; size=5 bbWeight=7.92 PerfScore 9.90
 G_M6585_IG09:
        test r12d, r12d
-       jge SHORT G_M6585_IG13
+       jge SHORT G_M6585_IG15
        ;; size=5 bbWeight=6.92 PerfScore 8.65
 G_M6585_IG10:
        lea ebp, [r15+0x01]
-       jmp SHORT G_M6585_IG14
-       ;; size=6 bbWeight=3.44 PerfScore 8.61
+       ;; size=4 bbWeight=3.46 PerfScore 1.73
 G_M6585_IG11:
-       test rdi, rdi
-       je SHORT G_M6585_IG19
-       mov r12d, -1
-       ;; size=11 bbWeight=0.01 PerfScore 0.01
-G_M6585_IG12:
-       jmp SHORT G_M6585_IG08
-       ;; size=2 bbWeight=0.01 PerfScore 0.01
-G_M6585_IG13:
-       lea r14d, [r15-0x01]
-       ;; size=4 bbWeight=3.48 PerfScore 1.74
-G_M6585_IG14:
        cmp ebp, r14d
-       jg SHORT G_M6585_IG20
+       jg SHORT G_M6585_IG19
        ;; size=5 bbWeight=7.92 PerfScore 9.90
-G_M6585_IG15:
+G_M6585_IG12:
        jmp SHORT G_M6585_IG03
        ;; size=2 bbWeight=0.50 PerfScore 1.00
+G_M6585_IG13:
+       test rdi, rdi
+       je SHORT G_M6585_IG21
+       mov r12d, -1
+       ;; size=11 bbWeight=0.01 PerfScore 0.02
+G_M6585_IG14:
+       jmp SHORT G_M6585_IG08
+       ;; size=2 bbWeight=0.01 PerfScore 0.03
+G_M6585_IG15:
+       lea r14d, [r15-0x01]
+       jmp SHORT G_M6585_IG11
+       ;; size=6 bbWeight=3.45 PerfScore 8.63
 G_M6585_IG16:
        mov eax, r15d
        ;; size=3 bbWeight=1 PerfScore 0.25
@@ -716,14 +717,10 @@ G_M6585_IG18:
        call CORINFO_HELP_RNGCHKFAIL
        ;; size=5 bbWeight=0 PerfScore 0.00
 G_M6585_IG19:
-       xor r12d, r12d
-       jmp SHORT G_M6585_IG12
-       ;; size=5 bbWeight=0 PerfScore 0.00
-G_M6585_IG20:
        mov eax, ebp
        not eax
        ;; size=4 bbWeight=0 PerfScore 0.00
-G_M6585_IG21:
+G_M6585_IG20:
        add rsp, 40
        pop rbx
        pop rbp
@@ -735,7 +732,11 @@ G_M6585_IG21:
        pop r15
        ret
        ;; size=17 bbWeight=0 PerfScore 0.00
+G_M6585_IG21:
+       xor r12d, r12d
+       jmp SHORT G_M6585_IG14
+       ;; size=5 bbWeight=0 PerfScore 0.00

-; Total bytes of code 229, prolog size 35, PerfScore 254.31, instruction count 81, allocated bytes for code 229 (MethodHash=be88e646) for method System.Collections.Generic.GenericArraySortHelper`1[System.__Canon]:BinarySearch(System.__Canon[],int,int,System.__Canon):int (Tier1)
+; Total bytes of code 229, prolog size 35, PerfScore 254.13, instruction count 81, allocated bytes for code 229 (MethodHash=be88e646) for method System.Collections.Generic.GenericArraySortHelper`1[System.__Canon]:BinarySearch(System.__Canon[],int,int,System.__Canon):int (Tier1)

 ; ============================================================
```
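
Each method in the diff ends with a pair of `; Total bytes of code ...` lines, which is the quickest place to see what changed overall. A minimal sketch for pulling out those deltas, assuming the summary lines keep the layout shown above (the helper name `summary_deltas` is hypothetical, and the sample lines are shortened from the diff):

```python
import re

# Matches a removed (-) or added (+) per-method summary line from the JIT diff.
SUMMARY = re.compile(
    r"^(?P<sign>[+-]); Total bytes of code (?P<size>\d+), prolog size \d+, "
    r"PerfScore (?P<score>[\d.]+), instruction count (?P<insns>\d+)"
)


def summary_deltas(diff_text):
    """Pair each '-' summary line with the next '+' one.

    Returns a list of (code size, PerfScore, instruction count) deltas.
    """
    deltas, before = [], None
    for line in diff_text.splitlines():
        m = SUMMARY.match(line)
        if not m:
            continue
        stats = (int(m["size"]), float(m["score"]), int(m["insns"]))
        if m["sign"] == "-":
            before = stats
        elif before is not None:
            deltas.append(tuple(round(a - b, 2) for a, b in zip(stats, before)))
            before = None
    return deltas


SAMPLE = """\
-; Total bytes of code 448, prolog size 53, PerfScore 323.28, instruction count 143, allocated bytes for code 448
+; Total bytes of code 450, prolog size 53, PerfScore 316.98, instruction count 144, allocated bytes for code 450
"""

print(summary_deltas(SAMPLE))  # [(2, -6.3, 1)]
```

For the first method above this gives 2 more bytes, one more instruction, and a PerfScore that is 6.3 lower in the compare build.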

System.Runtime.Intrinsics.Tests.Perf_Vector128Of(SByte).GetHashCodeBenchmark

Hot functions:

System.Runtime.Intrinsics.Tests.Perf_Vector128Int.GetHashCodeBenchmark

Hot functions:

Diffs

### ``[System.Private.CoreLib]HashCode.Add(int32)``

```diff
 ; optimized using Dynamic PGO
 ; rsp based frame
 ; partially interruptible
-; with Dynamic PGO: edge weights are valid, and fgCalledCount is 379920
+; with Dynamic PGO: edge weights are valid, and fgCalledCount is 183096
 ; 0 inlinees with PGO data; 5 single block inlinees; 0 inlinees without PGO data
 ; Final local variable assignments
 ;
-; V00 this [V00,T00] ( 22, 8.53) byref -> rcx this single-def
+; V00 this [V00,T00] ( 22, 8.48) byref -> rcx this single-def
 ; V01 arg1 [V01,T01] ( 6, 3 ) int -> rdx single-def
 ;* V02 loc0 [V02 ] ( 0, 0 ) int -> zero-ref
 ;* V03 loc1 [V03 ] ( 0, 0 ) int -> zero-ref
-; V04 loc2 [V04,T03] ( 4, 3.25) int -> r8
-; V05 loc3 [V05,T02] ( 4, 3.25) int -> rax
+; V04 loc2 [V04,T02] ( 4, 3.26) int -> r8
+; V05 loc3 [V05,T03] ( 4, 3.25) int -> rax
 ;# V06 OutArgs [V06 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
 ;* V07 tmp1 [V07 ] ( 0, 0 ) byref -> zero-ref single-def "Inlining Arg"
 ; V08 tmp2 [V08,T04] ( 2, 1.00) byref -> rax single-def "Inlining Arg"
@@ -271,12 +271,12 @@ G_M12347_IG02:
        ;; size=24 bbWeight=1 PerfScore 5.00
 G_M12347_IG03:
        cmp r8d, 1
-       je G_M12347_IG11
-       ;; size=10 bbWeight=0.75 PerfScore 0.94
+       je SHORT G_M12347_IG07
+       ;; size=6 bbWeight=0.75 PerfScore 0.94
 G_M12347_IG04:
        cmp r8d, 2
-       jne SHORT G_M12347_IG07
-       ;; size=6 bbWeight=0.50 PerfScore 0.62
+       jne SHORT G_M12347_IG09
+       ;; size=6 bbWeight=0.50 PerfScore 0.63
 G_M12347_IG05:
        mov dword ptr [rcx+0x18], edx
        ;; size=3 bbWeight=0.25 PerfScore 0.25
@@ -284,10 +284,16 @@ G_M12347_IG06:
        ret
        ;; size=1 bbWeight=0.25 PerfScore 0.25
 G_M12347_IG07:
-       cmp eax, 3
-       jne SHORT G_M12347_IG09
-       ;; size=5 bbWeight=0.25 PerfScore 0.32
+       mov dword ptr [rcx+0x14], edx
+       ;; size=3 bbWeight=0.25 PerfScore 0.25
 G_M12347_IG08:
+       ret
+       ;; size=1 bbWeight=0.25 PerfScore 0.25
+G_M12347_IG09:
+       cmp eax, 3
+       jne SHORT G_M12347_IG11
+       ;; size=5 bbWeight=0.25 PerfScore 0.31
+G_M12347_IG10:
        lea rax, bword ptr [rcx+0x04]
        lea r8, bword ptr [rcx+0x08]
        lea r10, bword ptr [rcx+0x0C]
@@ -296,7 +302,7 @@ G_M12347_IG08:
        mov dword ptr [r8], 0xD1FFAB1E
        mov dword ptr [r10], 0xD1FFAB1E
        ;; size=38 bbWeight=0.25 PerfScore 1.38
-G_M12347_IG09:
+G_M12347_IG11:
        mov eax, dword ptr [rcx]
        imul r8d, dword ptr [rcx+0x10], 0xD1FFAB1E
        add eax, r8d
@@ -320,13 +326,7 @@ G_M12347_IG09:
        rol eax, 13
        imul eax, eax, 0xD1FFAB1E
        mov dword ptr [rcx+0x0C], eax
-       ;; size=97 bbWeight=0.25 PerfScore 10.32
-G_M12347_IG10:
-       ret
-       ;; size=1 bbWeight=0.25 PerfScore 0.25
-G_M12347_IG11:
-       mov dword ptr [rcx+0x14], edx
-       ;; size=3 bbWeight=0.25 PerfScore 0.25
+       ;; size=97 bbWeight=0.25 PerfScore 10.10
 G_M12347_IG12:
        ret
        ;; size=1 bbWeight=0.25 PerfScore 0.25
@@ -337,6 +337,6 @@ G_M12347_IG14:
        ret
        ;; size=1 bbWeight=0.25 PerfScore 0.25

-; Total bytes of code 193, prolog size 0, PerfScore 39.62, instruction count 49, allocated bytes for code 193 (MethodHash=96edcfc4) for method System.HashCode:Add(int):this (Tier1)
+; Total bytes of code 189, prolog size 0, PerfScore 39.01, instruction count 49, allocated bytes for code 189 (MethodHash=96edcfc4) for method System.HashCode:Add(int):this (Tier1)

 ; ============================================================
```

System.Collections.IterateForEach(String).ConcurrentStack(Size: 512)

Hot functions:

System.Memory.Span(Char).BinarySearch(Size: 33)

Hot functions:

System.Memory.Span(Char).SequenceEqual(Size: 33)

Hot functions:

System.Memory.Span(Char).Fill(Size: 512)

Hot functions:

System.Tests.Perf_String.IndexerCheckPathLength

Hot functions: