dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

JIT: maybe reconsider multiple def CSEs #97243

Open AndyAyersMS opened 7 months ago

AndyAyersMS commented 7 months ago

Have been looking at CSE behavior and noticed some examples where multiple-def CSEs are a bit strange.

Here's one, where the two defs are in the gray blocks and the use is in the yellow. A closer look reveals that only one of the two defs can reach the use.

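[Image: flowgraph for this example; the two CSE defs are in the gray blocks and the use is in the yellow block.]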

This really should just be a single-def CSE.

We get to this point because CSE candidate formation (for the most part) is just keyed on the liberal VN. If two trees in the method have the same liberal VN then they belong to the same CSE candidate set. We then do a forward dataflow using these CSE trees. The only way a CSE is killed is at a join.

We then revisit the candidate member trees, and if the CSE is available at that point then the tree is marked as a use; if not, it is marked as a def. So in the example above the candidate set has 3 trees, but the BB15 tree is available at BB31 so the latter is marked as a use. The BB02 tree does not reach any uses and is also considered a def.
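To make the current scheme concrete, here is a minimal Python sketch of it at block granularity. It is a model, not the JIT's actual code. The function name `classify_cse_trees`, the one-tree-per-block simplification, the optimistic start for the dataflow, and the CFG in the usage lines are all assumptions. In particular, BB03 and BB04 are invented here to recreate a path on which the BB02 value is killed at a join.

```python
from collections import defaultdict


def classify_cse_trees(rpo, preds, trees):
    """Mark each candidate tree as a CSE def or use (simplified model).

    rpo:   block names in reverse post-order; rpo[0] is the entry block.
    preds: dict mapping a block to its predecessor blocks.
    trees: (block, liberal_vn) pairs, at most one per block per VN.
    Returns a dict mapping (block, vn) to "def" or "use".
    """
    entry = rpo[0]

    # Candidate formation: trees are grouped purely by liberal VN.
    by_vn = defaultdict(set)
    for block, vn in trees:
        by_vn[vn].add(block)

    kinds = {}
    for vn, gen_blocks in by_vn.items():
        if len(gen_blocks) < 2:
            continue  # a lone tree is not a CSE candidate

        # Forward "must" availability, starting optimistic; the only kill
        # happens at a join, when some predecessor lacks the value.
        avail_out = {b: True for b in rpo}
        avail_in = {}
        changed = True
        while changed:
            changed = False
            for b in rpo:
                inp = b != entry and all(avail_out[p] for p in preds.get(b, ()))
                out = inp or b in gen_blocks
                avail_in[b] = inp
                if out != avail_out[b]:
                    avail_out[b] = out
                    changed = True

        # Available on entry to the tree's block -> use, otherwise -> def.
        for b in gen_blocks:
            kinds[(b, vn)] = "use" if avail_in[b] else "def"
    return kinds


# Hypothetical CFG: BB02 -> BB04 <- BB03, then BB04 -> BB15 -> BB31.
rpo = ["BB01", "BB02", "BB03", "BB04", "BB15", "BB31"]
preds = {"BB02": ["BB01"], "BB03": ["BB01"], "BB04": ["BB02", "BB03"],
         "BB15": ["BB04"], "BB31": ["BB15"]}
kinds = classify_cse_trees(rpo, preds, [("BB02", 7), ("BB15", 7), ("BB31", 7)])
```

In this toy CFG, BB02 and BB15 both come out as defs and BB31 as a use. That matches the three-tree candidate described above: nothing in the model notices that the BB02 def never reaches a use.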

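There are at least two issues here -- one potential and one we see in the above case:

* if the CSE defs don't agree on exception sets, the entire candidate may become non-viable if some of the uses need stronger exception guarantees than the intersection of all the defs provides.
* each def turns into a (cse-temp def, use) pair, so none of the defs are dead; CSE runs late enough that nothing gets rid of this temp, and there is now a common temp tying together disjoint lifetimes, which may confuse LSRA and produce worse allocation.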

In the case above the CSE is a class handle, and the last problem potentially leads to a lower perf score than if no CSE was done, as there is an extra callee save used (to be fair it seems plausible this would happen even without the common temp, as the path from 15->31 is live across a call).

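It's not clear how often we see the case where two CSE defs reach some CSE use, and the way the algorithm is structured, it's not obvious how to figure this out. Here's one half-baked idea, perhaps worth measuring sometime:

* We do the initial CSE location like we do now, looking for multiple distinct trees with the same liberal(-ish) VN. We then give each of those trees its own candidate number and propagate availability. Note that, as currently constructed, the algorithm is limited to 64 candidates, so this limit might need to be raised.
* Then, rescanning each candidate, we do a union-find where the "leaders" are defs (trees that have no available expressions) and the set members are uses (trees with one or more of the leaders as available expressions). If there are multiple leaders, then at least one use has to be reachable from more than one def; discovering such a use is what leads to union-ing.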

In the above case this would lead to CSE#02 being its own use-less set, so it would drop out of CSE.
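The half-baked idea above can be sketched in Python. This is one possible reading, not an implementation taken from the issue. Rather than one availability bit per tree, the sketch keeps a single must-available bit for the whole set, so a join fed by different defs still counts as available. That is how a use can end up reached by several leaders. Alongside that bit, it tracks a may-reach set of leaders, merged by union over predecessors. The function name `refine_candidate`, this two-part dataflow, and the block-level CFG are all assumptions.

```python
from collections import defaultdict


def refine_candidate(rpo, preds, tree_blocks):
    """Split one liberal-VN candidate into def/use groups with union-find.

    rpo:         block names in reverse post-order; rpo[0] is the entry.
    preds:       dict mapping a block to its predecessor blocks.
    tree_blocks: blocks holding a tree with this VN (one tree per block).
    Returns (defs, uses) groups; groups with no uses are dropped.
    """
    entry, gen = rpo[0], set(tree_blocks)

    # 1. Must-availability of the value, killed only at joins.
    avail_in = {b: False for b in rpo}
    avail_out = {b: True for b in rpo}
    changed = True
    while changed:
        changed = False
        for b in rpo:
            inp = b != entry and all(avail_out[p] for p in preds.get(b, ()))
            out = inp or b in gen
            avail_in[b] = inp
            if out != avail_out[b]:
                avail_out[b], changed = out, True

    # Leaders are defs (nothing available on entry); the rest are uses.
    leaders = [b for b in tree_blocks if not avail_in[b]]
    leader_set = set(leaders)
    uses = [b for b in tree_blocks if avail_in[b]]

    # 2. Which leaders may reach each block (union over predecessors).
    #    A leader restarts the set; any other block passes it through.
    reach_in = {b: frozenset() for b in rpo}
    reach_out = {b: frozenset() for b in rpo}
    changed = True
    while changed:
        changed = False
        for b in rpo:
            inp = frozenset().union(*(reach_out[p] for p in preds.get(b, ())))
            out = frozenset([b]) if b in leader_set else inp
            reach_in[b] = inp
            if out != reach_out[b]:
                reach_out[b], changed = out, True

    # 3. Union-find: a use reached by several leaders ties them together.
    parent = {d: d for d in leaders}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u in uses:
        reaching = sorted(reach_in[u])
        for d in reaching[1:]:
            parent[find(d)] = find(reaching[0])

    groups = defaultdict(lambda: ([], []))
    for d in leaders:
        groups[find(d)][0].append(d)
    for u in uses:
        if reach_in[u]:
            groups[find(min(reach_in[u]))][1].append(u)
    return [(defs, us) for defs, us in groups.values() if us]


# Same hypothetical CFG as the first example.
rpo = ["BB01", "BB02", "BB03", "BB04", "BB15", "BB31"]
preds = {"BB02": ["BB01"], "BB03": ["BB01"], "BB04": ["BB02", "BB03"],
         "BB15": ["BB04"], "BB31": ["BB15"]}
groups = refine_candidate(rpo, preds, ["BB02", "BB15", "BB31"])
# groups == [(["BB15"], ["BB31"])]: BB02 forms a use-less set and drops out.
```

On a diamond where each arm holds a def and the join holds a use, both defs land in one group with that use. In that case the multi-def CSE is kept only where a use actually needs it.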

category:cq theme:cse skill-level:expert cost:medium impact:medium

ghost commented 7 months ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch. See info in area-owners.md if you want to be subscribed.

Issue Details
Have been looking at CSE behavior and noticed some examples where multiple-def CSEs are a bit strange. Here's one, where the two defs are in the gray blocks and the use is in the yellow. A closer look reveals that only one of the two defs can reach the use. ![image (39)](https://github.com/dotnet/runtime/assets/10121823/0aeb1398-5f78-493f-a8e0-e2ac0117c787) This really should just be a single-def CSE. We get to this point because CSE candidate formation (for the most part) is just keyed on the liberal VN. If two trees in the method have the same liberal VN then they belong to the same CSE candidate set. We then do a forward dataflow using these CSE trees. The only way a CSE is killed is at a join. We then revisit the candidate member trees, and if the CSE is available at that point then the tree is marked as a use; if not, it is marked as a def. So in the example above the candidate set has 3 trees, but the BB15 tree is available at BB31 so the latter is marked as a use. The BB02 tree does not reach any uses and is also considered a def. There are at least two issues here -- one potential and one we see in the above case: * if the CSE defs don't agree on exception sets, the entire candidate may become non-viable, if some of the uses need stronger exception guarantees than intersection of all defs provides. * each def turns into a (cse-temp def, use) pair, so none of the defs are dead, and CSE runs late enough that nothing gets rid of this temp, and there is now a common temp tying together disjoint lifetimes, which may confuse LSRA and produce worse allocation In the case above the CSE is a class handle, and the last problem potentially leads to a lower perf score than if no CSE was done, as there is an extra callee save used (to be fair it seems plausible this would happen even without the common temp, as the path from 15->31 is live across a call). 
It's not clear how often we see the case where two CSE defs reach some CSE use, and the way the algorithm is structured it's not obvious how to figure this out. Here's one half-baked idea, perhaps worth measuring sometime. * We do the initial CSE location like we do now, looking for multiple distinct trees with the same liberal (ish) VN. We then give each of those trees its own candidate number, and propagate availability. Note as currently constructed the algorithm is limited to 64 candidates so this limit might need to be raised. * Then, rescanning each candidate, we do a union find where the "leaders" are defs, trees that have no available expressions, and the set members are uses, trees with one or more of the leaders as available expressions (if there are multiple leaders than at least one use has to be reachable from more than one def; discovering one such is what leads to union-ing). In the above case this would lead to CSE#02 being its own use-less set, so it would drop out of CSE.
Author: AndyAyersMS
Assignees: -
Labels: `area-CodeGen-coreclr`, `needs-area-label`
Milestone: -
AndyAyersMS commented 7 months ago

Another example, this one has two two-def CSEs.

The jit currently does both, but just doing CSE 01 is better (per perf score).

Flowgraph

[Image: flowgraph]

MCMC Tree for CSEs

Second number in each node is best perf score. Double box is what the jit does now.

[Image: MCMC tree for CSEs]

AndyAyersMS commented 7 months ago

From a recent ASP.NET collection

This leads to a bunch of ideas / suggestions:

AndyAyersMS commented 7 months ago

I have implemented the useless def pruning and it has some nice (but small) wins. Similar wins could be had by doing a late round of forward sub to reconnect the trees at these useless def sites (possibly more cheaply).

Sample diff

[Image: sample diff]

Diffs are based on 113,707 contexts (48,175 MinOpts, 65,532 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 1 (0.00%)

Diff JIT options: JitRefineCSECandidates=1

Overall (-2,698 bytes)
|Collection|Base size (bytes)|Diff size (bytes)|
|---|--:|--:|
|aspnet.run.windows.x64.checked.mch|43,323,348|-2,698|
FullOpts (-2,698 bytes)
|Collection|Base size (bytes)|Diff size (bytes)|
|---|--:|--:|
|aspnet.run.windows.x64.checked.mch|28,904,732|-2,698|
Example diffs
aspnet.run.windows.x64.checked.mch
-13 (-6.60%) : 93057.dasm - System.IPv6AddressHelper:ShouldHaveIpv4Embedded(System.ReadOnlySpan`1[ushort]):ubyte (FullOpts)
```diff
@@ -12,8 +12,8 @@
 ; V02 tmp1 [V02,T01] ( 11, 6.50) byref -> rdx single-def "field V00._reference (fldOffset=0x0)" P-INDEP
 ; V03 tmp2 [V03,T02] ( 8, 5 ) int -> rcx "field V00._length (fldOffset=0x8)" P-INDEP
 ;* V04 tmp3 [V04 ] ( 0, 0 ) struct (16) zero-ref "Promoted implicit byref"
-; V05 cse0 [V05,T04] ( 5, 2.50) int -> r8 multi-def "CSE - aggressive"
-; V06 cse1 [V06,T03] ( 7, 3.50) int -> r10 multi-def "CSE - aggressive"
+; V05 cse0 [V05,T04] ( 3, 1.50) int -> r8 multi-def "CSE - aggressive"
+; V06 cse1 [V06,T03] ( 5, 2.50) int -> r10 multi-def "CSE - aggressive"
 ;
 ; Lcl frame size = 40
@@ -41,7 +41,7 @@ G_M18065_IG03: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0004 {rdx},
        cmp word ptr [rdx+0x04], 0
        jne SHORT G_M18065_IG07
        cmp ecx, 3
-       jbe G_M18065_IG11
+       jbe SHORT G_M18065_IG11
        cmp word ptr [rdx+0x06], 0
        jne SHORT G_M18065_IG07
        cmp ecx, 6
@@ -56,7 +56,7 @@ G_M18065_IG03: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0004 {rdx},
        je SHORT G_M18065_IG04
        cmp r10d, 0xFFFF
        jne SHORT G_M18065_IG06
-       ;; size=89 bbWeight=0.50 PerfScore 14.38
+       ;; size=85 bbWeight=0.50 PerfScore 14.38
 G_M18065_IG04: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        ; byrRegs -[rdx]
        mov eax, 1
@@ -76,16 +76,14 @@ G_M18065_IG06: ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=000
 G_M18065_IG07: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0004 {rdx}, byref, isz
        cmp ecx, 4
        jbe SHORT G_M18065_IG11
-       movzx r8, word ptr [rdx+0x08]
-       test r8d, r8d
+       cmp word ptr [rdx+0x08], 0
        jne SHORT G_M18065_IG09
        cmp ecx, 5
        jbe SHORT G_M18065_IG11
-       movzx r10, word ptr [rdx+0x0A]
        xor eax, eax
-       cmp r10d, 0x5EFE
+       cmp word ptr [rdx+0x0A], 0x5EFE
        sete al
-       ;; size=37 bbWeight=0.50 PerfScore 4.62
+       ;; size=28 bbWeight=0.50 PerfScore 5.38
 G_M18065_IG08: ; bbWeight=0.50, epilog, nogc, extend
        add rsp, 40
        ret
@@ -104,7 +102,7 @@ G_M18065_IG11: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
        int3
        ;; size=6 bbWeight=0 PerfScore 0.00
-; Total bytes of code 197, prolog size 4, PerfScore 32.88, instruction count 57, allocated bytes for code 197 (MethodHash=3928b96e) for method System.IPv6AddressHelper:ShouldHaveIpv4Embedded(System.ReadOnlySpan`1[ushort]):ubyte (FullOpts)
+; Total bytes of code 184, prolog size 4, PerfScore 33.62, instruction count 55, allocated bytes for code 184 (MethodHash=3928b96e) for method System.IPv6AddressHelper:ShouldHaveIpv4Embedded(System.ReadOnlySpan`1[ushort]):ubyte (FullOpts)
 ; ============================================================
 Unwind Info:
```
-11 (-6.01%) : 87257.dasm - System.SpanHelpers:LastIndexOfValueType[short,System.SpanHelpers+DontNegate`1[short]](byref,short,int):int (Tier1)
```diff
@@ -41,7 +41,7 @@
 ;* V29 tmp25 [V29 ] ( 0, 0 ) short -> zero-ref "Inlining Arg"
 ;* V30 tmp26 [V30 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg"
 ; V31 tmp27 [V31,T05] ( 5, 0.00) int -> rax "Single return block return value"
-; V32 cse0 [V32,T04] ( 13, 4.73) int -> r9 hoist multi-def "CSE - aggressive"
+; V32 cse0 [V32,T04] ( 11, 4.73) int -> r9 hoist multi-def "CSE - aggressive"
 ;
 ; Lcl frame size = 0
@@ -63,10 +63,10 @@ G_M21910_IG04: ; bbWeight=0.48, gcrefRegs=0000 {}, byrefRegs=0002 {rcx},
        movsx r10, word ptr [rcx+2*rax]
        movsx r9, dx
        cmp r10d, r9d
-       je G_M21910_IG18
+       je SHORT G_M21910_IG18
        movsx r10, word ptr [rcx+2*rax-0x02]
        cmp r10d, r9d
-       je G_M21910_IG21
+       je SHORT G_M21910_IG21
        movsx r10, word ptr [rcx+2*rax-0x04]
        cmp r10d, r9d
        je SHORT G_M21910_IG20
@@ -74,7 +74,7 @@ G_M21910_IG04: ; bbWeight=0.48, gcrefRegs=0000 {}, byrefRegs=0002 {rcx},
        cmp r10d, r9d
        je SHORT G_M21910_IG19
        add rax, -4
-       ;; size=63 bbWeight=0.48 PerfScore 10.51
+       ;; size=55 bbWeight=0.48 PerfScore 10.51
 G_M21910_IG05: ; bbWeight=0.49, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, byref, isz
        test r8d, r8d
        jle SHORT G_M21910_IG08
@@ -124,9 +124,8 @@ G_M21910_IG15: ; bbWeight=0.01, epilog, nogc, extend
        ; gcr arg pop 0
        ;; size=6 bbWeight=0.01 PerfScore 0.02
 G_M21910_IG16: ; bbWeight=0.00, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, gcvars, byref
-       movsx r9, dx
-       mov edx, r9d
-       ;; size=7 bbWeight=0.00 PerfScore 0.00
+       movsx rdx, dx
+       ;; size=4 bbWeight=0.00 PerfScore 0.00
 G_M21910_IG17: ; bbWeight=0.00, epilog, nogc, extend
        tail.jmp [System.SpanHelpers:g__SimdImpl|87_0[short,System.SpanHelpers+DontNegate`1[short],System.Runtime.Intrinsics.Vector128`1[short]](byref,short,int):int]
        ; gcr arg pop 0
@@ -148,7 +147,7 @@ G_M21910_IG21: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
        jmp SHORT G_M21910_IG18
        ;; size=4 bbWeight=0 PerfScore 0.00
-; Total bytes of code 183, prolog size 0, PerfScore 23.32, instruction count 53, allocated bytes for code 183 (MethodHash=9eddaa69) for method System.SpanHelpers:LastIndexOfValueType[short,System.SpanHelpers+DontNegate`1[short]](byref,short,int):int (Tier1)
+; Total bytes of code 172, prolog size 0, PerfScore 23.32, instruction count 52, allocated bytes for code 172 (MethodHash=9eddaa69) for method System.SpanHelpers:LastIndexOfValueType[short,System.SpanHelpers+DontNegate`1[short]](byref,short,int):int (Tier1)
 ; ============================================================
 Unwind Info:
```
-11 (-6.01%) : 91217.dasm - System.SpanHelpers:LastIndexOfValueType[short,System.SpanHelpers+DontNegate`1[short]](byref,short,int):int (Tier1)
```diff
@@ -41,7 +41,7 @@
 ;* V29 tmp25 [V29 ] ( 0, 0 ) short -> zero-ref "Inlining Arg"
 ;* V30 tmp26 [V30 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg"
 ; V31 tmp27 [V31,T05] ( 5, 0.00) int -> rax "Single return block return value"
-; V32 cse0 [V32,T04] ( 13, 4.78) int -> r9 hoist multi-def "CSE - aggressive"
+; V32 cse0 [V32,T04] ( 11, 4.78) int -> r9 hoist multi-def "CSE - aggressive"
 ;
 ; Lcl frame size = 0
@@ -63,10 +63,10 @@ G_M21910_IG04: ; bbWeight=0.49, gcrefRegs=0000 {}, byrefRegs=0002 {rcx},
        movsx r10, word ptr [rcx+2*rax]
        movsx r9, dx
        cmp r10d, r9d
-       je G_M21910_IG18
+       je SHORT G_M21910_IG18
        movsx r10, word ptr [rcx+2*rax-0x02]
        cmp r10d, r9d
-       je G_M21910_IG21
+       je SHORT G_M21910_IG21
        movsx r10, word ptr [rcx+2*rax-0x04]
        cmp r10d, r9d
        je SHORT G_M21910_IG20
@@ -74,7 +74,7 @@ G_M21910_IG04: ; bbWeight=0.49, gcrefRegs=0000 {}, byrefRegs=0002 {rcx},
        cmp r10d, r9d
        je SHORT G_M21910_IG19
        add rax, -4
-       ;; size=63 bbWeight=0.49 PerfScore 10.71
+       ;; size=55 bbWeight=0.49 PerfScore 10.71
 G_M21910_IG05: ; bbWeight=0.49, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, byref, isz
        test r8d, r8d
        jle SHORT G_M21910_IG08
@@ -124,9 +124,8 @@ G_M21910_IG15: ; bbWeight=0.01, epilog, nogc, extend
        ; gcr arg pop 0
        ;; size=6 bbWeight=0.01 PerfScore 0.02
 G_M21910_IG16: ; bbWeight=0.00, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, gcvars, byref
-       movsx r9, dx
-       mov edx, r9d
-       ;; size=7 bbWeight=0.00 PerfScore 0.00
+       movsx rdx, dx
+       ;; size=4 bbWeight=0.00 PerfScore 0.00
 G_M21910_IG17: ; bbWeight=0.00, epilog, nogc, extend
        tail.jmp [System.SpanHelpers:g__SimdImpl|87_0[short,System.SpanHelpers+DontNegate`1[short],System.Runtime.Intrinsics.Vector128`1[short]](byref,short,int):int]
        ; gcr arg pop 0
@@ -148,7 +147,7 @@ G_M21910_IG21: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
        jmp SHORT G_M21910_IG18
        ;; size=4 bbWeight=0 PerfScore 0.00
-; Total bytes of code 183, prolog size 0, PerfScore 23.48, instruction count 53, allocated bytes for code 183 (MethodHash=9eddaa69) for method System.SpanHelpers:LastIndexOfValueType[short,System.SpanHelpers+DontNegate`1[short]](byref,short,int):int (Tier1)
+; Total bytes of code 172, prolog size 0, PerfScore 23.48, instruction count 52, allocated bytes for code 172 (MethodHash=9eddaa69) for method System.SpanHelpers:LastIndexOfValueType[short,System.SpanHelpers+DontNegate`1[short]](byref,short,int):int (Tier1)
 ; ============================================================
 Unwind Info:
```
+7 (+1.91%) : 60299.dasm - System.Text.Json.Serialization.Converters.ArrayConverter`2[System.__Canon,System.__Canon]:OnWriteResume(System.Text.Json.Utf8JsonWriter,System.__Canon[],System.Text.Json.JsonSerializerOptions,byref):ubyte:this (Tier1)
```diff
@@ -10,13 +10,13 @@
 ; Final local variable assignments
 ;
 ; V00 this [V00,T09] ( 3, 3 ) ref -> r14 this class-hnd single-def
-; V01 arg1 [V01,T04] ( 5, 379052.67) ref -> rdi class-hnd single-def
-; V02 arg2 [V02,T03] ( 6, 758104.33) ref -> rsi class-hnd single-def
-; V03 arg3 [V03,T05] ( 4, 379052.67) ref -> rbp class-hnd single-def
+; V01 arg1 [V01,T03] ( 5, 379052.67) ref -> rdi class-hnd single-def
+; V02 arg2 [V02,T02] ( 6, 758104.33) ref -> rsi class-hnd single-def
+; V03 arg3 [V03,T04] ( 4, 379052.67) ref -> rbp class-hnd single-def
 ; V04 arg4 [V04,T01] ( 9,1137154.00) byref -> rbx single-def
 ; V05 loc0 [V05,T00] ( 13,1895255.33) int -> r15
-; V06 loc1 [V06,T06] ( 5, 379052.67) ref -> r13 class-hnd single-def
-; V07 loc2 [V07 ] ( 2, 758101.33) ref -> [rsp+0x28] do-not-enreg[X] must-init addr-exposed ld-addr-op class-hnd
+; V06 loc1 [V06,T05] ( 5, 379052.67) ref -> r13 class-hnd single-def
+; V07 loc2 [V07 ] ( 2, 758101.33) ref -> [rsp+0x38] do-not-enreg[X] must-init addr-exposed ld-addr-op class-hnd
 ; V08 OutArgs [V08 ] ( 1, 1 ) struct (40) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
 ;* V09 tmp1 [V09 ] ( 0, 0 ) long -> zero-ref "spilling helperCall"
 ;* V10 tmp2 [V10 ] ( 0, 0 ) int -> zero-ref "dup spill"
@@ -24,12 +24,12 @@
 ;* V12 tmp4 [V12 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
 ; V13 tmp5 [V13,T07] ( 2, 0 ) ubyte -> rax "Inline return value spill temp"
 ; V14 tmp6 [V14,T12] ( 2, 4 ) long -> rcx "argument with side effect"
-; V15 cse0 [V15,T02] ( 6,1137154.00) int -> r12 multi-def "CSE - aggressive"
+; V15 cse0 [V15,T06] ( 4, 379050.67) int -> [rsp+0x34] multi-def "CSE - aggressive"
 ; V16 rat0 [V16,T10] ( 3, 4.40) long -> rcx "Spilling to split statement for tree"
 ; V17 rat1 [V17,T11] ( 3, 4 ) long -> rcx "runtime lookup"
 ; V18 rat2 [V18,T08] ( 3, 5.60) long -> rax "fgMakeTemp is creating a new local variable"
 ;
-; Lcl frame size = 56
+; Lcl frame size = 72
 G_M9389_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push r15
@@ -40,10 +40,10 @@ G_M9389_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
        push rsi
        push rbp
        push rbx
-       sub rsp, 56
+       sub rsp, 72
        xor eax, eax
-       mov qword ptr [rsp+0x28], rax
-       mov qword ptr [rsp+0x30], rcx
+       mov qword ptr [rsp+0x38], rax
+       mov qword ptr [rsp+0x40], rcx
        mov r14, rcx
        ; gcrRegs +[r14]
        mov rdi, rdx
@@ -52,7 +52,7 @@ G_M9389_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
        ; gcrRegs +[rsi]
        mov rbp, r9
        ; gcrRegs +[rbp]
-       mov rbx, bword ptr [rsp+0xA0]
+       mov rbx, bword ptr [rsp+0xB0]
        ; byrRegs +[rbx]
        ;; size=48 bbWeight=1 PerfScore 12.50
 G_M9389_IG02: ; bbWeight=1, gcrefRegs=40E0 {rbp rsi rdi r14}, byrefRegs=0008 {rbx}, byref
@@ -81,20 +81,20 @@ G_M9389_IG04: ; bbWeight=1, gcrefRegs=40E0 {rbp rsi rdi r14}, byrefRegs=0
        ;; size=23 bbWeight=1 PerfScore 7.50
 G_M9389_IG05: ; bbWeight=379051.67, gcrefRegs=60E0 {rbp rsi rdi r13 r14}, byrefRegs=0008 {rbx}, byref, isz
        ; gcrRegs -[rax]
-       mov r12d, dword ptr [rsi+0x08]
-       cmp r12d, r15d
+       cmp dword ptr [rsi+0x08], r15d
        jle SHORT G_M9389_IG08
-       ;; size=9 bbWeight=379051.67 PerfScore 1231917.92
+       ;; size=6 bbWeight=379051.67 PerfScore 1516206.67
 G_M9389_IG06: ; bbWeight=379050.67, gcrefRegs=60E0 {rbp rsi rdi r13 r14}, byrefRegs=0008 {rbx}, byref
+       mov r12d, dword ptr [rsp+0x34]
        cmp r15d, r12d
        jae G_M9389_IG17
        mov r8d, r15d
        mov r8, gword ptr [rsi+8*r8+0x10]
        ; gcrRegs +[r8]
-       mov gword ptr [rsp+0x28], r8
+       mov gword ptr [rsp+0x38], r8
        mov bword ptr [rsp+0x20], rbx
        ; byr arg write
-       lea r8, [rsp+0x28]
+       lea r8, [rsp+0x38]
        ; gcrRegs -[r8]
        mov rcx, r13
        ; gcrRegs +[rcx]
@@ -110,18 +110,19 @@ G_M9389_IG06: ; bbWeight=379050.67, gcrefRegs=60E0 {rbp rsi rdi r13 r14},
        mov byte ptr [rbx+0x86], 0
        cmp dword ptr [rbx+0x40], 0
        jg G_M9389_IG18
-       ;; size=72 bbWeight=379050.67 PerfScore 6064810.67
+       ;; size=77 bbWeight=379050.67 PerfScore 6443861.33
 G_M9389_IG07: ; bbWeight=379050.67, gcrefRegs=60E0 {rbp rsi rdi r13 r14}, byrefRegs=0008 {rbx}, byref, isz
        inc r15d
+       mov dword ptr [rsp+0x34], r12d
        jmp SHORT G_M9389_IG05
-       ;; size=5 bbWeight=379050.67 PerfScore 852864.00
+       ;; size=10 bbWeight=379050.67 PerfScore 1231914.67
 G_M9389_IG08: ; bbWeight=1, gcrefRegs=4000 {r14}, byrefRegs=0000 {}, byref
        ; gcrRegs -[rbp rsi rdi r13]
        ; byrRegs -[rbx]
        mov eax, 1
        ;; size=5 bbWeight=1 PerfScore 0.25
 G_M9389_IG09: ; bbWeight=1, epilog, nogc, extend
-       add rsp, 56
+       add rsp, 72
        pop rbx
        pop rbp
        pop rsi
@@ -177,7 +178,7 @@ G_M9389_IG13: ; bbWeight=0, gcrefRegs=60E0 {rbp rsi rdi r13 r14}, byrefRe
        xor eax, eax
        ;; size=13 bbWeight=0 PerfScore 0.00
 G_M9389_IG14: ; bbWeight=0, epilog, nogc, extend
-       add rsp, 56
+       add rsp, 72
        pop rbx
        pop rbp
        pop rsi
@@ -194,7 +195,7 @@ G_M9389_IG15: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=4000 {r
        xor eax, eax
        ;; size=6 bbWeight=0 PerfScore 0.00
 G_M9389_IG16: ; bbWeight=0, epilog, nogc, extend
-       add rsp, 56
+       add rsp, 72
        pop rbx
        pop rbp
        pop rsi
@@ -225,7 +226,7 @@ G_M9389_IG19: ; bbWeight=0, gcrefRegs=60E0 {rbp rsi rdi r13 r14}, byrefRe
        jmp SHORT G_M9389_IG12
        ;; size=15 bbWeight=0 PerfScore 0.00
-; Total bytes of code 367, prolog size 28, PerfScore 8149630.23, instruction count 117, allocated bytes for code 367 (MethodHash=7b38db52) for method System.Text.Json.Serialization.Converters.ArrayConverter`2[System.__Canon,System.__Canon]:OnWriteResume(System.Text.Json.Utf8JsonWriter,System.__Canon[],System.Text.Json.JsonSerializerOptions,byref):ubyte:this (Tier1)
+; Total bytes of code 374, prolog size 28, PerfScore 9192020.32, instruction count 118, allocated bytes for code 374 (MethodHash=7b38db52) for method System.Text.Json.Serialization.Converters.ArrayConverter`2[System.__Canon,System.__Canon]:OnWriteResume(System.Text.Json.Utf8JsonWriter,System.__Canon[],System.Text.Json.JsonSerializerOptions,byref):ubyte:this (Tier1)
 ; ============================================================
 Unwind Info:
@@ -238,7 +239,7 @@ Unwind Info:
   FrameRegister : none (0)
   FrameOffset : N/A (no FrameRegister) (Value=0)
   UnwindCodes :
-    CodeOffset: 0x10 UnwindOp: UWOP_ALLOC_SMALL (2) OpInfo: 6 * 8 + 8 = 56 = 0x38
+    CodeOffset: 0x10 UnwindOp: UWOP_ALLOC_SMALL (2) OpInfo: 8 * 8 + 8 = 72 = 0x48
     CodeOffset: 0x0C UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rbx (3)
     CodeOffset: 0x0B UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rbp (5)
     CodeOffset: 0x0A UnwindOp: UWOP_PUSH_NONVOL (0) OpInfo: rsi (6)
```
+9 (+2.88%) : 66433.dasm - NLog.Config.Factory`2[System.__Canon,System.__Canon]:ScanTypes(System.Type[],System.String,System.String):this (Tier1-OSR)
```diff
@@ -13,16 +13,16 @@
 ;* V01 arg1 [V01 ] ( 0, 0 ) ref -> zero-ref class-hnd single-def
 ;* V02 arg2 [V02 ] ( 0, 0 ) ref -> zero-ref class-hnd single-def
 ; V03 arg3 [V03,T05] ( 3, 95.75) ref -> [rbp+0xF8] class-hnd EH-live single-def tier0-frame
-; V04 loc0 [V04,T02] ( 3,193.75) ref -> [rbp+0x88] do-not-enreg[H] class-hnd EH-live tier0-frame
+; V04 loc0 [V04,T01] ( 3,193.75) ref -> [rbp+0x88] do-not-enreg[H] class-hnd EH-live tier0-frame
 ; V05 loc1 [V05,T00] ( 6,475 ) int -> [rbp+0x84] do-not-enreg[Z] EH-live tier0-frame
-; V06 loc2 [V06,T03] ( 4,187.50) ref -> [rbp+0x78] do-not-enreg[Z] class-hnd EH-live tier0-frame
+; V06 loc2 [V06,T02] ( 4,187.50) ref -> [rbp+0x78] do-not-enreg[Z] class-hnd EH-live tier0-frame
 ;* V07 loc3 [V07 ] ( 0, 0 ) ref -> zero-ref class-hnd <>
 ; V08 OutArgs [V08 ] ( 1, 1 ) struct (32) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
 ; V09 tmp1 [V09,T07] ( 3, 0 ) ref -> rbx class-hnd "impSpillSpecialSideEff" <>
 ; V10 tmp2 [V10,T08] ( 3, 0 ) ref -> rsi class-hnd exact "dup spill" <>
 ; V11 tmp3 [V11,T09] ( 2, 0 ) ref -> rax class-hnd exact "Strict ordering of exceptions for Array store"
 ; V12 PSPSym [V12,T06] ( 1, 1 ) long -> [rbp-0x20] do-not-enreg[V] "PSPSym"
-; V13 cse0 [V13,T01] ( 5,293.75) int -> rbx multi-def "CSE - aggressive"
+; V13 cse0 [V13,T03] ( 3,106.25) int -> [rbp-0x14] do-not-enreg[H] EH-live multi-def "CSE - aggressive"
 ;
 ; Lcl frame size = 48
@@ -39,17 +39,20 @@ G_M56624_IG01: ; bbWeight=6.25, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
        mov r9, gword ptr [rbp+0xF8]
        ; gcrRegs +[r9]
        ;; size=48 bbWeight=6.25 PerfScore 67.19
-G_M56624_IG02: ; bbWeight=6.25, gcVars=0000000000000034 {V00 V03 V04}, gcrefRegs=0202 {rcx r9}, byrefRegs=0000 {}, gcvars, byref, isz
+G_M56624_IG02: ; bbWeight=6.25, gcVars=0000000000000032 {V00 V03 V04}, gcrefRegs=0202 {rcx r9}, byrefRegs=0000 {}, gcvars, byref, isz
        ; GC ptr vars +{V00 V03 V04 V05}
        mov rdx, gword ptr [rbp+0x88]
        ; gcrRegs +[rdx]
-       mov ebx, dword ptr [rdx+0x08]
-       cmp ebx, dword ptr [rbp+0x84]
-       jle SHORT G_M56624_IG08
-       ;; size=18 bbWeight=6.25 PerfScore 37.50
-G_M56624_IG03: ; bbWeight=93.75, gcrefRegs=0202 {rcx r9}, byrefRegs=0000 {}, byref, isz
+       mov edx, dword ptr [rdx+0x08]
        ; gcrRegs -[rdx]
-       cmp dword ptr [rbp+0x84], ebx
+       mov dword ptr [rbp-0x14], edx
+       mov edx, dword ptr [rbp-0x14]
+       cmp edx, dword ptr [rbp+0x84]
+       jle SHORT G_M56624_IG08
+       ;; size=24 bbWeight=6.25 PerfScore 50.00
+G_M56624_IG03: ; bbWeight=93.75, gcrefRegs=0202 {rcx r9}, byrefRegs=0000 {}, byref, isz
+       mov edx, dword ptr [rbp-0x14]
+       cmp dword ptr [rbp+0x84], edx
        jae SHORT G_M56624_IG07
        mov rdx, gword ptr [rbp+0x88]
        ; gcrRegs +[rdx]
@@ -57,8 +60,8 @@ G_M56624_IG03: ; bbWeight=93.75, gcrefRegs=0202 {rcx r9}, byrefRegs=0000
        mov rdx, gword ptr [rdx+8*r8+0x10]
        mov gword ptr [rbp+0x78], rdx
        ; GC ptr vars +{V06}
-       ;; size=31 bbWeight=93.75 PerfScore 750.00
-G_M56624_IG04: ; bbWeight=93.75, gcVars=000000000000003C {V00 V03 V04 V06}, gcrefRegs=0202 {rcx r9}, byrefRegs=0000 {}, gcvars, byref
+       ;; size=34 bbWeight=93.75 PerfScore 843.75
+G_M56624_IG04: ; bbWeight=93.75, gcVars=0000000000000036 {V00 V03 V04 V06}, gcrefRegs=0202 {rcx r9}, byrefRegs=0000 {}, gcvars, byref
        ; gcrRegs -[rdx]
        ; GC ptr vars -{V05}
        mov rdx, gword ptr [rbp+0x78]
@@ -70,19 +73,19 @@ G_M56624_IG04: ; bbWeight=93.75, gcVars=000000000000003C {V00 V03 V04 V06
        ; gcr arg pop 0
        nop
        ;; size=14 bbWeight=93.75 PerfScore 421.88
-G_M56624_IG05: ; bbWeight=93.75, gcVars=0000000000000034 {V00 V03 V04}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, isz
-       ; GC ptr vars -{V03 V06}
+G_M56624_IG05: ; bbWeight=93.75, gcVars=0000000000000032 {V00 V03 V04}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, isz
+       ; GC ptr vars -{V06}
        mov edx, dword ptr [rbp+0x84]
        inc edx
        mov dword ptr [rbp+0x84], edx
        mov rdx, gword ptr [rbp+0x88]
        ; gcrRegs +[rdx]
-       mov ebx, dword ptr [rdx+0x08]
-       cmp ebx, dword ptr [rbp+0x84]
+       mov edx, dword ptr [rdx+0x08]
+       ; gcrRegs -[rdx]
+       cmp edx, dword ptr [rbp+0x84]
        jle SHORT G_M56624_IG08
        ;; size=32 bbWeight=93.75 PerfScore 773.44
 G_M56624_IG06: ; bbWeight=87.89, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
-       ; gcrRegs -[rdx]
        mov rcx, gword ptr [rbp+0xE0]
        ; gcrRegs +[rcx]
        mov r9, gword ptr [rbp+0xF8]
@@ -91,7 +94,7 @@ G_M56624_IG06: ; bbWeight=87.89, gcrefRegs=0000 {}, byrefRegs=0000 {}, by
        ;; size=16 bbWeight=87.89 PerfScore 351.56
 G_M56624_IG07: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
        ; gcrRegs -[rcx r9]
-       ; GC ptr vars -{V00 V04}
+       ; GC ptr vars -{V00 V03 V04}
        call CORINFO_HELP_RNGCHKFAIL
        ; gcr arg pop 0
        ;; size=5 bbWeight=0 PerfScore 0.00
@@ -105,7 +108,7 @@ G_M56624_IG09: ; bbWeight=0, epilog, nogc, extend
        pop rbp
        ret
        ;; size=11 bbWeight=0 PerfScore 0.00
-G_M56624_IG10: ; bbWeight=0, gcVars=000000000000003C {V00 V03 V04 V06}, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, gcvars, byref, funclet prolog, nogc
+G_M56624_IG10: ; bbWeight=0, gcVars=0000000000000036 {V00 V03 V04 V06}, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, gcvars, byref, funclet prolog, nogc
        ; gcrRegs +[rdx]
        ; GC ptr vars +{V00 V03 V04 V05 V06}
        push rbp
@@ -116,7 +119,7 @@ G_M56624_IG10: ; bbWeight=0, gcVars=000000000000003C {V00 V03 V04 V06}, g
        mov qword ptr [rsp+0x20], rbp
        lea rbp, [rbp+0x40]
        ;; size=20 bbWeight=0 PerfScore 0.00
-G_M56624_IG11: ; bbWeight=0, gcVars=000000000000003C {V00 V03 V04 V06}, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, gcvars, byref, isz
+G_M56624_IG11: ; bbWeight=0, gcVars=0000000000000036 {V00 V03 V04 V06}, gcrefRegs=0004 {rdx}, byrefRegs=0000 {}, gcvars, byref, isz
        mov rbx, rdx
        ; gcrRegs +[rbx]
        mov rcx, 0xD1FFAB1E ;
@@ -133,7 +136,7 @@ G_M56624_IG11: ; bbWeight=0, gcVars=000000000000003C {V00 V03 V04 V06}, g
        mov rax, qword ptr [rax]
        ; gcrRegs -[rax]
        mov rax, qword ptr [rax+0x50]
-       ; GC ptr vars -{V03 V05 V06}
+       ; GC ptr vars -{V05 V06}
        call [rax+0x20]
        ; gcrRegs -[rcx] +[rax]
        ; gcr arg pop 0
@@ -174,13 +177,13 @@ G_M56624_IG12: ; bbWeight=0, funclet epilog, nogc, extend
        ret
        ;; size=8 bbWeight=0 PerfScore 0.00
 G_M56624_IG13: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
-       ; GC ptr vars -{V00 V04}
+       ; GC ptr vars -{V00 V03 V04}
        call CORINFO_HELP_RETHROW
        ; gcr arg pop 0
        int3
        ;; size=6 bbWeight=0 PerfScore 0.00
-; Total bytes of code 313, prolog size 48, PerfScore 2401.56, instruction count 78, allocated bytes for code 313 (MethodHash=a8d922cf) for method NLog.Config.Factory`2[System.__Canon,System.__Canon]:ScanTypes(System.Type[],System.String,System.String):this (Tier1-OSR)
+; Total bytes of code 322, prolog size 48, PerfScore 2507.81, instruction count 81, allocated bytes for code 322 (MethodHash=a8d922cf) for method NLog.Config.Factory`2[System.__Canon,System.__Canon]:ScanTypes(System.Type[],System.String,System.String):this (Tier1-OSR)
 ; ============================================================
 Unwind Info:
```
+15 (+3.86%) : 105306.dasm - OrchardCore.ResourceManagement.ResourceManager+d__27:MoveNext():ubyte:this (FullOpts)
```diff
@@ -8,7 +8,7 @@
 ; 2 inlinees with PGO data; 11 single block inlinees; 0 inlinees without PGO data
 ; Final local variable assignments
 ;
-; V00 this [V00,T01] ( 14, 11.50) ref -> [rbp+0x10] this class-hnd EH-live single-def d__27>
+; V00 this [V00,T01] ( 13, 11 ) ref -> [rbp+0x10] this class-hnd EH-live single-def d__27>
 ; V01 loc0 [V01,T12] ( 4, 2.50) ubyte -> rax
 ; V02 loc1 [V02,T13] ( 3, 2.50) int -> rax single-def
 ; V03 loc2 [V03,T15] ( 2, 1.50) ref -> rcx class-hnd single-def <>
@@ -18,7 +18,7 @@
 ;* V07 loc6 [V07 ] ( 0, 0 ) ref -> zero-ref ld-addr-op class-hnd
 ; V08 OutArgs [V08 ] ( 1, 1 ) struct (32) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
 ; V09 tmp1 [V09,T10] ( 3, 3 ) ref -> rax class-hnd single-def "Inlining Arg" <>
-; V10 tmp2 [V10,T11] ( 3, 3 ) struct (48) [rbp-0x50] do-not-enreg[SF] must-init ld-addr-op "NewObj constructor temp"
+; V10 tmp2 [V10,T11] ( 3, 3 ) struct (48) [rbp-0x58] do-not-enreg[SF] must-init ld-addr-op "NewObj constructor temp"
 ;* V11 tmp3 [V11 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
 ;* V12 tmp4 [V12 ] ( 0, 0 ) ref -> zero-ref class-hnd exact "Inlining Arg"
 ; V13 tmp5 [V13,T00] ( 5, 18.88) ref -> rdx class-hnd exact "Inlining Arg"
@@ -26,48 +26,53 @@
 ; V15 tmp7 [V15,T04] ( 2, 8 ) byref -> rcx "impAppendStmt"
 ;* V16 tmp8 [V16 ] ( 0, 0 ) ref -> zero-ref "field V06.Type (fldOffset=0x0)" P-INDEP
 ;* V17 tmp9 [V17 ] ( 0, 0 ) ref -> zero-ref "field V06.Name (fldOffset=0x8)" P-INDEP
-; V18 tmp10 [V18,T14] ( 2, 2.50) ref -> rdi "V05.[000..008)"
+; V18 tmp10 [V18,T14] ( 2, 2.50) ref -> r14 "V05.[000..008)"
 ; V19 tmp11 [V19,T03] ( 5, 9.45) ref -> rcx "V05.[008..016)"
 ;* V20 tmp12 [V20 ] ( 0, 0 ) ref -> zero-ref "V05.[016..024)"
 ;* V21 tmp13 [V21 ] ( 0, 0 ) ref -> zero-ref single-def "V10.[000..008)"
-; V22 tmp14 [V22,T16] ( 2, 1 ) int -> rbx single-def "V10.[008..012)"
+; V22 tmp14 [V22,T16] ( 2, 1 ) int -> r14 single-def "V10.[008..012)"
 ;* V23 tmp15 [V23 ] ( 0, 0 ) int -> zero-ref single-def "V10.[012..016)"
 ;* V24 tmp16 [V24 ] ( 0, 0 ) int -> zero-ref single-def "V10.[016..020)"
-; V25 tmp17 [V25,T07] ( 5, 5 ) byref -> r14 single-def "Spilling address for field-by-field copy"
+; V25 tmp17 [V25,T07] ( 5, 5 ) byref -> r15 single-def "Spilling address for field-by-field copy"
 ; V26 tmp18 [V26,T02] ( 3, 12 ) byref -> rcx "Spilling address for field-by-field copy"
 ; V27 PSPSym [V27,T17] ( 1, 1 ) long -> [rbp-0x60] do-not-enreg[V] "PSPSym"
-; V28 cse0 [V28,T05] ( 5, 6 ) byref -> r14 hoist multi-def "CSE - aggressive"
-; V29 cse1 [V29,T09] ( 2, 4.50) long -> rbx hoist "CSE - aggressive"
+; V28 cse0 [V28,T05] ( 4, 5.50) byref -> rbx must-init multi-def "CSE - aggressive"
+; V29 cse1 [V29,T09] ( 2, 4.50) long -> rdi hoist "CSE - aggressive"
 ; V30 cse2 [V30,T06] ( 3, 5.44) int -> r8 "CSE - moderate"
 ;
-; Lcl frame size = 96
+; Lcl frame size = 88

 G_M2712_IG01: ; bbWeight=1, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, nogc <-- Prolog IG
        push rbp
+       push r15
        push r14
        push rdi
        push rsi
        push rbx
-       sub rsp, 96
+       sub rsp, 88
        vzeroupper
        lea rbp, [rsp+0x80]
+       xor ebx, ebx
+       mov qword ptr [rbp-0x58], rbx
        vxorps xmm4, xmm4, xmm4
        vmovdqu ymmword ptr [rbp-0x50], ymm4
-       vmovdqa xmmword ptr [rbp-0x30], xmm4
+       mov qword ptr [rbp-0x30], rbx
        mov qword ptr [rbp-0x60], rsp
        mov gword ptr [rbp+0x10], rcx
        ; GC ptr vars +{V00}
        mov rsi, rcx
        ; gcrRegs +[rsi]
-       ;; size=46 bbWeight=1 PerfScore 13.33
-G_M2712_IG02: ; bbWeight=1, gcVars=0000000000000002 {V00}, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, gcvars, byref, isz
+       xor ebx, ebx
+       ;; size=55 bbWeight=1 PerfScore 14.83
+G_M2712_IG02: ; bbWeight=1, gcVars=0000000000000002 {V00}, gcrefRegs=0040 {rsi}, byrefRegs=0008 {rbx}, gcvars, byref, isz
+       ; byrRegs +[rbx]
        mov eax, dword ptr [rsi+0x28]
        mov rcx, gword ptr [rsi+0x10]
        ; gcrRegs +[rcx]
        test eax, eax
        je SHORT G_M2712_IG04
        ;; size=11 bbWeight=1 PerfScore 5.25
-G_M2712_IG03: ; bbWeight=0.50, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref
+G_M2712_IG03: ; bbWeight=0.50, gcrefRegs=0040 {rsi}, byrefRegs=0008 {rbx}, byref
        ; gcrRegs -[rcx]
        cmp eax, 1
        je G_M2712_IG14
@@ -76,19 +81,22 @@ G_M2712_IG03: ; bbWeight=0.50, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, b
        ;; size=16 bbWeight=0.50 PerfScore 1.75
 G_M2712_IG04: ; bbWeight=0.50, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {}, byref
        ; gcrRegs +[rcx]
+       ; byrRegs -[rbx]
        mov dword ptr [rsi+0x28], -1
        mov rax, gword ptr [rcx+0x08]
        ; gcrRegs +[rax]
-       mov ebx, dword ptr [rax+0x44]
+       mov r14d, dword ptr [rax+0x44]
        vxorps xmm0, xmm0, xmm0
+       vmovdqu xmmword ptr [rbp-0x40], xmm0
        vmovdqu xmmword ptr [rbp-0x38], xmm0
-       vmovdqu xmmword ptr [rbp-0x30], xmm0
-       mov gword ptr [rbp-0x50], rax
-       lea r14, bword ptr [rsi+0x30]
-       ; byrRegs +[r14]
-       mov rdi, r14
+       mov gword ptr [rbp-0x58], rax
+       lea rbx, bword ptr [rsi+0x30]
+       ; byrRegs +[rbx]
+       mov r15, rbx
+       ; byrRegs +[r15]
+       mov rdi, r15
        ; byrRegs +[rdi]
-       lea rsi, bword ptr [rbp-0x50]
+       lea rsi, bword ptr [rbp-0x58]
        ; gcrRegs -[rsi]
        ; byrRegs +[rsi]
        call CORINFO_HELP_ASSIGN_BYREF
@@ -98,21 +106,21 @@ G_M2712_IG04: ; bbWeight=0.50, gcrefRegs=0042 {rcx rsi}, byrefRegs=0000 {
        call CORINFO_HELP_ASSIGN_BYREF
        call CORINFO_HELP_ASSIGN_BYREF
        call CORINFO_HELP_ASSIGN_BYREF
-       mov dword ptr [r14+0x08], ebx
+       mov dword ptr [r15+0x08], r14d
        xor ecx, ecx
-       mov dword ptr [r14+0x0C], ecx
-       mov dword ptr [r14+0x10], 2
+       mov dword ptr [r15+0x0C], ecx
+       mov dword ptr [r15+0x10], 2
        mov rsi, gword ptr [rbp+0x10]
        ; gcrRegs +[rsi]
        ; byrRegs -[rsi]
        jmp G_M2712_IG14
-       ;; size=94 bbWeight=0.50 PerfScore 10.92
-G_M2712_IG05: ; bbWeight=2, gcrefRegs=0040 {rsi}, byrefRegs=4000 {r14}, byref, isz
-       ; byrRegs -[rdi]
+       ;; size=98 bbWeight=0.50 PerfScore 11.04
+G_M2712_IG05: ; bbWeight=2, gcrefRegs=0040 {rsi}, byrefRegs=0008 {rbx}, byref, isz
+       ; byrRegs -[rdi r15]
        lea rcx, bword ptr [rsi+0x48]
        ; byrRegs +[rcx]
-       mov rdi, gword ptr [rcx]
-       ; gcrRegs +[rdi]
+       mov r14, gword ptr [rcx]
+       ; gcrRegs +[r14]
        mov rcx, gword ptr [rcx+0x08]
        ; gcrRegs +[rcx]
        ; byrRegs -[rcx]
@@ -121,20 +129,20 @@ G_M2712_IG05: ; bbWeight=2, gcrefRegs=0040 {rsi}, byrefRegs=4000 {r14}, b
        cmp rcx, rdx
        je SHORT G_M2712_IG13
        ;; size=20 bbWeight=2 PerfScore 15.50
-G_M2712_IG06: ; bbWeight=1.73, gcrefRegs=00C6 {rcx rdx rsi rdi}, byrefRegs=4000 {r14}, byref, isz
+G_M2712_IG06: ; bbWeight=1.73, gcrefRegs=4046 {rcx rdx rsi r14}, byrefRegs=0008 {rbx}, byref, isz
        test rcx, rcx
        je SHORT G_M2712_IG10
        ;; size=5 bbWeight=1.73 PerfScore 2.16
-G_M2712_IG07: ; bbWeight=1.72, gcrefRegs=00C6 {rcx rdx rsi rdi}, byrefRegs=4000 {r14}, byref, isz
+G_M2712_IG07: ; bbWeight=1.72, gcrefRegs=4046 {rcx rdx rsi r14}, byrefRegs=0008 {rbx}, byref, isz
        test rdx, rdx
        je SHORT G_M2712_IG10
        ;; size=5 bbWeight=1.72 PerfScore 2.15
-G_M2712_IG08: ; bbWeight=1.72, gcrefRegs=00C6 {rcx rdx rsi rdi}, byrefRegs=4000 {r14}, byref, isz
+G_M2712_IG08: ; bbWeight=1.72, gcrefRegs=4046 {rcx rdx rsi r14}, byrefRegs=0008 {rbx}, byref, isz
        mov r8d, dword ptr [rcx+0x08]
        cmp r8d, dword ptr [rdx+0x08]
        jne SHORT G_M2712_IG10
        ;; size=10 bbWeight=1.72 PerfScore 10.31
-G_M2712_IG09: ; bbWeight=2, gcrefRegs=00C6 {rcx rdx rsi rdi}, byrefRegs=4000 {r14}, byref, isz
+G_M2712_IG09: ; bbWeight=2, gcrefRegs=4046 {rcx rdx rsi r14}, byrefRegs=0008 {rbx}, byref, isz
        add rcx, 12
        ; gcrRegs -[rcx]
        ; byrRegs +[rcx]
@@ -148,75 +156,76 @@ G_M2712_IG09: ; bbWeight=2, gcrefRegs=00C6 {rcx rdx rsi rdi}, byrefRegs=4
        test eax, eax
        jne SHORT G_M2712_IG13
        ;; size=21 bbWeight=2 PerfScore 10.00
-G_M2712_IG10: ; bbWeight=4, gcrefRegs=0040 {rsi}, byrefRegs=4000 {r14}, byref, isz
-       ; gcrRegs -[rdi]
-       mov rcx, r14
+G_M2712_IG10: ; bbWeight=4, gcrefRegs=0040 {rsi}, byrefRegs=0008 {rbx}, byref, isz
+       ; gcrRegs -[r14]
+       mov rcx, rbx
        ; byrRegs +[rcx]
-       mov rdx, rbx
+       mov rdx, rdi
        call []
        ; byrRegs -[rcx]
        ; gcr arg pop 0
        test eax, eax
        jne SHORT G_M2712_IG05
        ;; size=16 bbWeight=4 PerfScore 19.00
-G_M2712_IG11: ; bbWeight=0.50, gcrefRegs=0040 {rsi}, byrefRegs=4000 {r14}, byref
+G_M2712_IG11: ; bbWeight=0.50, gcrefRegs=0040 {rsi}, byrefRegs=0008 {rbx}, byref
        mov dword ptr [rsi+0x28], -1
        xor eax, eax
-       mov qword ptr [r14], rax
-       mov qword ptr [r14+0x08], rax
-       mov qword ptr [r14+0x10], rax
-       mov qword ptr [r14+0x18], rax
-       mov qword ptr [r14+0x20], rax
-       mov qword ptr [r14+0x28], rax
+       mov qword ptr [rbx], rax
+       mov qword ptr [rbx+0x08], rax
+       mov qword ptr [rbx+0x10], rax
+       mov qword ptr [rbx+0x18], rax
+       mov qword ptr [rbx+0x20], rax
+       mov qword ptr [rbx+0x28], rax
        ;; size=32 bbWeight=0.50 PerfScore 3.62
 G_M2712_IG12: ; bbWeight=0.50, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
-       ; byrRegs -[r14]
+       ; byrRegs -[rbx]
        jmp SHORT G_M2712_IG15
        ;; size=2 bbWeight=0.50 PerfScore 1.00
-G_M2712_IG13: ; bbWeight=0.50, gcrefRegs=00C0 {rsi rdi}, byrefRegs=0000 {}, byref, isz
-       ; gcrRegs +[rdi]
+G_M2712_IG13: ; bbWeight=0.50, gcrefRegs=4040 {rsi r14}, byrefRegs=0000 {}, byref, isz
+       ; gcrRegs +[r14]
        lea rcx, bword ptr [rsi+0x08]
        ; byrRegs +[rcx]
-       mov rdx, rdi
+       mov rdx, r14
        ; gcrRegs +[rdx]
        call CORINFO_HELP_ASSIGN_REF
-       ; gcrRegs -[rdx rdi]
+       ; gcrRegs -[rdx r14]
        ; byrRegs -[rcx]
        mov dword ptr [rsi+0x28], 1
        mov eax, 1
        jmp SHORT G_M2712_IG15
        ;; size=26 bbWeight=0.50 PerfScore 2.50
-G_M2712_IG14: ; bbWeight=0.50, gcrefRegs=0040 {rsi}, byrefRegs=0000 {}, byref, isz
+G_M2712_IG14: ; bbWeight=0.50, gcrefRegs=0040 {rsi}, byrefRegs=0008 {rbx}, byref, isz
+       ; byrRegs +[rbx]
        mov dword ptr [rsi+0x28], -3
-       lea r14, bword ptr [rsi+0x30]
-       ; byrRegs +[r14]
-       mov rbx, 0xD1FFAB1E ; System.Collections.Generic.Dictionary`2+Enumerator[OrchardCore.ResourceManagement.ResourceManager+ResourceTypeName,OrchardCore.ResourceManagement.RequireSettings]
+       mov rdi, 0xD1FFAB1E ; System.Collections.Generic.Dictionary`2+Enumerator[OrchardCore.ResourceManagement.ResourceManager+ResourceTypeName,OrchardCore.ResourceManagement.RequireSettings]
        jmp SHORT G_M2712_IG10
-       ;; size=23 bbWeight=0.50 PerfScore 1.88
+       ;; size=19 bbWeight=0.50 PerfScore 1.62
 G_M2712_IG15: ; bbWeight=1, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, epilog, nogc
        ; gcrRegs -[rsi]
-       ; byrRegs -[r14]
+       ; byrRegs -[rbx]
        ; GC ptr vars -{V00}
-       add rsp, 96
+       add rsp, 88
        pop rbx
        pop rsi
        pop rdi
        pop r14
+       pop r15
        pop rbp
...
```
#### Improvements/regressions per collection

|Collection|Contexts with diffs|Improvements|Regressions|Same size|Improvements (bytes)|Regressions (bytes)|
|---|--:|--:|--:|--:|--:|--:|
|aspnet.run.windows.x64.checked.mch|1,256|816|138|302|-3,788|+1,090|

---

#### Context information

|Collection|Diffed contexts|MinOpts|FullOpts|Missed, base|Missed, diff|
|---|--:|--:|--:|--:|--:|
|aspnet.run.windows.x64.checked.mch|113,707|48,175|65,532|0 (0.00%)|1 (0.00%)|

---

#### jit-analyze output
aspnet.run.windows.x64.checked.mch
To reproduce these diffs on Windows x64:

```
superpmi.py asmdiffs -target_os windows -target_arch x64 -arch x64
```

```
Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 43323348 (overridden on cmd)
Total bytes of diff: 43320650 (overridden on cmd)
Total bytes of delta: -2698 (-0.01 % of base)
diff is a regression.
relative diff is an improvement.
```
Detail diffs

```
Top file regressions (bytes):
         321 : 105853.dasm (8.89% of base)
          15 : 105306.dasm (3.86% of base)
          15 : 89500.dasm (0.64% of base)
          15 : 92798.dasm (0.64% of base)
          14 : 65403.dasm (0.63% of base)
          14 : 64317.dasm (0.63% of base)
          14 : 62589.dasm (0.63% of base)
          13 : 57510.dasm (0.59% of base)
          13 : 41511.dasm (0.58% of base)
          13 : 19722.dasm (0.58% of base)
          13 : 53025.dasm (0.59% of base)
          13 : 23629.dasm (0.58% of base)
          13 : 107202.dasm (0.75% of base)
          13 : 59861.dasm (0.59% of base)
          13 : 58879.dasm (0.59% of base)
          13 : 42892.dasm (0.58% of base)
          13 : 48125.dasm (0.59% of base)
          12 : 108350.dasm (0.87% of base)
          12 : 92740.dasm (0.74% of base)
          12 : 107582.dasm (0.74% of base)

Top file improvements (bytes):
         -44 : 60693.dasm (-4.35% of base)
         -44 : 29595.dasm (-4.35% of base)
         -31 : 44803.dasm (-0.67% of base)
         -31 : 19522.dasm (-2.74% of base)
         -25 : 74589.dasm (-0.28% of base)
         -25 : 90698.dasm (-0.28% of base)
         -23 : 65369.dasm (-5.67% of base)
         -23 : 90719.dasm (-1.55% of base)
         -23 : 76126.dasm (-1.55% of base)
         -18 : 108572.dasm (-1.89% of base)
         -18 : 63942.dasm (-2.86% of base)
         -17 : 68045.dasm (-2.62% of base)
         -17 : 90533.dasm (-2.62% of base)
         -16 : 40658.dasm (-0.22% of base)
         -16 : 92425.dasm (-0.77% of base)
         -16 : 46917.dasm (-0.22% of base)
         -16 : 89106.dasm (-0.76% of base)
         -14 : 26306.dasm (-0.65% of base)
         -14 : 9233.dasm (-2.83% of base)
         -14 : 9779.dasm (-2.83% of base)

86 total files with Code Size differences (47 improved, 39 regressed), 20 unchanged.

Top method regressions (bytes):
         321 ( 8.89% of base) : 105853.dasm - Microsoft.Extensions.DependencyInjection.Extensions.ServiceCollectionDescriptorExtensions:TryAddEnumerable(Microsoft.Extensions.DependencyInjection.IServiceCollection,Microsoft.Extensions.DependencyInjection.ServiceDescriptor) (Tier1-OSR)
          15 ( 0.64% of base) : 89500.dasm - Microsoft.AspNetCore.Authorization.DefaultAuthorizationService+d__7:MoveNext():this (Tier1)
          15 ( 0.64% of base) : 92798.dasm - Microsoft.AspNetCore.Authorization.DefaultAuthorizationService+d__7:MoveNext():this (Tier1)
          15 ( 3.86% of base) : 105306.dasm - OrchardCore.ResourceManagement.ResourceManager+d__27:MoveNext():ubyte:this (FullOpts)
          14 ( 0.63% of base) : 65403.dasm - Npgsql.Internal.NpgsqlConnector:ParseServerMessage(Npgsql.Internal.NpgsqlReadBuffer,ubyte,int,ubyte,ubyte):Npgsql.IBackendMessage:this (Tier1)
          14 ( 0.63% of base) : 64317.dasm - Npgsql.Internal.NpgsqlConnector:ParseServerMessage(Npgsql.Internal.NpgsqlReadBuffer,ubyte,int,ubyte,ubyte):Npgsql.IBackendMessage:this (Tier1)
          14 ( 0.63% of base) : 62589.dasm - Npgsql.Internal.NpgsqlConnector:ParseServerMessage(Npgsql.Internal.NpgsqlReadBuffer,ubyte,int,ubyte,ubyte):Npgsql.IBackendMessage:this (Tier1)
          13 ( 0.59% of base) : 57510.dasm - Npgsql.Internal.NpgsqlConnector:ParseServerMessage(Npgsql.Internal.NpgsqlReadBuffer,ubyte,int,ubyte,ubyte):Npgsql.IBackendMessage:this (Tier1)
          13 ( 0.58% of base) : 41511.dasm - Npgsql.Internal.NpgsqlConnector:ParseServerMessage(Npgsql.Internal.NpgsqlReadBuffer,ubyte,int,ubyte,ubyte):Npgsql.IBackendMessage:this (Tier1)
          13 ( 0.58% of base) : 19722.dasm - Npgsql.Internal.NpgsqlConnector:ParseServerMessage(Npgsql.Internal.NpgsqlReadBuffer,ubyte,int,ubyte,ubyte):Npgsql.IBackendMessage:this (Tier1)
          13 ( 0.59% of base) : 53025.dasm - Npgsql.Internal.NpgsqlConnector:ParseServerMessage(Npgsql.Internal.NpgsqlReadBuffer,ubyte,int,ubyte,ubyte):Npgsql.IBackendMessage:this (Tier1)
          13 ( 0.58% of base) : 23629.dasm - Npgsql.Internal.NpgsqlConnector:ParseServerMessage(Npgsql.Internal.NpgsqlReadBuffer,ubyte,int,ubyte,ubyte):Npgsql.IBackendMessage:this (Tier1)
          13 ( 0.59% of base) : 59861.dasm - Npgsql.Internal.NpgsqlConnector:ParseServerMessage(Npgsql.Internal.NpgsqlReadBuffer,ubyte,int,ubyte,ubyte):Npgsql.IBackendMessage:this (Tier1)
          13 ( 0.59% of base) : 58879.dasm - Npgsql.Internal.NpgsqlConnector:ParseServerMessage(Npgsql.Internal.NpgsqlReadBuffer,ubyte,int,ubyte,ubyte):Npgsql.IBackendMessage:this (Tier1)
          13 ( 0.58% of base) : 42892.dasm - Npgsql.Internal.NpgsqlConnector:ParseServerMessage(Npgsql.Internal.NpgsqlReadBuffer,ubyte,int,ubyte,ubyte):Npgsql.IBackendMessage:this (Tier1)
          13 ( 0.59% of base) : 48125.dasm - Npgsql.Internal.NpgsqlConnector:ParseServerMessage(Npgsql.Internal.NpgsqlReadBuffer,ubyte,int,ubyte,ubyte):Npgsql.IBackendMessage:this (Tier1)
          13 ( 0.75% of base) : 107202.dasm - OrchardCore.ResourceManagement.ResourceManager:FindMatchingResource(System.Collections.Generic.IEnumerable`1[System.Collections.Generic.KeyValuePair`2[System.String,System.Collections.Generic.IList`1[OrchardCore.ResourceManagement.ResourceDefinition]]],OrchardCore.ResourceManagement.RequireSettings,System.String):OrchardCore.ResourceManagement.ResourceDefinition (Tier1)
          12 ( 0.87% of base) : 108350.dasm - Microsoft.Extensions.Configuration.ConfigurationProvider:GetChildKeys(System.Collections.Generic.IEnumerable`1[System.String],System.String):System.Collections.Generic.IEnumerable`1[System.String]:this (Instrumented Tier1)
          12 ( 0.74% of base) : 92740.dasm - Newtonsoft.Json.JsonReader:Skip():this (Tier1)
          12 ( 0.74% of base) : 107582.dasm - Newtonsoft.Json.JsonReader:Skip():this (Tier1)

Top method improvements (bytes):
         -44 (-4.35% of base) : 60693.dasm - System.Collections.Concurrent.ConcurrentDictionary`2[System.__Canon,System.__Canon]:InitializeFromCollection(System.Collections.Generic.IEnumerable`1[System.Collections.Generic.KeyValuePair`2[System.__Canon,System.__Canon]]):this (Tier1-OSR)
         -44 (-4.35% of base) : 29595.dasm - System.Collections.Concurrent.ConcurrentDictionary`2[System.__Canon,System.__Canon]:InitializeFromCollection(System.Collections.Generic.IEnumerable`1[System.Collections.Generic.KeyValuePair`2[System.__Canon,System.__Canon]]):this (Tier1-OSR)
         -31 (-2.74% of base) : 19522.dasm - Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1Connection:TakeStartLine(byref):ubyte:this (Tier1)
         -31 (-0.67% of base) : 44803.dasm - Microsoft.EntityFrameworkCore.Metadata.Conventions.RelationshipDiscoveryConvention:FindRelationshipCandidates(Microsoft.EntityFrameworkCore.Metadata.Builders.IConventionEntityTypeBuilder,System.Collections.Generic.HashSet`1[System.Type]):System.Collections.Generic.IReadOnlyList`1[Microsoft.EntityFrameworkCore.Metadata.Conventions.RelationshipDiscoveryConvention+RelationshipCandidate]:this (FullOpts)
         -25 (-0.28% of base) : 74589.dasm - Microsoft.CSharp.RuntimeBinder.Semantics.ExpressionBinder:bindUserDefinedConversion(Microsoft.CSharp.RuntimeBinder.Semantics.Expr,Microsoft.CSharp.RuntimeBinder.Semantics.CType,Microsoft.CSharp.RuntimeBinder.Semantics.CType,ubyte,byref,ubyte):ubyte:this (Tier1-OSR)
         -25 (-0.28% of base) : 90698.dasm - Microsoft.CSharp.RuntimeBinder.Semantics.ExpressionBinder:bindUserDefinedConversion(Microsoft.CSharp.RuntimeBinder.Semantics.Expr,Microsoft.CSharp.RuntimeBinder.Semantics.CType,Microsoft.CSharp.RuntimeBinder.Semantics.CType,ubyte,byref,ubyte):ubyte:this (Tier1-OSR)
         -23 (-1.55% of base) : 90719.dasm - Markdig.Helpers.CharacterMap`1[System.__Canon]:.ctor(System.Collections.Generic.IEnumerable`1[System.Collections.Generic.KeyValuePair`2[ushort,System.__Canon]]):this (Tier1-OSR)
         -23 (-1.55% of base) : 76126.dasm - Markdig.Helpers.CharacterMap`1[System.__Canon]:.ctor(System.Collections.Generic.IEnumerable`1[System.Collections.Generic.KeyValuePair`2[ushort,System.__Canon]]):this (Tier1-OSR)
         -23 (-5.67% of base) : 65369.dasm - System.Threading.SpinWait:SpinUntil(System.Func`1[ubyte],int):ubyte (Tier1-OSR)
         -18 (-1.89% of base) : 108572.dasm - System.IO.PathInternal:NormalizeDirectorySeparators(System.String):System.String (Instrumented Tier1)
         -18 (-2.86% of base) : 63942.dasm - System.Threading.LowLevelLifoSemaphore:Wait(int,ubyte):ubyte:this (Tier1-OSR)
         -17 (-2.62% of base) : 68045.dasm - System.Linq.Enumerable+SelectEnumerableIterator`2[System.__Canon,System.__Canon]:ToArray():System.__Canon[]:this (Tier1-OSR)
         -17 (-2.62% of base) : 90533.dasm - System.Linq.Enumerable+SelectEnumerableIterator`2[System.__Canon,System.__Canon]:ToArray():System.__Canon[]:this (Tier1-OSR)
         -16 (-0.22% of base) : 40658.dasm - Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol+d__238`1[System.__Canon]:MoveNext():this (Tier1-OSR)
         -16 (-0.22% of base) : 46917.dasm - Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol+d__238`1[System.__Canon]:MoveNext():this (Tier1-OSR)
         -16 (-0.77% of base) : 92425.dasm - OrchardCore.ResourceManagement.ResourceManager:FindMatchingResource(System.Collections.Generic.IEnumerable`1[System.Collections.Generic.KeyValuePair`2[System.String,System.Collections.Generic.IList`1[OrchardCore.ResourceManagement.ResourceDefinition]]],OrchardCore.ResourceManagement.RequireSettings,System.String):OrchardCore.ResourceManagement.ResourceDefinition (Tier1)
         -16 (-0.76% of base) : 89106.dasm - OrchardCore.ResourceManagement.ResourceManager:FindMatchingResource(System.Collections.Generic.IEnumerable`1[System.Collections.Generic.KeyValuePair`2[System.String,System.Collections.Generic.IList`1[OrchardCore.ResourceManagement.ResourceDefinition]]],OrchardCore.ResourceManagement.RequireSettings,System.String):OrchardCore.ResourceManagement.ResourceDefinition (Tier1)
         -14 (-0.23% of base) : 62257.dasm - Benchmarks.Data.RawDb+d__9:MoveNext():this (Tier1-OSR)
         -14 (-2.83% of base) : 9233.dasm - Microsoft.AspNetCore.Hosting.HostingAbstractionsWebHostBuilderExtensions:UseConfiguration(Microsoft.AspNetCore.Hosting.IWebHostBuilder,Microsoft.Extensions.Configuration.IConfiguration):Microsoft.AspNetCore.Hosting.IWebHostBuilder (Tier1-OSR)
         -14 (-2.83% of base) : 9779.dasm - Microsoft.AspNetCore.Hosting.HostingAbstractionsWebHostBuilderExtensions:UseConfiguration(Microsoft.AspNetCore.Hosting.IWebHostBuilder,Microsoft.Extensions.Configuration.IConfiguration):Microsoft.AspNetCore.Hosting.IWebHostBuilder (Tier1-OSR)

Top method regressions (percentages):
         321 ( 8.89% of base) : 105853.dasm - Microsoft.Extensions.DependencyInjection.Extensions.ServiceCollectionDescriptorExtensions:TryAddEnumerable(Microsoft.Extensions.DependencyInjection.IServiceCollection,Microsoft.Extensions.DependencyInjection.ServiceDescriptor) (Tier1-OSR)
          15 ( 3.86% of base) : 105306.dasm - OrchardCore.ResourceManagement.ResourceManager+d__27:MoveNext():ubyte:this (FullOpts)
           9 ( 2.88% of base) : 66433.dasm - NLog.Config.Factory`2[System.__Canon,System.__Canon]:ScanTypes(System.Type[],System.String,System.String):this (Tier1-OSR)
           7 ( 1.91% of base) : 58169.dasm - System.Text.Json.Serialization.Converters.ArrayConverter`2[System.__Canon,System.__Canon]:OnWriteResume(System.Text.Json.Utf8JsonWriter,System.__Canon[],System.Text.Json.JsonSerializerOptions,byref):ubyte:this (Tier1)
           7 ( 1.91% of base) : 63125.dasm - System.Text.Json.Serialization.Converters.ArrayConverter`2[System.__Canon,System.__Canon]:OnWriteResume(System.Text.Json.Utf8JsonWriter,System.__Canon[],System.Text.Json.JsonSerializerOptions,byref):ubyte:this (Tier1)
           7 ( 1.91% of base) : 60299.dasm - System.Text.Json.Serialization.Converters.ArrayConverter`2[System.__Canon,System.__Canon]:OnWriteResume(System.Text.Json.Utf8JsonWriter,System.__Canon[],System.Text.Json.JsonSerializerOptions,byref):ubyte:this (Tier1)
          12 ( 1.86% of base) : 24638.dasm - System.Collections.Generic.ArraySortHelper`1[System.__Canon]:PickPivotAndPartition(System.Span`1[System.__Canon],System.Comparison`1[System.__Canon]):int (Tier1-OSR)
           7 ( 1.55% of base) : 112278.dasm - System.Collections.Generic.HashSet`1[System.__Canon]:UnionWith(System.Collections.Generic.IEnumerable`1[System.__Canon]):this (Instrumented Tier1)
           7 ( 1.55% of base) : 108666.dasm - System.Collections.Generic.HashSet`1[System.__Canon]:UnionWith(System.Collections.Generic.IEnumerable`1[System.__Canon]):this (Instrumented Tier1)
           7 ( 1.55% of base) : 110701.dasm - System.Collections.Generic.HashSet`1[System.__Canon]:UnionWith(System.Collections.Generic.IEnumerable`1[System.__Canon]):this (Instrumented Tier1)
           7 ( 1.39% of base) : 88543.dasm - Castle.DynamicProxy.ProxyGenerationOptions:HasEquivalentAdditionalAttributes(Castle.DynamicProxy.ProxyGenerationOptions):ubyte:this (Tier1)
           7 ( 1.39% of base) : 92331.dasm - Castle.DynamicProxy.ProxyGenerationOptions:HasEquivalentAdditionalAttributes(Castle.DynamicProxy.ProxyGenerationOptions):ubyte:this (Tier1)
           7 ( 1.39% of base) : 107074.dasm - Castle.DynamicProxy.ProxyGenerationOptions:HasEquivalentAdditionalAttributes(Castle.DynamicProxy.ProxyGenerationOptions):ubyte:this (Tier1)
           7 ( 1.34% of base) : 90578.dasm - System.ComponentModel.AttributeCollection:get_Item(System.Type):System.Attribute:this (Tier1-OSR)
           7 ( 1.34% of base) : 69438.dasm - System.ComponentModel.AttributeCollection:get_Item(System.Type):System.Attribute:this (Tier1-OSR)
           5 ( 1.20% of base) : 101663.dasm - Esprima.Scanner:GetIdentifier(ubyte):System.String:this (FullOpts)
           6 ( 1.07% of base) : 49440.dasm - Microsoft.AspNetCore.Mvc.Razor.DefaultTagHelperFactory:InitializeTagHelper[System.__Canon](System.__Canon,Microsoft.AspNetCore.Mvc.Rendering.ViewContext) (Instrumented Tier1)
           7 ( 1.01% of base) : 109612.dasm - Microsoft.AspNetCore.Identity.RoleManager`1[System.__Canon]:.ctor(Microsoft.AspNetCore.Identity.IRoleStore`1[System.__Canon],System.Collections.Generic.IEnumerable`1[System.__Canon],Microsoft.AspNetCore.Identity.ILookupNormalizer,Microsoft.AspNetCore.Identity.IdentityErrorDescriber,Microsoft.Extensions.Logging.ILogger`1[System.__Canon]):this (Instrumented Tier1)
           6 ( 0.95% of base) : 108363.dasm - Microsoft.Extensions.Configuration.ChainedConfigurationProvider:GetChildKeys(System.Collections.Generic.IEnumerable`1[System.String],System.String):System.Collections.Generic.IEnumerable`1[System.String]:this (Instrumented Tier1)
          11 ( 0.90% of base) : 112284.dasm - System.Resources.ResourceManager:GetString(System.String,System.Globalization.CultureInfo):System.String:this (Instrumented Tier1)

Top method improvements (percentages):
         -13 (-6.60% of base) : 93057.dasm - System.IPv6AddressHelper:ShouldHaveIpv4Embedded(System.ReadOnlySpan`1[ushort]):ubyte (FullOpts)
         -11 (-6.01% of base) : 87257.dasm - System.SpanHelpers:LastIndexOfValueType[short,System.SpanHelpers+DontNegate`1[short]](byref,short,int):int (Tier1)
         -11 (-6.01% of base) : 91217.dasm - System.SpanHelpers:LastIndexOfValueType[short,System.SpanHelpers+DontNegate`1[short]](byref,short,int):int (Tier1)
         -11 (-6.01% of base) : 106222.dasm - System.SpanHelpers:LastIndexOfValueType[short,System.SpanHelpers+DontNegate`1[short]](byref,short,int):int (Tier1)
         -23 (-5.67% of base) : 65369.dasm - System.Threading.SpinWait:SpinUntil(System.Func`1[ubyte],int):ubyte (Tier1-OSR)
         -13 (-5.33% of base) : 68976.dasm - Microsoft.Extensions.DependencyInjection.ServiceCollectionExtensions+<>c:b__10_0(Microsoft.Extensions.DependencyInjection.IServiceCollection):this (Tier1-OSR)
         -13 (-5.33% of base) : 90561.dasm - Microsoft.Extensions.DependencyInjection.ServiceCollectionExtensions+<>c:b__10_0(Microsoft.Extensions.DependencyInjection.IServiceCollection):this (Tier1-OSR)
         -44 (-4.35% of base) : 60693.dasm - System.Collections.Concurrent.ConcurrentDictionary`2[System.__Canon,System.__Canon]:InitializeFromCollection(System.Collections.Generic.IEnumerable`1[System.Collections.Generic.KeyValuePair`2[System.__Canon,System.__Canon]]):this (Tier1-OSR)
         -44 (-4.35% of base) : 29595.dasm - System.Collections.Concurrent.ConcurrentDictionary`2[System.__Canon,System.__Canon]:InitializeFromCollection(System.Collections.Generic.IEnumerable`1[System.Collections.Generic.KeyValuePair`2[System.__Canon,System.__Canon]]):this (Tier1-OSR)
          -7 (-3.87% of base) : 112471.dasm - System.SpanHelpers:LastIndexOfValueType[short,System.SpanHelpers+DontNegate`1[short]](byref,short,int):int (Tier1)
          -7 (-3.87% of base) : 110015.dasm - System.SpanHelpers:LastIndexOfValueType[short,System.SpanHelpers+DontNegate`1[short]](byref,short,int):int (Tier1)
          -7 (-3.87% of base) : 110931.dasm - System.SpanHelpers:LastIndexOfValueType[short,System.SpanHelpers+DontNegate`1[short]](byref,short,int):int (Tier1)
         -13 (-3.39% of base) : 98836.dasm - Newtonsoft.Json.Linq.JContainer:TryAddInternal(int,System.Object,ubyte,ubyte):ubyte:this (FullOpts)
          -6 (-3.23% of base) : 97395.dasm - System.IPv6AddressHelper:ShouldHaveIpv4Embedded(System.ReadOnlySpan`1[ushort]):ubyte (Tier1)
          -6 (-3.23% of base) : 95525.dasm - System.IPv6AddressHelper:ShouldHaveIpv4Embedded(System.ReadOnlySpan`1[ushort]):ubyte (Tier1)
         -12 (-3.18% of base) : 105897.dasm - System.Text.DBCSCodePageEncoding:LoadManagedCodePage():this (Tier1-OSR)
          -5 (-3.11% of base) : 98921.dasm - Newtonsoft.Json.Linq.JTokenReader:Read():ubyte:this (FullOpts)
         -18 (-2.86% of base) : 63942.dasm - System.Threading.LowLevelLifoSemaphore:Wait(int,ubyte):ubyte:this (Tier1-OSR)
         -14 (-2.83% of base) : 9233.dasm - Microsoft.AspNetCore.Hosting.HostingAbstractionsWebHostBuilderExtensions:UseConfiguration(Microsoft.AspNetCore.Hosting.IWebHostBuilder,Microsoft.Extensions.Configuration.IConfiguration):Microsoft.AspNetCore.Hosting.IWebHostBuilder (Tier1-OSR)
         -14 (-2.83% of base) : 9779.dasm - Microsoft.AspNetCore.Hosting.HostingAbstractionsWebHostBuilderExtensions:UseConfiguration(Microsoft.AspNetCore.Hosting.IWebHostBuilder,Microsoft.Extensions.Configuration.IConfiguration):Microsoft.AspNetCore.Hosting.IWebHostBuilder (Tier1-OSR)
```
--------------------------------------------------------------------------------

But splitting seems too messy. It is not clear how to properly handle the exception set modelling and similar aspects. It seems preferable to just build the right candidate sets up front (in which case we would not need the useless def pruning).
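A minimal sketch of what building the right candidate sets up front could look like, assuming a hypothetical pre-pass has already recorded, for each use occurrence of a liberal VN, which def occurrences reach it. Occurrences are grouped into def/use webs with a disjoint-set (union-find), and webs that contain no use are dropped. `Occurrence`, `DisjointSet`, and `BuildCandidateSets` are illustrative names, not RyuJIT APIs, and computing `reachingDefs` is left out entirely.

```cpp
#include <cassert>
#include <map>
#include <numeric>
#include <vector>

// Hypothetical model: all occurrences below share one liberal VN.
struct Occurrence {
    int bbNum;                      // block holding the tree (for illustration)
    bool isDef;                     // no earlier occurrence is available here
    std::vector<int> reachingDefs;  // indices of def occurrences reaching this use
};

// Disjoint-set (union-find) with path halving.
struct DisjointSet {
    std::vector<int> parent;
    explicit DisjointSet(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    int Find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }
    void Unite(int a, int b) { parent[Find(a)] = Find(b); }
};

// Split one VN-keyed candidate into def/use webs. Each use is joined with every
// def that reaches it; webs containing no use (useless defs) are dropped.
std::vector<std::vector<int>> BuildCandidateSets(const std::vector<Occurrence>& occ) {
    const int n = static_cast<int>(occ.size());
    DisjointSet ds(n);
    for (int i = 0; i < n; i++) {
        if (!occ[i].isDef) {
            for (int d : occ[i].reachingDefs) {
                assert(d >= 0 && d < n && occ[d].isDef);
                ds.Unite(i, d);
            }
        }
    }

    std::map<int, std::vector<int>> webs;  // root -> member indices, in order
    std::map<int, bool> hasUse;
    for (int i = 0; i < n; i++) {
        const int root = ds.Find(i);
        webs[root].push_back(i);
        if (!occ[i].isDef) {
            hasUse[root] = true;
        }
    }

    std::vector<std::vector<int>> sets;
    for (const auto& [root, members] : webs) {
        if (hasUse[root]) {
            sets.push_back(members);
        }
    }
    return sets;
}
```

Applied to the example from the issue description (defs in BB02 and BB15, a use in BB31 reached only by the BB15 def), this yields a single set {BB15 def, BB31 use}, and the BB02 def drops out. When two defs genuinely reach the same use, the union merges them into one multi-def set.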

AndyAyersMS commented 7 months ago

If we decide to revise CSE finding, we'll also have to come to grips with:

For live-across-call CSEs, there's also the option of splitting a multi-use CSE into parts. For exception set handling, splitting instead of losing the CSE may also be a good idea.

For BB_NO_CSE_IN, we could perhaps run candidate location both before and after RBO (redundant branch optimization), and only do the costly(?) checking in the later run. RBO should not in general change the candidate set (?), so the basic locate work (building up the equivalent VN sets) done by the earlier run could in principle carry over to the later phase.

For that matter, VN itself could be building these sets, or we could have a mode where they are automagically maintained whenever a tree's VN is set to something. That might be too fragile, though: there is no general way to verify that a tree has the proper VN.
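One way to picture the "automagically maintained" mode is to route every liberal-VN assignment through a single setter that also updates a VN-to-trees index. This is a sketch under assumptions: `Tree`, `ValueNum`, `NoVN`, and `VNTreeIndex` are hypothetical stand-ins, not the JIT's real `GenTree` or value-numbering interfaces. It also shows where the fragility comes from: any code that writes `liberalVN` directly, bypassing the setter, silently leaves the index stale.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

using ValueNum = std::uint32_t;
constexpr ValueNum NoVN = 0;  // hypothetical "not yet numbered" sentinel

struct Tree {
    int id;
    ValueNum liberalVN = NoVN;  // writing this directly bypasses the index
};

// Hypothetical index: liberal VN -> trees currently carrying that VN.
class VNTreeIndex {
public:
    // All VN assignments are expected to go through here so the index stays in sync.
    void SetLiberalVN(Tree* tree, ValueNum vn) {
        assert(tree != nullptr);
        if (tree->liberalVN != NoVN) {
            members_[tree->liberalVN].erase(tree);
        }
        tree->liberalVN = vn;
        if (vn != NoVN) {
            members_[vn].insert(tree);
        }
    }

    // Number of trees sharing `vn`: the raw, pre-dataflow CSE candidate set size.
    std::size_t CountFor(ValueNum vn) const {
        auto it = members_.find(vn);
        return it == members_.end() ? 0 : it->second.size();
    }

private:
    std::unordered_map<ValueNum, std::unordered_set<Tree*>> members_;
};
```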

AndyAyersMS commented 7 months ago

Played around with this a bit more. Generally, not having multi-def CSEs seems to be an improvement, but not always:

One example is in TryEnterReaderLockCore. Here the CSE is a TLS access helper call: initially the code splits into two paths, both of which access TLS, and below the join there are more TLS accesses. The current compiler treats all of this as one big CSE with 2 defs and 13 uses; the new analysis produces 5 separate single-def, 2-use CSEs. Profitability accepts the one big CSE but not the 5 smaller ones (4 of which have no def or use weight):

;; current jit (one big candidate)

CSE #02, {$380, $2  } defCnt=2 useCnt=13: [def=100.000000, use=0.037279, cost= 15, call]
        :: N002 ( 15,  7) CSE #02 (def)[000331] H-C-G------                         *  CALL help byref  CORINFO_HELP_GETSHARED_GCTHREADSTATIC_BASE_NOCTOR_OPTIMIZED $380

Considering CSE #02 {$380, $2  } [def=100.000000, use=0.037279, cost= 15, call]
CSE Expression : 
N002 ( 15,  7) CSE #02 (def)[000331] H-C-G------                         *  CALL help byref  CORINFO_HELP_GETSHARED_GCTHREADSTATIC_BASE_NOCTOR_OPTIMIZED $380
N001 (  1,  1)              [000330] ----------- arg0 in rcx             \--*  CNS_INT   int    3 $44

Aggressive CSE Promotion (200.037279 >= 200.000000)
cseRefCnt=200.037279, aggressiveRefCnt=200.000000, moderateRefCnt=100.000000
defCnt=100.000000, useCnt=0.037279, cost=15, size=7, LiveAcrossCall
def_cost=1, use_cost=1, extra_no_cost=156, extra_yes_cost=0
CSE cost savings check (156.559180 >= 100.037279) passes

compared to

;; hypothetical jit (5 small candidates)

CSE #07, {$380, $2  } defCnt=1 useCnt=2: [def=100.000000, use=0.037279, cost= 15, call]
        :: N002 ( 15,  7) CSE #07 (def)[000331] H-C-G------                         *  CALL help byref  CORINFO_HELP_GETSHARED_GCTHREADSTATIC_BASE_NOCTOR_OPTIMIZED $380
CSE #02, {$380, $2  } defCnt=1 useCnt=2: [def=0.000000, use=0.000000, cost= 15, call]
        :: N002 ( 15,  7) CSE #02 (def)[000484] H-C-G------                         *  CALL help byref  CORINFO_HELP_GETSHARED_GCTHREADSTATIC_BASE_NOCTOR_OPTIMIZED $380
CSE #13, {$380, $2  } defCnt=1 useCnt=2: [def=0.000000, use=0.000000, cost= 15, call]
        :: N002 ( 15,  7) CSE #13 (def)[000644] H-C-G------                         *  CALL help byref  CORINFO_HELP_GETSHARED_GCTHREADSTATIC_BASE_NOCTOR_OPTIMIZED $380
CSE #18, {$380, $2  } defCnt=1 useCnt=2: [def=0.000000, use=0.000000, cost= 15, call]
        :: N002 ( 15,  7) CSE #18 (def)[000912] H-C-G------                         *  CALL help byref  CORINFO_HELP_GETSHARED_GCTHREADSTATIC_BASE_NOCTOR_OPTIMIZED $380
CSE #22, {$380, $2  } defCnt=1 useCnt=2: [def=0.000000, use=0.000000, cost= 15, call]
        :: N002 ( 15,  7) CSE #22 (def)[000778] H-C-G------                         *  CALL help byref  CORINFO_HELP_GETSHARED_GCTHREADSTATIC_BASE_NOCTOR_OPTIMIZED $380

Considering CSE #07 {$380, $2  } [def=100.000000, use=0.037279, cost= 15, call]
CSE Expression : 
N002 ( 15,  7) CSE #07 (def)[000331] H-C-G------                         *  CALL help byref  CORINFO_HELP_GETSHARED_GCTHREADSTATIC_BASE_NOCTOR_OPTIMIZED $380
N001 (  1,  1)              [000330] ----------- arg0 in rcx             \--*  CNS_INT   int    3 $44

Aggressive CSE Promotion (200.037279 >= 200.000000)
cseRefCnt=200.037279, aggressiveRefCnt=200.000000, moderateRefCnt=100.000000
defCnt=100.000000, useCnt=0.037279, cost=15, size=7, LiveAcrossCall
def_cost=1, use_cost=1, extra_no_cost=24, extra_yes_cost=0
CSE cost savings check (24.559180 >= 100.037279) fails

(Note that the one big candidate passes solely because of extra_no_cost, which is a code-size bumper.)
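The two dumps can be reproduced with simple arithmetic. The sketch below assumes the check computes `no_cse_cost = use * cost + extra_no_cost` and `yes_cse_cost = def * def_cost + use * use_cost + extra_yes_cost`, with `extra_no_cost = (size - use_cost) * useCnt * 2` using the unweighted use count. These formulas are inferred from (and consistent with) the logged values, not quoted from the JIT source; `CseCandidate` and `CostSavingsCheck` are hypothetical names.

```cpp
#include <cassert>

// Inputs as printed in the JIT dump (names are hypothetical).
struct CseCandidate {
    double defWt;  // weighted def count ("def=")
    double useWt;  // weighted use count ("use=")
    int rawUses;   // unweighted use count ("useCnt=" in the candidate list)
    int cost;      // tree cost ("cost=")
    int size;      // tree size ("size=")
};

struct CheckResult {
    double noCseCost;
    double yesCseCost;
    bool passes;
};

// Assumed form of the "CSE cost savings check", inferred from the logged numbers.
CheckResult CostSavingsCheck(const CseCandidate& c, int defCost, int useCost, double extraYesCost) {
    assert(c.defWt >= 0 && c.useWt >= 0);
    // Code-size bumper: bytes saved per use, scaled by the unweighted use count.
    double extraNoCost = 0;
    if (c.size > useCost) {
        extraNoCost = static_cast<double>(c.size - useCost) * c.rawUses * 2;
    }
    const double noCse = c.useWt * c.cost + extraNoCost;
    const double yesCse = c.defWt * defCost + c.useWt * useCost + extraYesCost;
    return {noCse, yesCse, noCse >= yesCse};
}
```

With 13 raw uses the bumper contributes 6 * 13 * 2 = 156, giving roughly 156.56 >= 100.04; with 2 raw uses it contributes only 24, giving roughly 24.56 < 100.04. The weighted use term (0.037279 * 15, about 0.56) is negligible in both cases.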

Another thing we should consider is boosting the CSE cost for trees with calls. Right now they are considered fairly cheap, but if they're pure/hoistable/CSE-able, perhaps we could give them costs that better reflect their true cycle impact.
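To get a feel for how large such a boost would need to be, the sketch below solves the same cost check (its form inferred from the dump above, not quoted from the JIT) for the tree cost at which a candidate breaks even. For CSE #07 (def weight 100, use weight 0.037279, extra_no_cost 24) the answer is roughly 2040, versus the current cost of 15, because the use weight is so small. For the four zero-weight candidates, cost does not enter the check at all. `BreakEvenCost` is a hypothetical helper.

```cpp
#include <cassert>

// Solve the assumed check
//   useWt * cost + extraNoCost >= defWt * defCost + useWt * useCost
// for the smallest tree cost that passes. A result <= 0 means the candidate
// already passes regardless of cost.
double BreakEvenCost(double defWt, double useWt, double extraNoCost, int defCost, int useCost) {
    assert(useWt > 0);  // with zero use weight, cost never affects the outcome
    const double shortfall = defWt * defCost + useWt * useCost - extraNoCost;
    return shortfall / useWt;
}
```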