MihuBot / runtime-utils

[X64] MihaZupan/runtime/cheaper-vector-narrow #393

Open MihuBot opened 3 months ago

MihuBot commented 3 months ago

Job completed in 19 minutes.

Diffs

Found 259 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 39663927
Total bytes of diff: 39663933
Total bytes of delta: 6 (0.00 % of base)
Total relative delta: -0.27
    diff is a regression.
    relative diff is an improvement.

Total byte diff includes 97 bytes from reconciling methods
    Base had    0 unique methods,        0 unique bytes
    Diff had    2 unique methods,       97 unique bytes

Top file regressions (bytes):
           6 : System.Private.CoreLib.dasm (0.00 % of base)

1 total files with Code Size differences (0 improved, 1 regressed), 255 unchanged.

Top method regressions (bytes):
          68 (Infinity of base) : System.Private.CoreLib.dasm - System.Text.Ascii:ExtractAsciiVector(System.Runtime.Intrinsics.Vector512`1[ushort],System.Runtime.Intrinsics.Vector512`1[ushort]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts) (0 base, 1 diff methods)
          29 (Infinity of base) : System.Private.CoreLib.dasm - System.Text.Ascii:ExtractAsciiVector(System.Runtime.Intrinsics.Vector256`1[ushort],System.Runtime.Intrinsics.Vector256`1[ushort]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts) (0 base, 1 diff methods)

Top method improvements (bytes):
         -36 (-16.00 % of base) : System.Private.CoreLib.dasm - System.Text.Ascii:NarrowUtf16ToAscii_Intrinsified_256(ulong,ulong,ulong):ulong (FullOpts)
         -36 (-5.99 % of base) : System.Private.CoreLib.dasm - System.Text.Ascii:NarrowUtf16ToAscii(ulong,ulong,ulong):ulong (FullOpts)
         -19 (-5.40 % of base) : System.Private.CoreLib.dasm - System.HexConverter:TryDecodeFromUtf16_Vector128(System.ReadOnlySpan`1[ushort],System.Span`1[ubyte],byref):ubyte (FullOpts)

Top method regressions (percentages):
          29 (Infinity of base) : System.Private.CoreLib.dasm - System.Text.Ascii:ExtractAsciiVector(System.Runtime.Intrinsics.Vector256`1[ushort],System.Runtime.Intrinsics.Vector256`1[ushort]):System.Runtime.Intrinsics.Vector256`1[ubyte] (FullOpts) (0 base, 1 diff methods)
          68 (Infinity of base) : System.Private.CoreLib.dasm - System.Text.Ascii:ExtractAsciiVector(System.Runtime.Intrinsics.Vector512`1[ushort],System.Runtime.Intrinsics.Vector512`1[ushort]):System.Runtime.Intrinsics.Vector512`1[ubyte] (FullOpts) (0 base, 1 diff methods)

Top method improvements (percentages):
         -36 (-16.00 % of base) : System.Private.CoreLib.dasm - System.Text.Ascii:NarrowUtf16ToAscii_Intrinsified_256(ulong,ulong,ulong):ulong (FullOpts)
         -36 (-5.99 % of base) : System.Private.CoreLib.dasm - System.Text.Ascii:NarrowUtf16ToAscii(ulong,ulong,ulong):ulong (FullOpts)
         -19 (-5.40 % of base) : System.Private.CoreLib.dasm - System.HexConverter:TryDecodeFromUtf16_Vector128(System.ReadOnlySpan`1[ushort],System.Span`1[ubyte],byref):ubyte (FullOpts)

5 total methods with Code Size differences (3 improved, 2 regressed), 244928 unchanged.
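
The per-method figures above reconcile with the 6-byte total. A quick check in Python, using only numbers copied from this summary (the dictionary keys are shortened labels chosen for illustration, not identifiers emitted by the tool):

```python
# Per-method code size deltas in bytes, copied from the summary above.
method_deltas = {
    "Ascii:ExtractAsciiVector (Vector512)": 68,   # 0 base, 1 diff methods
    "Ascii:ExtractAsciiVector (Vector256)": 29,   # 0 base, 1 diff methods
    "Ascii:NarrowUtf16ToAscii_Intrinsified_256": -36,
    "Ascii:NarrowUtf16ToAscii": -36,
    "HexConverter:TryDecodeFromUtf16_Vector128": -19,
}

base_bytes = 39663927
diff_bytes = 39663933

# Sum of the per-method deltas.
total_delta = sum(method_deltas.values())

# Bytes contributed by methods that exist only in the diff.
unique_diff_bytes = sum(
    delta for name, delta in method_deltas.items() if "ExtractAsciiVector" in name
)

print(total_delta, diff_bytes - base_bytes, unique_diff_bytes)
```

Both `ExtractAsciiVector` entries are reported as "0 base, 1 diff methods", which is why their percentage shows as `Infinity of base`: they are the 97 bytes of reconciled unique methods, while the three methods that exist on both sides shrink by 91 bytes in total.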

--------------------------------------------------------------------------------

MihuBot commented 3 months ago

Top method improvements

-36 (-16.00 % of base) - System.Text.Ascii:NarrowUtf16ToAscii_Intrinsified_256(ulong,ulong,ulong):ulong

```diff
 ; Assembly listing for method System.Text.Ascii:NarrowUtf16ToAscii_Intrinsified_256(ulong,ulong,ulong):ulong (FullOpts)
 ; Emitting BLENDED_CODE for X64 with AVX - Unix
 ; FullOpts code
 ; optimized code
 ; rbp based frame
 ; fully interruptible
 ; No PGO data
-; 0 inlinees with PGO data; 0 single block inlinees; 4 inlinees without PGO data
+; 0 inlinees with PGO data; 4 single block inlinees; 8 inlinees without PGO data
 ; Final local variable assignments
 ;
 ; V00 arg0 [V00,T04] ( 3, 3 ) long -> rdi single-def
 ; V01 arg1 [V01,T03] ( 5, 3.50) long -> rsi single-def
 ; V02 arg2 [V02,T05] ( 3, 2.50) long -> rdx single-def
 ; V03 loc0 [V03,T01] ( 5, 10.50) byref -> rdi single-def
-; V04 loc1 [V04,T07] ( 12, 17.50) simd32 -> mm0
+; V04 loc1 [V04,T07] ( 14, 18.50) simd32 -> mm0
 ; V05 loc2 [V05,T02] ( 5, 6 ) byref -> rax single-def
 ; V06 loc3 [V06,T00] ( 12, 27 ) long -> rcx
 ; V07 loc4 [V07,T06] ( 2, 4.50) long -> rdx
-; V08 loc5 [V08,T08] ( 3, 12 ) simd32 -> mm3
+; V08 loc5 [V08,T09] ( 3, 12 ) simd32 -> mm2
 ;# V09 OutArgs [V09 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
-;* V10 tmp1 [V10 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
-;* V11 tmp2 [V11 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp"
-;* V12 tmp3 [V12 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
-;* V13 tmp4 [V13 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp"
+;* V10 tmp1 [V10 ] ( 0, 0 ) simd32 -> zero-ref "spilled call-like call argument"
+;* V11 tmp2 [V11 ] ( 0, 0 ) simd32 -> zero-ref "spilled call-like call argument"
+; V12 tmp3 [V12,T08] ( 2, 16 ) simd32 -> mm0 "Spilling op1 side effects for HWIntrinsic"
+;* V13 tmp4 [V13 ] ( 0, 0 ) simd32 -> zero-ref "spilled call-like call argument"
 ;* V14 tmp5 [V14 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
 ;* V15 tmp6 [V15 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp"
-;* V16 tmp7 [V16 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
-;* V17 tmp8 [V17 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
-;* V18 tmp9 [V18 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp"
-; V19 cse0 [V19,T11] ( 3, 1.50) simd32 -> mm0 "CSE #02: moderate"
-; V20 cse1 [V20,T12] ( 3, 1.50) simd32 -> mm0 "CSE #04: moderate"
-; V21 cse2 [V21,T09] ( 7, 10.50) simd32 -> mm2 "CSE #01: moderate"
-; V22 cse3 [V22,T10] ( 5, 7 ) simd32 -> mm1 "CSE #03: moderate"
+;* V16 tmp7 [V16 ] ( 0, 0 ) simd32 -> zero-ref "Inline return value spill temp"
+;* V17 tmp8 [V17 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
+;* V18 tmp9 [V18 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
+;* V19 tmp10 [V19 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp"
+;* V20 tmp11 [V20 ] ( 0, 0 ) simd32 -> zero-ref "Inline return value spill temp"
+;* V21 tmp12 [V21 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
+;* V22 tmp13 [V22 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
+;* V23 tmp14 [V23 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp"
+;* V24 tmp15 [V24 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
+;* V25 tmp16 [V25 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
+;* V26 tmp17 [V26 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
+;* V27 tmp18 [V27 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp"
+;* V28 tmp19 [V28 ] ( 0, 0 ) simd32 -> zero-ref "Inline return value spill temp"
+;* V29 tmp20 [V29 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
+; V30 cse0 [V30,T10] ( 5, 7 ) simd32 -> mm1 "CSE #01: moderate"
 ;
 ; Lcl frame size = 0

 G_M60588_IG01:
        push rbp
        mov rbp, rsp
        ;; size=4 bbWeight=1 PerfScore 1.25
 G_M60588_IG02:
        vmovups ymm0, ymmword ptr [rdi]
        vmovups ymm1, ymmword ptr [reloc @RWD00]
        vptest ymm0, ymm1
        jne G_M60588_IG10
        ;; size=23 bbWeight=1 PerfScore 15.00
 G_M60588_IG03:
        mov rax, rsi
-       vmovups ymm2, ymmword ptr [reloc @RWD32]
-       vpand ymm0, ymm2, ymm0
        vpackuswb ymm0, ymm0, ymm0
        vpermq ymm0, ymm0, -40
        vmovups xmmword ptr [rax], xmm0
        mov ecx, 16
        test sil, 16
        jne SHORT G_M60588_IG04
        vmovups ymm0, ymmword ptr [rdi+0x20]
        vptest ymm0, ymm1
-       jne G_M60588_IG08
-       vpand ymm0, ymm2, ymm0
+       jne SHORT G_M60588_IG08
        vpackuswb ymm0, ymm0, ymm0
        vpermq ymm0, ymm0, -40
        vmovups xmmword ptr [rax+0x10], xmm0
-       ;; size=75 bbWeight=0.50 PerfScore 13.71
+       ;; size=55 bbWeight=0.50 PerfScore 11.38
 G_M60588_IG04:
        and rsi, 31
        mov rcx, rsi
        neg rcx
        add rcx, 32
        add rdx, -32
        align [0 bytes for IG05]
        ;; size=18 bbWeight=0.50 PerfScore 0.62
 G_M60588_IG05:
        vmovups ymm0, ymmword ptr [rdi+2*rcx]
-       vmovups ymm3, ymmword ptr [rdi+2*rcx+0x20]
-       vpor ymm4, ymm0, ymm3
-       vptest ymm4, ymm1
+       vmovups ymm2, ymmword ptr [rdi+2*rcx+0x20]
+       vpor ymm3, ymm0, ymm2
+       vptest ymm3, ymm1
        je SHORT G_M60588_IG07
        ;; size=22 bbWeight=4 PerfScore 65.33
 G_M60588_IG06:
        vptest ymm0, ymm1
        jne SHORT G_M60588_IG08
-       vpand ymm3, ymm2, ymm0
-       vpand ymm0, ymm2, ymm0
-       vpackuswb ymm2, ymm3, ymm0
-       vpermq ymm1, ymm2, -40
-       vmovups xmmword ptr [rax+rcx], xmm1
+       vpackuswb ymm0, ymm0, ymm0
+       vpermq ymm2, ymm0, -40
+       vmovups xmmword ptr [rax+rcx], xmm2
        add rcx, 16
        jmp SHORT G_M60588_IG08
-       ;; size=36 bbWeight=0.50 PerfScore 6.96
+       ;; size=28 bbWeight=0.50 PerfScore 6.62
 G_M60588_IG07:
-       vpand ymm0, ymm2, ymm0
-       vpand ymm3, ymm2, ymm3
-       vpackuswb ymm0, ymm0, ymm3
+       vpackuswb ymm0, ymm0, ymm2
        vpermq ymm0, ymm0, -40
        vmovups ymmword ptr [rax+rcx], ymm0
        add rcx, 32
        cmp rcx, rdx
        jbe SHORT G_M60588_IG05
-       ;; size=32 bbWeight=4 PerfScore 28.67
+       ;; size=24 bbWeight=4 PerfScore 26.00
 G_M60588_IG08:
        mov rax, rcx
        ;; size=3 bbWeight=0.50 PerfScore 0.12
 G_M60588_IG09:
        vzeroupper
        pop rbp
        ret
        ;; size=5 bbWeight=0.50 PerfScore 1.25
 G_M60588_IG10:
        xor eax, eax
        ;; size=2 bbWeight=0.50 PerfScore 0.12
 G_M60588_IG11:
        vzeroupper
        pop rbp
        ret
        ;; size=5 bbWeight=0.50 PerfScore 1.25
 RWD00 dq FF80FF80FF80FF80h, FF80FF80FF80FF80h, FF80FF80FF80FF80h, FF80FF80FF80FF80h
-RWD32 dq 00FF00FF00FF00FFh, 00FF00FF00FF00FFh, 00FF00FF00FF00FFh, 00FF00FF00FF00FFh

-; Total bytes of code 225, prolog size 4, PerfScore 134.29, instruction count 58, allocated bytes for code 225 (MethodHash=910c1353) for method System.Text.Ascii:NarrowUtf16ToAscii_Intrinsified_256(ulong,ulong,ulong):ulong (FullOpts)
+; Total bytes of code 189, prolog size 4, PerfScore 128.96, instruction count 51, allocated bytes for code 189 (MethodHash=910c1353) for method System.Text.Ascii:NarrowUtf16ToAscii_Intrinsified_256(ulong,ulong,ulong):ulong (FullOpts)
```

-36 (-5.99 % of base) - System.Text.Ascii:NarrowUtf16ToAscii(ulong,ulong,ulong):ulong

```diff
 ; Assembly listing for method System.Text.Ascii:NarrowUtf16ToAscii(ulong,ulong,ulong):ulong (FullOpts)
 ; Emitting BLENDED_CODE for X64 with AVX - Unix
 ; FullOpts code
 ; optimized code
 ; rbp based frame
 ; fully interruptible
 ; No PGO data
-; 0 inlinees with PGO data; 9 single block inlinees; 16 inlinees without PGO data
+; 0 inlinees with PGO data; 13 single block inlinees; 20 inlinees without PGO data
 ; Final local variable assignments
 ;
 ; V00 arg0 [V00,T05] ( 8, 8.50) long -> rdi single-def
 ; V01 arg1 [V01,T04] ( 12, 10.50) long -> rsi single-def
 ; V02 arg2 [V02,T09] ( 7, 5 ) long -> rdx single-def
 ; V03 loc0 [V03,T00] ( 22, 29.50) long -> rax
 ; V04 loc1 [V04,T10] ( 13, 6.50) int -> rcx
 ;* V05 loc2 [V05 ] ( 0, 0 ) int -> zero-ref
 ; V06 loc3 [V06,T03] ( 7, 14 ) long -> registers
 ; V07 loc4 [V07,T18] ( 5, 2.50) long -> rdx
 ; V08 loc5 [V08,T13] ( 2, 4.50) long -> rcx
 ;# V09 OutArgs [V09 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
 ; V10 tmp1 [V10,T19] ( 3, 1.50) long -> rax "Inline return value spill temp"
 ; V11 tmp2 [V11,T06] ( 5, 9.50) byref -> rax single-def "Inline stloc first use temp"
-; V12 tmp3 [V12,T25] ( 12, 16.50) simd32 -> mm0 "Inline stloc first use temp"
+; V12 tmp3 [V12,T24] ( 14, 17.50) simd32 -> mm0 "Inline stloc first use temp"
 ; V13 tmp4 [V13,T11] ( 5, 6 ) byref -> rcx single-def "Inline stloc first use temp"
-; V14 tmp5 [V14,T01] ( 12, 27 ) long -> r8 "Inline stloc first use temp"
-; V15 tmp6 [V15,T14] ( 2, 4.50) long -> r9 "Inline stloc first use temp"
-; V16 tmp7 [V16,T27] ( 3, 12 ) simd32 -> mm3 "Inline stloc first use temp"
-;* V17 tmp8 [V17 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
-;* V18 tmp9 [V18 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp"
-;* V19 tmp10 [V19 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
-;* V20 tmp11 [V20 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp"
+;* V14 tmp5 [V14 ] ( 0, 0 ) simd32 -> zero-ref "spilled call-like call argument"
+; V15 tmp6 [V15,T01] ( 12, 27 ) long -> r8 "Inline stloc first use temp"
+; V16 tmp7 [V16,T14] ( 2, 4.50) long -> r9 "Inline stloc first use temp"
+; V17 tmp8 [V17,T28] ( 3, 12 ) simd32 -> mm2 "Inline stloc first use temp"
+;* V18 tmp9 [V18 ] ( 0, 0 ) simd32 -> zero-ref "spilled call-like call argument"
+; V19 tmp10 [V19,T26] ( 2, 16 ) simd32 -> mm0 "Spilling op1 side effects for HWIntrinsic"
+;* V20 tmp11 [V20 ] ( 0, 0 ) simd32 -> zero-ref "spilled call-like call argument"
 ;* V21 tmp12 [V21 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
 ;* V22 tmp13 [V22 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp"
-;* V23 tmp14 [V23 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
-;* V24 tmp15 [V24 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
-;* V25 tmp16 [V25 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp"
-; V26 tmp17 [V26,T20] ( 3, 1.50) long -> rax "Inline return value spill temp"
-;* V27 tmp18 [V27,T22] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
-;* V28 tmp19 [V28 ] ( 0, 0 ) long -> zero-ref "Inline stloc first use temp"
-; V29 tmp20 [V29,T07] ( 5, 9.50) byref -> rax single-def "Inline stloc first use temp"
-; V30 tmp21 [V30,T24] ( 14, 17.50) simd16 -> mm0 "Inline stloc first use temp"
-; V31 tmp22 [V31,T12] ( 5, 6 ) byref -> rcx single-def "Inline stloc first use temp"
-;* V32 tmp23 [V32 ] ( 0, 0 ) simd16 -> zero-ref "spilled call-like call argument"
-; V33 tmp24 [V33,T02] ( 11, 26.50) long -> r8 "Inline stloc first use temp"
-; V34 tmp25 [V34,T15] ( 2, 4.50) long -> r9 "Inline stloc first use temp"
-; V35 tmp26 [V35,T28] ( 3, 12 ) simd16 -> mm2 "Inline stloc first use temp"
-;* V36 tmp27 [V36 ] ( 0, 0 ) simd16 -> zero-ref "spilled call-like call argument"
-; V37 tmp28 [V37,T26] ( 2, 16 ) simd16 -> mm0 "Spilling op1 side effects for HWIntrinsic"
-;* V38 tmp29 [V38 ] ( 0, 0 ) simd16 -> zero-ref "spilled call-like call argument"
-;* V39 tmp30 [V39 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
-;* V40 tmp31 [V40 ] ( 0, 0 ) simd16 -> zero-ref "Inline stloc first use temp"
-;* V41 tmp32 [V41 ] ( 0, 0 ) simd16 -> zero-ref "Inline return value spill temp"
-;* V42 tmp33 [V42 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
-;* V43 tmp34 [V43 ] ( 0, 0 ) simd16 -> zero-ref "Inline stloc first use temp"
-;* V44 tmp35 [V44 ] ( 0, 0 ) simd16 -> zero-ref "Inline return value spill temp"
-;* V45 tmp36 [V45 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
-;* V46 tmp37 [V46 ] ( 0, 0 ) simd16 -> zero-ref "Inline stloc first use temp"
-;* V47 tmp38 [V47 ] ( 0, 0 ) simd16 -> zero-ref "Inlining Arg"
-;* V48 tmp39 [V48 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
-;* V49 tmp40 [V49 ] ( 0, 0 ) simd16 -> zero-ref "Inline stloc first use temp"
-;* V50 tmp41 [V50 ] ( 0, 0 ) simd16 -> zero-ref "Inline return value spill temp"
-;* V51 tmp42 [V51 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
-; V52 tmp43 [V52,T23] ( 3, 24 ) simd16 -> mm0 "dup spill"
-;* V53 tmp44 [V53 ] ( 0, 0 ) simd16 -> zero-ref "Inline stloc first use temp"
-;* V54 tmp45 [V54 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
-; V55 tmp46 [V55,T16] ( 3, 3 ) byref -> r8 single-def "Inlining Arg"
-; V56 tmp47 [V56,T17] ( 3, 3 ) byref -> rdx "Inlining Arg"
-;* V57 tmp48 [V57,T21] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
-; V58 cse0 [V58,T08] ( 3, 8.50) long -> r10 "CSE #07: moderate"
-; V59 cse1 [V59,T32] ( 3, 1.50) simd32 -> mm0 "CSE #02: moderate"
-; V60 cse2 [V60,T33] ( 3, 1.50) simd32 -> mm0 "CSE #04: moderate"
-; V61 cse3 [V61,T29] ( 7, 10.50) simd32 -> mm2 "CSE #01: aggressive"
-; V62 cse4 [V62,T30] ( 5, 6 ) simd32 -> mm1 "CSE #03: moderate"
-; V63 cse5 [V63,T31] ( 5, 6 ) simd16 -> mm1 "CSE #06: moderate"
+;* V23 tmp14 [V23 ] ( 0, 0 ) simd32 -> zero-ref "Inline return value spill temp"
+;* V24 tmp15 [V24 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
+;* V25 tmp16 [V25 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
+;* V26 tmp17 [V26 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp"
+;* V27 tmp18 [V27 ] ( 0, 0 ) simd32 -> zero-ref "Inline return value spill temp"
+;* V28 tmp19 [V28 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
+;* V29 tmp20 [V29 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
+;* V30 tmp21 [V30 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp"
+;* V31 tmp22 [V31 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
+;* V32 tmp23 [V32 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
+;* V33 tmp24 [V33 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
+;* V34 tmp25 [V34 ] ( 0, 0 ) simd32 -> zero-ref "Inline stloc first use temp"
+;* V35 tmp26 [V35 ] ( 0, 0 ) simd32 -> zero-ref "Inline return value spill temp"
+;* V36 tmp27 [V36 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
+; V37 tmp28 [V37,T20] ( 3, 1.50) long -> rax "Inline return value spill temp"
+;* V38 tmp29 [V38,T22] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
+;* V39 tmp30 [V39 ] ( 0, 0 ) long -> zero-ref "Inline stloc first use temp"
+; V40 tmp31 [V40,T07] ( 5, 9.50) byref -> rax single-def "Inline stloc first use temp"
+; V41 tmp32 [V41,T25] ( 14, 17.50) simd16 -> mm0 "Inline stloc first use temp"
+; V42 tmp33 [V42,T12] ( 5, 6 ) byref -> rcx single-def "Inline stloc first use temp"
+;* V43 tmp34 [V43 ] ( 0, 0 ) simd16 -> zero-ref "spilled call-like call argument"
+; V44 tmp35 [V44,T02] ( 11, 26.50) long -> r8 "Inline stloc first use temp"
+; V45 tmp36 [V45,T15] ( 2, 4.50) long -> r9 "Inline stloc first use temp"
+; V46 tmp37 [V46,T29] ( 3, 12 ) simd16 -> mm2 "Inline stloc first use temp"
+;* V47 tmp38 [V47 ] ( 0, 0 ) simd16 -> zero-ref "spilled call-like call argument"
+; V48 tmp39 [V48,T27] ( 2, 16 ) simd16 -> mm0 "Spilling op1 side effects for HWIntrinsic"
+;* V49 tmp40 [V49 ] ( 0, 0 ) simd16 -> zero-ref "spilled call-like call argument"
+;* V50 tmp41 [V50 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
+;* V51 tmp42 [V51 ] ( 0, 0 ) simd16 -> zero-ref "Inline stloc first use temp"
+;* V52 tmp43 [V52 ] ( 0, 0 ) simd16 -> zero-ref "Inline return value spill temp"
+;* V53 tmp44 [V53 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
+;* V54 tmp45 [V54 ] ( 0, 0 ) simd16 -> zero-ref "Inline stloc first use temp"
+;* V55 tmp46 [V55 ] ( 0, 0 ) simd16 -> zero-ref "Inline return value spill temp"
+;* V56 tmp47 [V56 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
+;* V57 tmp48 [V57 ] ( 0, 0 ) simd16 -> zero-ref "Inline stloc first use temp"
+;* V58 tmp49 [V58 ] ( 0, 0 ) simd16 -> zero-ref "Inlining Arg"
+;* V59 tmp50 [V59 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
+;* V60 tmp51 [V60 ] ( 0, 0 ) simd16 -> zero-ref "Inline stloc first use temp"
+;* V61 tmp52 [V61 ] ( 0, 0 ) simd16 -> zero-ref "Inline return value spill temp"
+;* V62 tmp53 [V62 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
+; V63 tmp54 [V63,T23] ( 3, 24 ) simd16 -> mm0 "dup spill"
+;* V64 tmp55 [V64 ] ( 0, 0 ) simd16 -> zero-ref "Inline stloc first use temp"
+;* V65 tmp56 [V65 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
+; V66 tmp57 [V66,T16] ( 3, 3 ) byref -> r8 single-def "Inlining Arg"
+; V67 tmp58 [V67,T17] ( 3, 3 ) byref -> rdx "Inlining Arg"
+;* V68 tmp59 [V68,T21] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
+; V69 cse0 [V69,T08] ( 3, 8.50) long -> r10 "CSE #03: moderate"
+; V70 cse1 [V70,T30] ( 5, 6 ) simd32 -> mm1 "CSE #01: moderate"
+; V71 cse2 [V71,T31] ( 5, 6 ) simd16 -> mm1 "CSE #02: moderate"
 ;
 ; Lcl frame size = 0

 G_M6063_IG01:
        push rbp
        mov rbp, rsp
        ;; size=4 bbWeight=1 PerfScore 1.25
 G_M6063_IG02:
        xor eax, eax
        cmp rdx, 32
        jb G_M6063_IG18
        ;; size=12 bbWeight=1 PerfScore 1.50
 G_M6063_IG03:
        mov rcx, qword ptr [rdi]
        mov r8, 0xD1FFAB1E
        test rcx, r8
        mov r8, rcx
        jne G_M6063_IG20
        cmp rdx, 64
        jae G_M6063_IG11
        mov rax, rdi
        vmovups xmm0, xmmword ptr [rax]
        vmovups xmm1, xmmword ptr [reloc @RWD00]
        vptest xmm0, xmm1
        jne G_M6063_IG09
        mov rcx, rsi
        vpackuswb xmm0, xmm0, xmm0
        vmovsd qword ptr [rcx], xmm0
        mov r8d, 8
        test sil, 8
        jne SHORT G_M6063_IG04
        vmovups xmm0, xmmword ptr [rax+0x10]
        vptest xmm0, xmm1
        jne SHORT G_M6063_IG08
        vpackuswb xmm0, xmm0, xmm0
        vmovsd qword ptr [rcx+0x08], xmm0
        ;; size=105 bbWeight=0.50 PerfScore 16.00
 G_M6063_IG04:
        mov r8, rsi
        and r8, 15
        neg r8
        add r8, 16
        lea r9, [rdx-0x10]
        align [0 bytes for IG05]
        ;; size=18 bbWeight=0.50 PerfScore 0.75
 G_M6063_IG05:
        vmovups xmm0, xmmword ptr [rax+2*r8]
        lea r10, [r8+0x08]
        vmovups xmm2, xmmword ptr [rax+2*r10]
        vpor xmm3, xmm0, xmm2
        vptest xmm3, xmm1
        je SHORT G_M6063_IG07
        ;; size=27 bbWeight=4 PerfScore 51.33
 G_M6063_IG06:
        vptest xmm0, xmm1
        jne SHORT G_M6063_IG08
        vpackuswb xmm0, xmm0, xmm0
        vmovsd qword ptr [rcx+r8], xmm0
        mov r8, r10
        jmp SHORT G_M6063_IG08
        align [0 bytes for IG13]
        ;; size=22 bbWeight=0.50 PerfScore 4.62
 G_M6063_IG07:
        vpackuswb xmm0, xmm0, xmm2
        vmovups xmmword ptr [rcx+r8], xmm0
        add r8, 16
        cmp r8, r9
        jbe SHORT G_M6063_IG05
        ;; size=19 bbWeight=4 PerfScore 18.00
 G_M6063_IG08:
        mov rax, r8
        jmp SHORT G_M6063_IG10
        ;; size=5 bbWeight=0.50 PerfScore 1.12
 G_M6063_IG09:
        xor eax, eax
        ;; size=2 bbWeight=0.50 PerfScore 0.12
 G_M6063_IG10:
        jmp G_M6063_IG18
        ;; size=5 bbWeight=0.50 PerfScore 1.00
 G_M6063_IG11:
        mov rax, rdi
        vmovups ymm0, ymmword ptr [rax]
        vmovups ymm1, ymmword ptr [reloc @RWD32]
        vptest ymm0, ymm1
        jne G_M6063_IG17
        mov rcx, rsi
-       vmovups ymm2, ymmword ptr [reloc @RWD64]
-       vpand ymm0, ymm2, ymm0
        vpackuswb ymm0, ymm0, ymm0
        vpermq ymm0, ymm0, -40
        vmovups xmmword ptr [rcx], xmm0
        mov r8d, 16
        test sil, 16
        jne SHORT G_M6063_IG12
        vmovups ymm0, ymmword ptr [rax+0x20]
        vptest ymm0, ymm1
-       jne G_M6063_IG16
-       vpand ymm0, ymm2, ymm0
+       jne SHORT G_M6063_IG16
        vpackuswb ymm0, ymm0, ymm0
        vpermq ymm0, ymm0, -40
        vmovups xmmword ptr [rcx+0x10], xmm0
-       ;; size=102 bbWeight=0.50 PerfScore 21.33
+       ;; size=82 bbWeight=0.50 PerfScore 19.00
 G_M6063_IG12:
        mov r8, rsi
        and r8, 31
        neg r8
        add r8, 32
        lea r9, [rdx-0x20]
        ;; size=18 bbWeight=0.50 PerfScore 0.75
 G_M6063_IG13:
        vmovups ymm0, ymmword ptr [rax+2*r8]
-       vmovups ymm3, ymmword ptr [rax+2*r8+0x20]
-       vpor ymm4, ymm0, ymm3
-       vptest ymm4, ymm1
+       vmovups ymm2, ymmword ptr [rax+2*r8+0x20]
+       vpor ymm3, ymm0, ymm2
+       vptest ymm3, ymm1
        je SHORT G_M6063_IG15
        ;; size=24 bbWeight=4 PerfScore 65.33
 G_M6063_IG14:
        vptest ymm0, ymm1
        jne SHORT G_M6063_IG16
-       vpand ymm3, ymm2, ymm0
-       vpand ymm0, ymm2, ymm0
-       vpackuswb ymm2, ymm3, ymm0
-       vpermq ymm1, ymm2, -40
-       vmovups xmmword ptr [rcx+r8], xmm1
+       vpackuswb ymm0, ymm0, ymm0
+       vpermq ymm2, ymm0, -40
+       vmovups xmmword ptr [rcx+r8], xmm2
        add r8, 16
        jmp SHORT G_M6063_IG16
        align [0 bytes for IG19]
-       ;; size=37 bbWeight=0.50 PerfScore 6.96
+       ;; size=29 bbWeight=0.50 PerfScore 6.62
 G_M6063_IG15:
-       vpand ymm0, ymm2, ymm0
-       vpand ymm3, ymm2, ymm3
-       vpackuswb ymm0, ymm0, ymm3
+       vpackuswb ymm0, ymm0, ymm2
        vpermq ymm0, ymm0, -40
        vmovups ymmword ptr [rcx+r8], ymm0
        add r8, 32
        cmp r8, r9
        jbe SHORT G_M6063_IG13
-       ;; size=33 bbWeight=4 PerfScore 28.67
+       ;; size=25 bbWeight=4 PerfScore 26.00
 G_M6063_IG16:
        mov rax, r8
        jmp SHORT G_M6063_IG18
        ;; size=5 bbWeight=0.50 PerfScore 1.12
 G_M6063_IG17:
        xor eax, eax
        ;; size=2 bbWeight=0.50 PerfScore 0.12
 G_M6063_IG18:
        sub rdx, rax
        cmp rdx, 4
        jb SHORT G_M6063_IG22
        lea rcx, [rax+rdx-0x04]
        ;; size=14 bbWeight=0.50 PerfScore 1.25
 G_M6063_IG19:
        mov r8, qword ptr [rdi+2*rax]
        mov r9, 0xD1FFAB1E
        test r8, r9
        je SHORT G_M6063_IG21
        ;; size=19 bbWeight=4 PerfScore 14.00
 G_M6063_IG20:
        mov ecx, r8d
        test ecx, 0xD1FFAB1E
        jne SHORT G_M6063_IG23
        lea rdx, [rsi+rax]
        mov byte ptr [rdx], cl
        shr ecx, 16
        mov byte ptr [rdx+0x01], cl
        shr r8, 32
        mov ecx, r8d
        add rax, 2
        jmp SHORT G_M6063_IG23
        ;; size=36 bbWeight=0.50 PerfScore 3.75
 G_M6063_IG21:
        vmovd xmm0, r8
        vpackuswb xmm0, xmm0, xmm0
        vmovd dword ptr [rsi+rax], xmm0
        add rax, 4
        cmp rax, rcx
        jbe SHORT G_M6063_IG19
        ;; size=23 bbWeight=4 PerfScore 26.00
 G_M6063_IG22:
        test dl, 2
        je SHORT G_M6063_IG25
        mov ecx, dword ptr [rdi+2*rax]
        test ecx, 0xD1FFAB1E
        je SHORT G_M6063_IG24
        ;; size=16 bbWeight=0.50 PerfScore 2.25
 G_M6063_IG23:
        test ecx, 0xFF80
        je SHORT G_M6063_IG26
        jmp SHORT G_M6063_IG27
        ;; size=10 bbWeight=0.50 PerfScore 1.62
 G_M6063_IG24:
        lea r8, [rsi+rax]
        mov byte ptr [r8], cl
        shr ecx, 16
        mov byte ptr [r8+0x01], cl
        add rax, 2
        ;; size=18 bbWeight=0.50 PerfScore 1.62
 G_M6063_IG25:
        test dl, 1
        je SHORT G_M6063_IG27
        movzx rcx, word ptr [rdi+2*rax]
        cmp ecx, 127
        ja SHORT G_M6063_IG27
        ;; size=14 bbWeight=0.50 PerfScore 2.25
 G_M6063_IG26:
        mov byte ptr [rsi+rax], cl
        inc rax
        ;; size=6 bbWeight=0.50 PerfScore 0.62
 G_M6063_IG27:
        vzeroupper
        pop rbp
        ret
        ;; size=5 bbWeight=1 PerfScore 2.50
 RWD00 dq FF80FF80FF80FF80h, FF80FF80FF80FF80h
 RWD16 dd 00000000h, 00000000h, 00000000h, 00000000h
 RWD32 dq FF80FF80FF80FF80h, FF80FF80FF80FF80h, FF80FF80FF80FF80h, FF80FF80FF80FF80h
-RWD64 dq 00FF00FF00FF00FFh, 00FF00FF00FF00FFh, 00FF00FF00FF00FFh, 00FF00FF00FF00FFh

-; Total bytes of code 601, prolog size 4, PerfScore 275.88, instruction count 156, allocated bytes for code 605 (MethodHash=53fae850) for method System.Text.Ascii:NarrowUtf16ToAscii(ulong,ulong,ulong):ulong (FullOpts)
+; Total bytes of code 565, prolog size 4, PerfScore 270.54, instruction count 149, allocated bytes for code 573 (MethodHash=53fae850) for method System.Text.Ascii:NarrowUtf16ToAscii(ulong,ulong,ulong):ulong (FullOpts)
```

-19 (-5.40 % of base) - System.HexConverter:TryDecodeFromUtf16_Vector128(System.ReadOnlySpan`1[ushort],System.Span`1[ubyte],byref):ubyte

```diff
 ; Assembly listing for method System.HexConverter:TryDecodeFromUtf16_Vector128(System.ReadOnlySpan`1[ushort],System.Span`1[ubyte],byref):ubyte (FullOpts)
 ; Emitting BLENDED_CODE for X64 with AVX - Unix
 ; FullOpts code
 ; optimized code
 ; rbp based frame
 ; fully interruptible
 ; No PGO data
 ; 0 inlinees with PGO data; 6 single block inlinees; 6 inlinees without PGO data
 ; Final local variable assignments
 ;
 ;* V00 arg0 [V00 ] ( 0, 0 ) struct (16) zero-ref multireg-arg ld-addr-op single-def
 ;* V01 arg1 [V01 ] ( 0, 0 ) struct (16) zero-ref multireg-arg ld-addr-op single-def
 ; V02 arg2 [V02,T06] ( 4, 3 ) byref -> rbx single-def
 ; V03 loc0 [V03,T00] ( 12, 42.50) long -> r15
 ; V04 loc1 [V04,T02] ( 3, 9 ) long -> r13
 ;* V05 loc2 [V05,T19] ( 0, 0 ) byref -> zero-ref single-def
 ;* V06 loc3 [V06,T20] ( 0, 0 ) byref -> zero-ref single-def
 ; V07 loc4 [V07 ] ( 2, 1 ) int -> [rbp-0x28] do-not-enreg[X] addr-exposed ld-addr-op
-; V08 loc5 [V08,T22] ( 3, 24 ) simd16 -> mm8
-; V09 loc6 [V09,T23] ( 3, 24 ) simd16 -> mm9
+; V08 loc5 [V08,T21] ( 3, 24 ) simd16 -> mm7
+; V09 loc6 [V09,T22] ( 3, 24 ) simd16 -> mm8
 ;* V10 loc7 [V10 ] ( 0, 0 ) simd16 -> zero-ref
 ;* V11 loc8 [V11 ] ( 0, 0 ) simd16 -> zero-ref
-; V12 loc9 [V12,T25] ( 3, 16 ) simd16 -> mm10
+; V12 loc9 [V12,T24] ( 3, 16 ) simd16 -> mm9
 ;* V13 loc10 [V13 ] ( 0, 0 ) simd16 -> zero-ref
 ;* V14 loc11 [V14 ] ( 0, 0 ) simd16 -> zero-ref
 ;* V15 loc12 [V15 ] ( 0, 0 ) simd16 -> zero-ref
 ;# V16 OutArgs [V16 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
-; V17 tmp1 [V17,T21] ( 3, 48 ) simd16 -> mm10 "dup spill"
+; V17 tmp1 [V17,T23] ( 3, 24 ) simd16 -> mm9
 ;* V18 tmp2 [V18 ] ( 0, 0 ) struct (16) zero-ref "impAppendStmt"
 ;* V19 tmp3 [V19 ] ( 0, 0 ) struct (16) zero-ref "spilled call-like call argument"
 ; V20 tmp4 [V20,T12] ( 2, 2 ) int -> rax "impAppendStmt"
 ;* V21 tmp5 [V21 ] ( 0, 0 ) simd16 -> zero-ref "spilled call-like call argument"
 ;* V22 tmp6 [V22 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
 ;* V23 tmp7 [V23 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
 ;* V24 tmp8 [V24 ] ( 0, 0 ) simd16 -> zero-ref "Inline return value spill temp"
 ;* V25 tmp9 [V25 ] ( 0, 0 ) simd16 -> zero-ref "Inlining Arg"
 ;* V26 tmp10 [V26 ] ( 0, 0 ) simd16 -> zero-ref "Inline return value spill temp"
 ;* V27 tmp11 [V27 ] ( 0, 0 ) simd16 -> zero-ref "Inlining Arg"
 ;* V28 tmp12 [V28 ] ( 0, 0 ) simd16 -> zero-ref "Inlining Arg"
 ;* V29 tmp13 [V29 ] ( 0, 0 ) simd16 -> zero-ref "spilled call-like call argument"
 ;* V30 tmp14 [V30 ] ( 0, 0 ) simd16 -> zero-ref ld-addr-op "Inline stloc first use temp"
 ;* V31 tmp15 [V31 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
 ;* V32 tmp16 [V32 ] ( 0, 0 ) simd16 -> zero-ref "Inline return value spill temp"
 ; V33 tmp17 [V33,T07] ( 4, 4 ) int -> r8 "Inlining Arg"
 ;* V34 tmp18 [V34 ] ( 0, 0 ) struct (16) zero-ref multireg-arg ld-addr-op "NewObj constructor temp"
 ; V35 tmp19 [V35,T10] ( 2, 2 ) byref -> rdi single-def "Inlining Arg"
 ; V36 tmp20 [V36,T13] ( 2, 2 ) int -> rsi "Inlining Arg"
 ; V37 tmp21 [V37,T08] ( 4, 4 ) int -> r8 "Inlining Arg"
 ;* V38 tmp22 [V38 ] ( 0, 0 ) struct (16) zero-ref multireg-arg ld-addr-op "NewObj constructor temp"
 ; V39 tmp23 [V39,T11] ( 2, 2 ) byref -> rdx single-def "Inlining Arg"
 ; V40 tmp24 [V40,T14] ( 2, 2 ) int -> rcx "Inlining Arg"
 ; V41 tmp25 [V41,T01] ( 4, 17.50) byref -> rdi single-def "field V00._reference (fldOffset=0x0)" P-INDEP
 ; V42 tmp26 [V42,T05] ( 5, 3.50) int -> rsi single-def "field V00._length (fldOffset=0x8)" P-INDEP
 ; V43 tmp27 [V43,T03] ( 3, 5.50) byref -> rdx single-def "field V01._reference (fldOffset=0x0)" P-INDEP
 ; V44 tmp28 [V44,T09] ( 3, 2 ) int -> rcx single-def "field V01._length (fldOffset=0x8)" P-INDEP
 ;* V45 tmp29 [V45 ] ( 0, 0 ) byref -> zero-ref single-def "field V18._reference (fldOffset=0x0)" P-INDEP
 ;* V46 tmp30 [V46 ] ( 0, 0 ) int -> zero-ref "field V18._length (fldOffset=0x8)" P-INDEP
 ;* V47 tmp31 [V47 ] ( 0, 0 ) byref -> zero-ref "field V19._reference (fldOffset=0x0)" P-INDEP
 ;* V48 tmp32 [V48 ] ( 0, 0 ) int -> zero-ref "field V19._length (fldOffset=0x8)" P-INDEP
 ;* V49 tmp33 [V49 ] ( 0, 0 ) byref -> zero-ref single-def "field V22._reference (fldOffset=0x0)" P-INDEP
 ;* V50 tmp34 [V50 ] ( 0, 0 ) int -> zero-ref "field V22._length (fldOffset=0x8)" P-INDEP
 ;* V51 tmp35 [V51 ] ( 0, 0 ) byref -> zero-ref single-def "field V23._reference (fldOffset=0x0)" P-INDEP
 ;* V52 tmp36 [V52 ] ( 0, 0 ) int -> zero-ref "field V23._length (fldOffset=0x8)" P-INDEP
 ; V53 tmp37 [V53,T15] ( 2, 1 ) byref -> rdi single-def "field V34._reference (fldOffset=0x0)" P-INDEP
 ; V54 tmp38 [V54,T17] ( 2, 1 ) int -> rsi "field V34._length (fldOffset=0x8)" P-INDEP
 ; V55 tmp39 [V55,T16] ( 2, 1 ) byref -> rdx single-def "field V38._reference (fldOffset=0x0)" P-INDEP
 ; V56 tmp40 [V56,T18] ( 2, 1 ) int -> rcx "field V38._length (fldOffset=0x8)" P-INDEP
-; V57 cse0 [V57,T24] ( 3, 17 ) simd16 -> mm0 hoist "CSE #01: aggressive"
+; V57 cse0 [V57,T25] ( 2, 9 ) simd16 -> mm0 hoist "CSE #01: aggressive"
 ; V58 cse1 [V58,T26] ( 2, 9 ) simd16 -> mm1 hoist "CSE #02: aggressive"
 ; V59 cse2 [V59,T27] ( 2, 9 ) simd16 -> mm2 hoist "CSE #03: aggressive"
 ; V60 cse3 [V60,T28] ( 2, 9 ) simd16 -> mm3 hoist "CSE #04: aggressive"
 ; V61 cse4 [V61,T29] ( 2, 9 ) simd16 -> mm4 hoist "CSE #05: aggressive"
 ; V62 cse5 [V62,T30] ( 2, 9 ) simd16 -> mm5 hoist "CSE #06: aggressive"
 ; V63 cse6 [V63,T31] ( 2, 9 ) simd16 -> mm6 hoist "CSE #07: aggressive"
-; V64 cse7 [V64,T32] ( 2, 9 ) simd16 -> mm7 hoist "CSE #08: aggressive"
-; V65 cse8 [V65,T04] ( 3, 6 ) long -> r14 "CSE #09: aggressive"
+; V64 cse7 [V64,T04] ( 3, 6 ) long -> r14 "CSE #08: aggressive"
 ;
 ; Lcl frame size = 16

 G_M6966_IG01:
        push rbp
        push r15
        push r14
        push r13
        push rbx
        sub rsp, 16
        lea rbp, [rsp+0x30]
        mov rbx, r8
        ;; size=20 bbWeight=1 PerfScore 6.00
 G_M6966_IG02:
        xor r15d, r15d
        mov r14d, esi
        lea r13, [r14-0x10]
        vmovups xmm0, xmmword ptr [reloc @RWD00]
        vmovups xmm1, xmmword ptr [reloc @RWD16]
        vmovups xmm2, xmmword ptr [reloc @RWD32]
        vmovups xmm3, xmmword ptr [reloc @RWD48]
        vmovups xmm4, xmmword ptr [reloc @RWD64]
        vmovups xmm5, xmmword ptr [reloc @RWD80]
        vmovups xmm6, xmmword ptr [reloc @RWD96]
-       vmovups xmm7, xmmword ptr [reloc @RWD112]
        jmp SHORT G_M6966_IG04
        align [0 bytes for IG03]
-       ;; size=76 bbWeight=1 PerfScore 27.00
+       ;; size=68 bbWeight=1 PerfScore 24.00
 G_M6966_IG03:
        mov r15, r13
        ;; size=3 bbWeight=4 PerfScore 1.00
 G_M6966_IG04:
-       vmovups xmm8, xmmword ptr [rdi+2*r15]
-       vmovups xmm9, xmmword ptr [rdi+2*r15+0x10]
-       vpand xmm10, xmm0, xmm8
-       vpand xmm11, xmm0, xmm9
-       vpackuswb xmm10, xmm10, xmm11
-       vpaddb xmm11, xmm1, xmm10
-       vpsubusb xmm11, xmm11, xmm2
-       vpsubb xmm11, xmm11, xmm3
-       vpand xmm10, xmm4, xmm10
-       vpsubb xmm10, xmm10, xmm5
-       vpaddusb xmm10, xmm10, xmm6
-       vpminub xmm10, xmm11, xmm10
-       vpor xmm8, xmm8, xmm9
-       vptest xmm8, xmm7
+       vmovups xmm7, xmmword ptr [rdi+2*r15]
+       vmovups xmm8, xmmword ptr [rdi+2*r15+0x10]
+       vpackuswb xmm9, xmm7, xmm8
+       vpaddb xmm10, xmm0, xmm9
+       vpsubusb xmm10, xmm10, xmm1
+       vpsubb xmm10, xmm10, xmm2
+       vpand xmm9, xmm3, xmm9
+       vpsubb xmm9, xmm9, xmm4
+       vpaddusb xmm9, xmm9, xmm5
+       vpminub xmm9, xmm10, xmm9
+       vpor xmm7, xmm7, xmm8
+       vptest xmm7, xmm6
        jne SHORT G_M6966_IG06
-       ;; size=71 bbWeight=8 PerfScore 132.00
+       ;; size=61 bbWeight=8 PerfScore 126.67
 G_M6966_IG05:
-       vpaddusb xmm8, xmm10, xmmword ptr [reloc @RWD128]
-       vpmovmskb r8d, xmm8
+       vpaddusb xmm7, xmm9, xmmword ptr [reloc @RWD112]
+       vpmovmskb r8d, xmm7
        test r8d, r8d
        je SHORT G_M6966_IG08
-       ;; size=18 bbWeight=4 PerfScore 21.00
+       ;; size=17 bbWeight=4 PerfScore 21.00
 G_M6966_IG06:
        mov r8d, r15d
        cmp r8d, esi
        ja G_M6966_IG11
        mov eax, r8d
        lea rdi, bword ptr [rdi+2*rax]
        sub esi, r8d
        mov r8, r15
        shr r8, 1
        cmp r8d, ecx
        ja SHORT G_M6966_IG11
        mov eax, r8d
        add rdx, rax
        sub ecx, r8d
        lea r8, [rbp-0x28]
        mov rax, 0xD1FFAB1E ; code for System.HexConverter:TryDecodeFromUtf16_Scalar(System.ReadOnlySpan`1[ushort],System.Span`1[ubyte],byref):ubyte
        call [rax]System.HexConverter:TryDecodeFromUtf16_Scalar(System.ReadOnlySpan`1[ushort],System.Span`1[ubyte],byref):ubyte
        add r15d, dword ptr [rbp-0x28]
        mov dword ptr [rbx], r15d
        ;; size=65 bbWeight=0.50 PerfScore 6.00
 G_M6966_IG07:
        add rsp, 16
        pop rbx
        pop r13
        pop r14
        pop r15
        pop rbp
        ret
        ;; size=13 bbWeight=0.50 PerfScore 1.88
 G_M6966_IG08:
-       vpmaddubsw xmm8, xmm10, xmmword ptr [reloc @RWD144]
-       vpshufb xmm8, xmm8, xmmword ptr [reloc @RWD160]
+       vpmaddubsw xmm7, xmm9, xmmword ptr [reloc @RWD128]
+       vpshufb xmm7, xmm7, xmmword ptr [reloc @RWD144]
        mov rax, r15
        shr rax, 1
-       vmovd qword ptr [rdx+rax], xmm8
+       vmovd qword ptr [rdx+rax], xmm7
        add r15, 16
        cmp r15, r14
        je SHORT G_M6966_IG09
        cmp r15, r13
        jbe G_M6966_IG04
        jmp G_M6966_IG03
        ;; size=53 bbWeight=4 PerfScore 62.00
 G_M6966_IG09:
        mov dword ptr [rbx], esi
        mov eax, 1
        ;; size=7 bbWeight=0.50 PerfScore 0.62
 G_M6966_IG10:
        add rsp, 16
        pop rbx
        pop r13
        pop r14
        pop r15
        pop rbp
        ret
        ;; size=13 bbWeight=0.50 PerfScore 1.88
 G_M6966_IG11:
        mov rax, 0xD1FFAB1E ; code for System.ThrowHelper:ThrowArgumentOutOfRangeException()
        call [rax]System.ThrowHelper:ThrowArgumentOutOfRangeException()
        int3
        ;; size=13 bbWeight=0 PerfScore 0.00
-RWD00 dq 00FF00FF00FF00FFh, 00FF00FF00FF00FFh
-RWD16 dq C6C6C6C6C6C6C6C6h, C6C6C6C6C6C6C6C6h
-RWD32 dq 0606060606060606h, 0606060606060606h
-RWD48 dq F0F0F0F0F0F0F0F0h, F0F0F0F0F0F0F0F0h
-RWD64 dq DFDFDFDFDFDFDFDFh, DFDFDFDFDFDFDFDFh
-RWD80 dq 4141414141414141h, 4141414141414141h
-RWD96 dq 0A0A0A0A0A0A0A0Ah, 0A0A0A0A0A0A0A0Ah
-RWD112 dq FF80FF80FF80FF80h, FF80FF80FF80FF80h
-RWD128 dq 7070707070707070h, 7070707070707070h
-RWD144 dq 0110011001100110h, 0110011001100110h
-RWD160 dq 0E0C0A0806040200h, 0000000000000000h
+RWD00 dq C6C6C6C6C6C6C6C6h, C6C6C6C6C6C6C6C6h
+RWD16 dq 0606060606060606h, 0606060606060606h
+RWD32 dq F0F0F0F0F0F0F0F0h, F0F0F0F0F0F0F0F0h
+RWD48 dq DFDFDFDFDFDFDFDFh, DFDFDFDFDFDFDFDFh
+RWD64 dq 4141414141414141h, 4141414141414141h
+RWD80 dq 0A0A0A0A0A0A0A0Ah, 0A0A0A0A0A0A0A0Ah
+RWD96 dq FF80FF80FF80FF80h, FF80FF80FF80FF80h
+RWD112 dq 7070707070707070h, 7070707070707070h
+RWD128 dq 0110011001100110h, 0110011001100110h
+RWD144 dq 0E0C0A0806040200h, 0000000000000000h

-; Total bytes of code 352, prolog size 20, PerfScore 259.38, instruction count 89, allocated bytes for code 352 (MethodHash=bb7ae4c9) for method System.HexConverter:TryDecodeFromUtf16_Vector128(System.ReadOnlySpan`1[ushort],System.Span`1[ubyte],byref):ubyte (FullOpts)
+; Total bytes of code 333, prolog size 20, PerfScore 251.04, instruction count 86, allocated bytes for code 333 (MethodHash=bb7ae4c9) for method System.HexConverter:TryDecodeFromUtf16_Vector128(System.ReadOnlySpan`1[ushort],System.Span`1[ubyte],byref):ubyte (FullOpts)
```

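
One way to read the three diffs: each drops a `vpand` against the `00FF00FF...` constant that used to precede `vpackuswb`, while the `vptest` against `FF80FF80...` stays. The sketch below is a scalar Python emulation of a single `vpackuswb` lane, written for illustration only (it is not the JIT's or the library's code, and the helper names are hypothetical). It shows that for 16-bit values that pass the `0xFF80` check, packing with and without the mask gives the same byte.

```python
ASCII_CHECK_MASK = 0xFF80  # constant tested by vptest in the listings
LOW_BYTE_MASK = 0x00FF     # constant whose vpand is removed in the diffs


def packus_lane(word: int) -> int:
    """Emulate one lane of vpackuswb: read the 16-bit word as signed,
    then saturate it to the unsigned byte range 0..255."""
    signed = word - 0x10000 if word & 0x8000 else word
    return max(0, min(255, signed))


def narrow_with_mask(word: int) -> int:
    # Base code shape: vpand with 0x00FF, then vpackuswb.
    return packus_lane(word & LOW_BYTE_MASK)


def narrow_without_mask(word: int) -> int:
    # Diff code shape: vpackuswb only.
    return packus_lane(word)


# Every 16-bit value that passes the ASCII check (no bits of 0xFF80 set).
ascii_words = [w for w in range(0x10000) if w & ASCII_CHECK_MASK == 0]

# For those values the mask is redundant and the byte equals the input.
mask_is_redundant = all(
    narrow_with_mask(w) == narrow_without_mask(w) == w for w in ascii_words
)
print(len(ascii_words), mask_is_redundant)

# For a value that fails the check the two shapes differ:
# 0x0141 masks down to 0x41, but saturates to 0xFF when packed directly.
print(hex(narrow_with_mask(0x0141)), hex(narrow_without_mask(0x0141)))
```

The equivalence only holds for input that has passed the ASCII check. In the `HexConverter` listing the `vptest` comes after the pack within the same block, so this reading assumes the packed value is discarded when the check fails and the scalar fallback is taken.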
MihuBot commented 3 months ago

@MihaZupan