MihuBot / runtime-utils


[JitDiff X64] [tannergooding] Improve the handling of SIMD comparisons #538

Open MihuBot opened 2 months ago

MihuBot commented 2 months ago

Job completed in 14 minutes. https://github.com/dotnet/runtime/pull/104944
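
As a reading aid for the figures in the Diffs section: each percentage is the byte delta divided by the base size. The following is a minimal Python sketch that recomputes two of the reported values. The function names `percent_of_base` and `implied_base_size` are illustrative assumptions and are not part of jit-diff or the MihuBot tooling.

```python
def percent_of_base(base_bytes: int, diff_bytes: int) -> float:
    """Return the code-size delta as a percentage of the base size."""
    return (diff_bytes - base_bytes) / base_bytes * 100.0


def implied_base_size(delta_bytes: int, percent: float) -> int:
    """Estimate a method's base size from its delta and reported percentage.

    The reported percentage is rounded to two decimals, so the result is
    only an estimate, rounded to the nearest byte.
    """
    return round(delta_bytes / (percent / 100.0))


# Totals reported in the summary of this job.
total_delta = 39195568 - 39198197
print(total_delta)                                      # -2629
print(f"{percent_of_base(39198197, 39195568):.2f} %")   # -0.01 %

# Vector128`1[long]:EqualsFloatingPoint is listed as -31 (-51.67 % of base).
print(implied_base_size(-31, -51.67))                   # 60
```

Read this way, a large percentage next to a small byte delta (for example the `AnyWhereAllBitsSet` entries) indicates a very small method, while the `SpanHelpers` entries save more bytes in absolute terms from much larger method bodies.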

Diffs

```
Found 277 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 39198197
Total bytes of diff: 39195568
Total bytes of delta: -2629 (-0.01 % of base)
Total relative delta: -19.78
    diff is an improvement.
    relative diff is an improvement.

Top file improvements (bytes):
-2316 : System.Private.CoreLib.dasm (-0.03 % of base)
-101 : System.Memory.dasm (-0.03 % of base)
-31 : System.Collections.Immutable.dasm (-0.00 % of base)
-30 : System.Text.Json.dasm (-0.00 % of base)
-26 : System.Collections.dasm (-0.01 % of base)
-25 : System.Collections.Concurrent.dasm (-0.01 % of base)
-15 : System.Composition.TypedParts.dasm (-0.03 % of base)
-15 : System.Linq.Expressions.dasm (-0.00 % of base)
-15 : System.ComponentModel.Composition.dasm (-0.00 % of base)
-15 : System.Data.Common.dasm (-0.00 % of base)
-10 : Microsoft.CSharp.dasm (-0.00 % of base)
-5 : System.Linq.dasm (-0.00 % of base)
-5 : System.Reflection.Context.dasm (-0.01 % of base)
-5 : System.Numerics.Tensors.dasm (-0.00 % of base)
-5 : System.Linq.Queryable.dasm (-0.00 % of base)
-5 : System.Linq.Parallel.dasm (-0.00 % of base)
-5 : System.ComponentModel.TypeConverter.dasm (-0.00 % of base)

17 total files with Code Size differences (17 improved, 0 regressed), 242 unchanged.

Top method improvements (bytes):
-203 (-10.75 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:IndexOfAny[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts)
-198 (-10.55 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:LastIndexOfAny[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts)
-134 (-10.09 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:LastIndexOfAny[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts)
-129 (-9.58 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:IndexOfAny[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts)
-95 (-11.31 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:LastIndexOf[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],int):int (FullOpts)
-62 (-7.87 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:Contains[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],int):ubyte (FullOpts)
-62 (-7.53 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:IndexOf[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],int):int (FullOpts)
-59 (-6.50 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:SequenceEqual[System.Numerics.Vector`1[float]](byref,byref,int):ubyte (FullOpts)
-31 (-51.67 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[long]:EqualsFloatingPoint(System.Runtime.Intrinsics.Vector128`1[long],System.Runtime.Intrinsics.Vector128`1[long]):ubyte (FullOpts)
-30 (-4.50 % of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:IndexOfAnyExcept[System.Numerics.Vector`1[float]](System.ReadOnlySpan`1[System.Numerics.Vector`1[float]],System.ReadOnlySpan`1[System.Numerics.Vector`1[float]]):int (FullOpts)
-29 (-50.88 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[int]:EqualsFloatingPoint(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):ubyte (FullOpts)
-29 (-50.88 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[short]:EqualsFloatingPoint(System.Runtime.Intrinsics.Vector128`1[short],System.Runtime.Intrinsics.Vector128`1[short]):ubyte (FullOpts)
-29 (-50.88 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[ubyte]:EqualsFloatingPoint(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
-26 (-1.00 % of base) : System.Memory.dasm - System.Buffers.SequenceReader`1[System.Numerics.Vector`1[float]]:TryReadTo(byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],ubyte):ubyte:this (FullOpts) (2 methods)
-24 (-10.08 % of base) : System.Private.CoreLib.dasm - System.Numerics.Matrix4x4:Equals(System.Numerics.Matrix4x4):ubyte:this (FullOpts)
-24 (-9.06 % of base) : System.Private.CoreLib.dasm - System.Numerics.Matrix4x4:Equals(System.Object):ubyte:this (FullOpts)
-24 (-10.00 % of base) : System.Private.CoreLib.dasm - System.Numerics.Matrix4x4+Impl:Equals(byref):ubyte:this (FullOpts)
-24 (-9.16 % of base) : System.Private.CoreLib.dasm - System.Numerics.Matrix4x4+Impl:Equals(System.Object):ubyte:this (FullOpts)
-24 (-10.08 % of base) : System.Private.CoreLib.dasm - System.Numerics.Matrix4x4+Impl:System.IEquatable.Equals(System.Numerics.Matrix4x4+Impl):ubyte:this (FullOpts)
-24 (-9.80 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:IndexOfAnyExcept[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts)
-24 (-9.84 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:LastIndexOfAnyExcept[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts)
-22 (-8.98 % of base) : System.Private.CoreLib.dasm - System.Numerics.Matrix3x2:Equals(System.Object):ubyte:this (FullOpts)
-22 (-9.09 % of base) : System.Private.CoreLib.dasm - System.Numerics.Matrix3x2+Impl:Equals(System.Object):ubyte:this (FullOpts)
-21 (-87.50 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector512`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.AnyWhereAllBitsSet(System.Runtime.Intrinsics.Vector512`1[double]):ubyte (FullOpts)
-20 (-4.37 % of base) : System.Memory.dasm - System.Buffers.SequenceReader`1[System.Numerics.Vector`1[float]]:AdvancePastAny(System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float]):long:this (FullOpts)
-20 (-0.82 % of base) : System.Memory.dasm - System.Buffers.SequenceReader`1[System.Numerics.Vector`1[float]]:TryReadToSlow(byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int,ubyte):ubyte:this (FullOpts) (2 methods)
-19 (-9.95 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:LastIndexOfAnyExcept[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts)
-18 (-8.26 % of base) : System.Private.CoreLib.dasm - System.Numerics.Matrix3x2:Equals(System.Numerics.Matrix3x2):ubyte:this (FullOpts)
-18 (-8.18 % of base) : System.Private.CoreLib.dasm - System.Numerics.Matrix3x2+Impl:Equals(byref):ubyte:this (FullOpts)
-18 (-8.26 % of base) : System.Private.CoreLib.dasm - System.Numerics.Matrix3x2+Impl:System.IEquatable.Equals(System.Numerics.Matrix3x2+Impl):ubyte:this (FullOpts)
-16 (-84.21 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.AnyWhereAllBitsSet(System.Numerics.Vector`1[double]):ubyte (FullOpts)
-16 (-84.21 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.AnyWhereAllBitsSet(System.Runtime.Intrinsics.Vector256`1[double]):ubyte (FullOpts)
-15 (-4.09 % of base) : System.Memory.dasm - System.Buffers.SequenceReader`1[System.Numerics.Vector`1[float]]:AdvancePastAny(System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float]):long:this (FullOpts)
-15 (-10.34 % of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:LastIndexOfAnyExcept[System.Numerics.Vector`1[float]](System.ReadOnlySpan`1[System.Numerics.Vector`1[float]],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float]):int (FullOpts)
-15 (-7.81 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:IndexOfAnyExcept[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts)
-14 (-3.81 % of base) : System.Private.CoreLib.dasm - System.Collections.Generic.ObjectEqualityComparer`1[System.Numerics.Vector`1[float]]:IndexOf(System.Numerics.Vector`1[float][],System.Numerics.Vector`1[float],int,int):int:this (FullOpts)
-14 (-3.76 % of base) : System.Private.CoreLib.dasm - System.Collections.Generic.ObjectEqualityComparer`1[System.Numerics.Vector`1[float]]:LastIndexOf(System.Numerics.Vector`1[float][],System.Numerics.Vector`1[float],int,int):int:this (FullOpts)
-13 (-6.81 % of base) : System.Collections.dasm - System.Collections.Generic.LinkedList`1[System.Numerics.Vector`1[float]]:FindLast(System.Numerics.Vector`1[float]):System.Collections.Generic.LinkedListNode`1[System.Numerics.Vector`1[float]]:this (FullOpts)
-13 (-4.15 % of base) : System.Private.CoreLib.dasm - System.Collections.Generic.NullableEqualityComparer`1[System.Numerics.Vector`1[float]]:IndexOf(System.Nullable`1[System.Numerics.Vector`1[float]][],System.Nullable`1[System.Numerics.Vector`1[float]],int,int):int:this (FullOpts)
-13 (-4.13 % of base) : System.Private.CoreLib.dasm - System.Collections.Generic.NullableEqualityComparer`1[System.Numerics.Vector`1[float]]:LastIndexOf(System.Nullable`1[System.Numerics.Vector`1[float]][],System.Nullable`1[System.Numerics.Vector`1[float]],int,int):int:this (FullOpts)
-13 (-2.06 % of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:LastIndexOfAnyExcept[System.Numerics.Vector`1[float]](System.ReadOnlySpan`1[System.Numerics.Vector`1[float]],System.ReadOnlySpan`1[System.Numerics.Vector`1[float]]):int (FullOpts)
-13 (-81.25 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.AnyWhereAllBitsSet(System.Runtime.Intrinsics.Vector128`1[double]):ubyte (FullOpts)
-12 (-13.33 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:GreaterThanAll[double](System.Runtime.Intrinsics.Vector256`1[double],System.Runtime.Intrinsics.Vector256`1[double]):ubyte (FullOpts)
-12 (-13.64 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:GreaterThanAll[int](System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[int]):ubyte (FullOpts)
-12 (-13.33 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:GreaterThanAll[long](System.Runtime.Intrinsics.Vector256`1[long],System.Runtime.Intrinsics.Vector256`1[long]):ubyte (FullOpts)
-12 (-13.33 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:GreaterThanOrEqualAll[double](System.Runtime.Intrinsics.Vector256`1[double],System.Runtime.Intrinsics.Vector256`1[double]):ubyte (FullOpts)
-12 (-13.33 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:LessThanAll[double](System.Runtime.Intrinsics.Vector256`1[double],System.Runtime.Intrinsics.Vector256`1[double]):ubyte (FullOpts)
-12 (-13.64 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:LessThanAll[int](System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[int]):ubyte (FullOpts)
-12 (-13.33 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:LessThanAll[long](System.Runtime.Intrinsics.Vector256`1[long],System.Runtime.Intrinsics.Vector256`1[long]):ubyte (FullOpts)
-12 (-13.33 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:LessThanOrEqualAll[double](System.Runtime.Intrinsics.Vector256`1[double],System.Runtime.Intrinsics.Vector256`1[double]):ubyte (FullOpts)
-12 (-11.32 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector512:GreaterThanAll[double](System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[double]):ubyte (FullOpts)
-12 (-11.32 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector512:GreaterThanAll[long](System.Runtime.Intrinsics.Vector512`1[long],System.Runtime.Intrinsics.Vector512`1[long]):ubyte (FullOpts)
-12 (-11.32 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector512:GreaterThanOrEqualAll[double](System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[double]):ubyte (FullOpts)
-12 (-11.32 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector512:LessThanAll[double](System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[double]):ubyte (FullOpts)
-12 (-11.32 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector512:LessThanAll[long](System.Runtime.Intrinsics.Vector512`1[long],System.Runtime.Intrinsics.Vector512`1[long]):ubyte (FullOpts)
-12 (-11.32 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector512:LessThanOrEqualAll[double](System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[double]):ubyte (FullOpts)
-10 (-3.44 % of base) : System.Memory.dasm - System.Buffers.SequenceReader`1[System.Numerics.Vector`1[float]]:AdvancePastAny(System.Numerics.Vector`1[float],System.Numerics.Vector`1[float]):long:this (FullOpts)
-10 (-4.78 % of base) : System.Private.CoreLib.dasm - System.Collections.Generic.GenericEqualityComparer`1[System.Numerics.Vector`1[float]]:IndexOf(System.Numerics.Vector`1[float][],System.Numerics.Vector`1[float],int,int):int:this (FullOpts)
-10 (-4.69 % of base) : System.Private.CoreLib.dasm - System.Collections.Generic.GenericEqualityComparer`1[System.Numerics.Vector`1[float]]:LastIndexOf(System.Numerics.Vector`1[float][],System.Numerics.Vector`1[float],int,int):int:this (FullOpts)
-10 (-4.61 % of base) : System.Data.Common.dasm - System.Data.DataRowComparer:CompareEquatableArray[System.Numerics.Vector`1[float]](System.Numerics.Vector`1[float][],System.Numerics.Vector`1[float][]):ubyte (FullOpts)
-10 (-7.14 % of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:ContainsAnyExcept[System.Numerics.Vector`1[float]](System.ReadOnlySpan`1[System.Numerics.Vector`1[float]],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float]):ubyte (FullOpts)
-10 (-7.14 % of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:ContainsAnyExcept[System.Numerics.Vector`1[float]](System.Span`1[System.Numerics.Vector`1[float]],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float]):ubyte (FullOpts)
-10 (-7.41 % of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:IndexOfAnyExcept[System.Numerics.Vector`1[float]](System.ReadOnlySpan`1[System.Numerics.Vector`1[float]],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float]):int (FullOpts)
-10 (-7.41 % of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:IndexOfAnyExcept[System.Numerics.Vector`1[float]](System.Span`1[System.Numerics.Vector`1[float]],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float]):int (FullOpts)
-10 (-7.41 % of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:LastIndexOfAnyExcept[System.Numerics.Vector`1[float]](System.Span`1[System.Numerics.Vector`1[float]],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float]):int (FullOpts)
-10 (-11.63 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:GreaterThanAll[short](System.Runtime.Intrinsics.Vector256`1[short],System.Runtime.Intrinsics.Vector256`1[short]):ubyte (FullOpts)
-10 (-11.63 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:LessThanAll[short](System.Runtime.Intrinsics.Vector256`1[short],System.Runtime.Intrinsics.Vector256`1[short]):ubyte (FullOpts)
-10 (-9.80 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector512:GreaterThanAll[int](System.Runtime.Intrinsics.Vector512`1[int],System.Runtime.Intrinsics.Vector512`1[int]):ubyte (FullOpts)
-10 (-9.80 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector512:GreaterThanAll[short](System.Runtime.Intrinsics.Vector512`1[short],System.Runtime.Intrinsics.Vector512`1[short]):ubyte (FullOpts)
-10 (-9.80 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector512:LessThanAll[int](System.Runtime.Intrinsics.Vector512`1[int],System.Runtime.Intrinsics.Vector512`1[int]):ubyte (FullOpts)
-10 (-9.80 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector512:LessThanAll[short](System.Runtime.Intrinsics.Vector512`1[short],System.Runtime.Intrinsics.Vector512`1[short]):ubyte (FullOpts)
-10 (-7.25 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:IndexOfAnyExcept[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts)
-10 (-7.30 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:LastIndexOfAnyExcept[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts)
-9 (-3.53 % of base) : System.Collections.Immutable.dasm - System.Collections.Frozen.ValueTypeDefaultComparerFrozenDictionary`2[System.Numerics.Vector`1[float],System.Nullable`1[int]]:GetValueRefOrNullRefCore(System.Numerics.Vector`1[float]):byref:this (FullOpts)
-6 (-10.53 % of base) : System.Private.CoreLib.dasm - System.Numerics.Plane:Equals(System.Numerics.Plane):ubyte:this (FullOpts)
-6 (-6.98 % of base) : System.Private.CoreLib.dasm - System.Numerics.Plane:Equals(System.Object):ubyte:this (FullOpts)
-6 (-10.53 % of base) : System.Private.CoreLib.dasm - System.Numerics.Quaternion:Equals(System.Numerics.Quaternion):ubyte:this (FullOpts)
-6 (-6.98 % of base) : System.Private.CoreLib.dasm - System.Numerics.Quaternion:Equals(System.Object):ubyte:this (FullOpts)
-6 (-15.79 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.GreaterThanAll(System.Numerics.Vector`1[double],System.Numerics.Vector`1[double]):ubyte (FullOpts)
-6 (-15.79 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.GreaterThanOrEqualAll(System.Numerics.Vector`1[double],System.Numerics.Vector`1[double]):ubyte (FullOpts)
-6 (-15.79 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.LessThanAll(System.Numerics.Vector`1[double],System.Numerics.Vector`1[double]):ubyte (FullOpts)
-6 (-15.79 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.LessThanOrEqualAll(System.Numerics.Vector`1[double],System.Numerics.Vector`1[double]):ubyte (FullOpts)
-6 (-15.79 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[long]:System.Runtime.Intrinsics.ISimdVector,T>.GreaterThanAll(System.Numerics.Vector`1[long],System.Numerics.Vector`1[long]):ubyte (FullOpts)
-6 (-15.79 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[long]:System.Runtime.Intrinsics.ISimdVector,T>.LessThanAll(System.Numerics.Vector`1[long],System.Numerics.Vector`1[long]):ubyte (FullOpts)
-6 (-9.38 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector2:Equals(System.Numerics.Vector2):ubyte:this (FullOpts)
-6 (-6.12 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector2:Equals(System.Object):ubyte:this (FullOpts)
-6 (-7.32 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector3:Equals(System.Numerics.Vector3):ubyte:this (FullOpts)
-6 (-5.36 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector3:Equals(System.Object):ubyte:this (FullOpts)
-6 (-10.53 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector4:Equals(System.Numerics.Vector4):ubyte:this (FullOpts)
-6 (-6.98 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector4:Equals(System.Object):ubyte:this (FullOpts)
-6 (-6.98 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[double]:Equals(System.Object):ubyte:this (FullOpts)
-6 (-9.68 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[double]:Equals(System.Runtime.Intrinsics.Vector128`1[double]):ubyte:this (FullOpts)
-6 (-10.00 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[double]:EqualsFloatingPoint(System.Runtime.Intrinsics.Vector128`1[double],System.Runtime.Intrinsics.Vector128`1[double]):ubyte (FullOpts)
-6 (-17.14 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.GreaterThanAll(System.Runtime.Intrinsics.Vector128`1[double],System.Runtime.Intrinsics.Vector128`1[double]):ubyte (FullOpts)
-6 (-17.14 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.GreaterThanOrEqualAll(System.Runtime.Intrinsics.Vector128`1[double],System.Runtime.Intrinsics.Vector128`1[double]):ubyte (FullOpts)
-6 (-17.14 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.LessThanAll(System.Runtime.Intrinsics.Vector128`1[double],System.Runtime.Intrinsics.Vector128`1[double]):ubyte (FullOpts)
-6 (-17.14 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.LessThanOrEqualAll(System.Runtime.Intrinsics.Vector128`1[double],System.Runtime.Intrinsics.Vector128`1[double]):ubyte (FullOpts)
-6 (-17.65 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[int]:System.Runtime.Intrinsics.ISimdVector,T>.GreaterThanAll(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):ubyte (FullOpts)
-6 (-17.65 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[int]:System.Runtime.Intrinsics.ISimdVector,T>.LessThanAll(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):ubyte (FullOpts)
-6 (-17.14 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[long]:System.Runtime.Intrinsics.ISimdVector,T>.GreaterThanAll(System.Runtime.Intrinsics.Vector128`1[long],System.Runtime.Intrinsics.Vector128`1[long]):ubyte (FullOpts)

Top method improvements (percentages):
-21 (-87.50 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector512`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.AnyWhereAllBitsSet(System.Runtime.Intrinsics.Vector512`1[double]):ubyte (FullOpts)
-16 (-84.21 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.AnyWhereAllBitsSet(System.Numerics.Vector`1[double]):ubyte (FullOpts)
-16 (-84.21 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.AnyWhereAllBitsSet(System.Runtime.Intrinsics.Vector256`1[double]):ubyte (FullOpts)
-13 (-81.25 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.AnyWhereAllBitsSet(System.Runtime.Intrinsics.Vector128`1[double]):ubyte (FullOpts)
-31 (-51.67 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[long]:EqualsFloatingPoint(System.Runtime.Intrinsics.Vector128`1[long],System.Runtime.Intrinsics.Vector128`1[long]):ubyte (FullOpts)
-29 (-50.88 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[int]:EqualsFloatingPoint(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):ubyte (FullOpts)
-29 (-50.88 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[short]:EqualsFloatingPoint(System.Runtime.Intrinsics.Vector128`1[short],System.Runtime.Intrinsics.Vector128`1[short]):ubyte (FullOpts)
-29 (-50.88 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[ubyte]:EqualsFloatingPoint(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]):ubyte (FullOpts)
-6 (-17.65 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[int]:System.Runtime.Intrinsics.ISimdVector,T>.GreaterThanAll(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):ubyte (FullOpts)
-6 (-17.65 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[int]:System.Runtime.Intrinsics.ISimdVector,T>.LessThanAll(System.Runtime.Intrinsics.Vector128`1[int],System.Runtime.Intrinsics.Vector128`1[int]):ubyte (FullOpts)
-6 (-17.14 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.GreaterThanAll(System.Runtime.Intrinsics.Vector128`1[double],System.Runtime.Intrinsics.Vector128`1[double]):ubyte (FullOpts)
-6 (-17.14 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.GreaterThanOrEqualAll(System.Runtime.Intrinsics.Vector128`1[double],System.Runtime.Intrinsics.Vector128`1[double]):ubyte (FullOpts)
-6 (-17.14 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.LessThanAll(System.Runtime.Intrinsics.Vector128`1[double],System.Runtime.Intrinsics.Vector128`1[double]):ubyte (FullOpts)
-6 (-17.14 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.LessThanOrEqualAll(System.Runtime.Intrinsics.Vector128`1[double],System.Runtime.Intrinsics.Vector128`1[double]):ubyte (FullOpts)
-6 (-17.14 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[long]:System.Runtime.Intrinsics.ISimdVector,T>.GreaterThanAll(System.Runtime.Intrinsics.Vector128`1[long],System.Runtime.Intrinsics.Vector128`1[long]):ubyte (FullOpts)
-6 (-17.14 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[long]:System.Runtime.Intrinsics.ISimdVector,T>.LessThanAll(System.Runtime.Intrinsics.Vector128`1[long],System.Runtime.Intrinsics.Vector128`1[long]):ubyte (FullOpts)
-6 (-15.79 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.GreaterThanAll(System.Numerics.Vector`1[double],System.Numerics.Vector`1[double]):ubyte (FullOpts)
-6 (-15.79 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.GreaterThanOrEqualAll(System.Numerics.Vector`1[double],System.Numerics.Vector`1[double]):ubyte (FullOpts)
-6 (-15.79 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.LessThanAll(System.Numerics.Vector`1[double],System.Numerics.Vector`1[double]):ubyte (FullOpts)
-6 (-15.79 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.LessThanOrEqualAll(System.Numerics.Vector`1[double],System.Numerics.Vector`1[double]):ubyte (FullOpts)
-6 (-15.79 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[long]:System.Runtime.Intrinsics.ISimdVector,T>.GreaterThanAll(System.Numerics.Vector`1[long],System.Numerics.Vector`1[long]):ubyte (FullOpts)
-6 (-15.79 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[long]:System.Runtime.Intrinsics.ISimdVector,T>.LessThanAll(System.Numerics.Vector`1[long],System.Numerics.Vector`1[long]):ubyte (FullOpts)
-6 (-15.79 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.GreaterThanAll(System.Runtime.Intrinsics.Vector256`1[double],System.Runtime.Intrinsics.Vector256`1[double]):ubyte (FullOpts)
-6 (-15.79 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.GreaterThanOrEqualAll(System.Runtime.Intrinsics.Vector256`1[double],System.Runtime.Intrinsics.Vector256`1[double]):ubyte (FullOpts)
-6 (-15.79 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.LessThanAll(System.Runtime.Intrinsics.Vector256`1[double],System.Runtime.Intrinsics.Vector256`1[double]):ubyte (FullOpts)
-6 (-15.79 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[double]:System.Runtime.Intrinsics.ISimdVector,T>.LessThanOrEqualAll(System.Runtime.Intrinsics.Vector256`1[double],System.Runtime.Intrinsics.Vector256`1[double]):ubyte (FullOpts)
-6 (-15.79 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[long]:System.Runtime.Intrinsics.ISimdVector,T>.GreaterThanAll(System.Runtime.Intrinsics.Vector256`1[long],System.Runtime.Intrinsics.Vector256`1[long]):ubyte (FullOpts)
-6 (-15.79 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[long]:System.Runtime.Intrinsics.ISimdVector,T>.LessThanAll(System.Runtime.Intrinsics.Vector256`1[long],System.Runtime.Intrinsics.Vector256`1[long]):ubyte (FullOpts)
-5 (-15.15 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[short]:System.Runtime.Intrinsics.ISimdVector,T>.GreaterThanAll(System.Runtime.Intrinsics.Vector128`1[short],System.Runtime.Intrinsics.Vector128`1[short]):ubyte (FullOpts)
-5 (-15.15 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[short]:System.Runtime.Intrinsics.ISimdVector,T>.LessThanAll(System.Runtime.Intrinsics.Vector128`1[short],System.Runtime.Intrinsics.Vector128`1[short]):ubyte (FullOpts)
-5 (-13.89 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[int]:System.Runtime.Intrinsics.ISimdVector,T>.GreaterThanAll(System.Numerics.Vector`1[int],System.Numerics.Vector`1[int]):ubyte (FullOpts)
-5 (-13.89 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[int]:System.Runtime.Intrinsics.ISimdVector,T>.LessThanAll(System.Numerics.Vector`1[int],System.Numerics.Vector`1[int]):ubyte (FullOpts)
-5 (-13.89 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[short]:System.Runtime.Intrinsics.ISimdVector,T>.GreaterThanAll(System.Numerics.Vector`1[short],System.Numerics.Vector`1[short]):ubyte (FullOpts)
-5 (-13.89 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[short]:System.Runtime.Intrinsics.ISimdVector,T>.LessThanAll(System.Numerics.Vector`1[short],System.Numerics.Vector`1[short]):ubyte (FullOpts)
-5 (-13.89 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[int]:System.Runtime.Intrinsics.ISimdVector,T>.GreaterThanAll(System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[int]):ubyte (FullOpts)
-5 (-13.89 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[int]:System.Runtime.Intrinsics.ISimdVector,T>.LessThanAll(System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[int]):ubyte (FullOpts)
-5 (-13.89 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[short]:System.Runtime.Intrinsics.ISimdVector,T>.GreaterThanAll(System.Runtime.Intrinsics.Vector256`1[short],System.Runtime.Intrinsics.Vector256`1[short]):ubyte (FullOpts)
-5 (-13.89 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[short]:System.Runtime.Intrinsics.ISimdVector,T>.LessThanAll(System.Runtime.Intrinsics.Vector256`1[short],System.Runtime.Intrinsics.Vector256`1[short]):ubyte (FullOpts)
-12 (-13.64 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:GreaterThanAll[int](System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[int]):ubyte (FullOpts)
-12 (-13.64 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:LessThanAll[int](System.Runtime.Intrinsics.Vector256`1[int],System.Runtime.Intrinsics.Vector256`1[int]):ubyte (FullOpts)
-12 (-13.33 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:GreaterThanAll[double](System.Runtime.Intrinsics.Vector256`1[double],System.Runtime.Intrinsics.Vector256`1[double]):ubyte (FullOpts)
-12 (-13.33 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:GreaterThanAll[long](System.Runtime.Intrinsics.Vector256`1[long],System.Runtime.Intrinsics.Vector256`1[long]):ubyte (FullOpts)
-12 (-13.33 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:GreaterThanOrEqualAll[double](System.Runtime.Intrinsics.Vector256`1[double],System.Runtime.Intrinsics.Vector256`1[double]):ubyte (FullOpts)
-12 (-13.33 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:LessThanAll[double](System.Runtime.Intrinsics.Vector256`1[double],System.Runtime.Intrinsics.Vector256`1[double]):ubyte (FullOpts)
-12 (-13.33 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:LessThanAll[long](System.Runtime.Intrinsics.Vector256`1[long],System.Runtime.Intrinsics.Vector256`1[long]):ubyte (FullOpts)
-12 (-13.33 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:LessThanOrEqualAll[double](System.Runtime.Intrinsics.Vector256`1[double],System.Runtime.Intrinsics.Vector256`1[double]):ubyte (FullOpts)
-10 (-11.63 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:GreaterThanAll[short](System.Runtime.Intrinsics.Vector256`1[short],System.Runtime.Intrinsics.Vector256`1[short]):ubyte (FullOpts)
-10 (-11.63 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256:LessThanAll[short](System.Runtime.Intrinsics.Vector256`1[short],System.Runtime.Intrinsics.Vector256`1[short]):ubyte (FullOpts)
-12 (-11.32 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector512:GreaterThanAll[double](System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[double]):ubyte (FullOpts)
-12 (-11.32 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector512:GreaterThanAll[long](System.Runtime.Intrinsics.Vector512`1[long],System.Runtime.Intrinsics.Vector512`1[long]):ubyte (FullOpts)
-12 (-11.32 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector512:GreaterThanOrEqualAll[double](System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[double]):ubyte (FullOpts)
-12 (-11.32 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector512:LessThanAll[double](System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[double]):ubyte (FullOpts)
-12 (-11.32 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector512:LessThanAll[long](System.Runtime.Intrinsics.Vector512`1[long],System.Runtime.Intrinsics.Vector512`1[long]):ubyte (FullOpts)
-12 (-11.32 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector512:LessThanOrEqualAll[double](System.Runtime.Intrinsics.Vector512`1[double],System.Runtime.Intrinsics.Vector512`1[double]):ubyte (FullOpts)
-95 (-11.31 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:LastIndexOf[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],int):int (FullOpts)
-203 (-10.75 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:IndexOfAny[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts)
-198 (-10.55 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:LastIndexOfAny[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts)
-6 (-10.53 % of base) : System.Private.CoreLib.dasm - System.Numerics.Plane:Equals(System.Numerics.Plane):ubyte:this (FullOpts)
-6 (-10.53 % of base) : System.Private.CoreLib.dasm - System.Numerics.Quaternion:Equals(System.Numerics.Quaternion):ubyte:this (FullOpts)
-6 (-10.53 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector4:Equals(System.Numerics.Vector4):ubyte:this (FullOpts)
-15 (-10.34 % of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:LastIndexOfAnyExcept[System.Numerics.Vector`1[float]](System.ReadOnlySpan`1[System.Numerics.Vector`1[float]],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float]):int (FullOpts)
-134 (-10.09 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:LastIndexOfAny[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts)
-24 (-10.08 % of base) : System.Private.CoreLib.dasm - System.Numerics.Matrix4x4:Equals(System.Numerics.Matrix4x4):ubyte:this (FullOpts)
-24 (-10.08 % of base) : System.Private.CoreLib.dasm - System.Numerics.Matrix4x4+Impl:System.IEquatable.Equals(System.Numerics.Matrix4x4+Impl):ubyte:this (FullOpts)
-24 (-10.00 % of base) : System.Private.CoreLib.dasm - System.Numerics.Matrix4x4+Impl:Equals(byref):ubyte:this (FullOpts)
-6 (-10.00 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[double]:EqualsFloatingPoint(System.Runtime.Intrinsics.Vector128`1[double],System.Runtime.Intrinsics.Vector128`1[double]):ubyte (FullOpts)
-19 (-9.95 % of base) : 
System.Private.CoreLib.dasm - System.SpanHelpers:LastIndexOfAnyExcept[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts) -24 (-9.84 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:LastIndexOfAnyExcept[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts) -10 (-9.80 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector512:GreaterThanAll[int](System.Runtime.Intrinsics.Vector512`1[int],System.Runtime.Intrinsics.Vector512`1[int]):ubyte (FullOpts) -10 (-9.80 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector512:GreaterThanAll[short](System.Runtime.Intrinsics.Vector512`1[short],System.Runtime.Intrinsics.Vector512`1[short]):ubyte (FullOpts) -10 (-9.80 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector512:LessThanAll[int](System.Runtime.Intrinsics.Vector512`1[int],System.Runtime.Intrinsics.Vector512`1[int]):ubyte (FullOpts) -10 (-9.80 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector512:LessThanAll[short](System.Runtime.Intrinsics.Vector512`1[short],System.Runtime.Intrinsics.Vector512`1[short]):ubyte (FullOpts) -24 (-9.80 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:IndexOfAnyExcept[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts) -6 (-9.68 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1[double]:Equals(System.Runtime.Intrinsics.Vector128`1[double]):ubyte:this (FullOpts) -129 (-9.58 % of base) : System.Private.CoreLib.dasm - 
System.SpanHelpers:IndexOfAny[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts) -6 (-9.38 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector2:Equals(System.Numerics.Vector2):ubyte:this (FullOpts) -24 (-9.16 % of base) : System.Private.CoreLib.dasm - System.Numerics.Matrix4x4+Impl:Equals(System.Object):ubyte:this (FullOpts) -22 (-9.09 % of base) : System.Private.CoreLib.dasm - System.Numerics.Matrix3x2+Impl:Equals(System.Object):ubyte:this (FullOpts) -24 (-9.06 % of base) : System.Private.CoreLib.dasm - System.Numerics.Matrix4x4:Equals(System.Object):ubyte:this (FullOpts) -22 (-8.98 % of base) : System.Private.CoreLib.dasm - System.Numerics.Matrix3x2:Equals(System.Object):ubyte:this (FullOpts) -5 (-8.33 % of base) : System.Private.CoreLib.dasm - System.ValueTuple`1[System.Numerics.Vector`1[float]]:Equals(System.ValueTuple`1[System.Numerics.Vector`1[float]]):ubyte:this (FullOpts) -18 (-8.26 % of base) : System.Private.CoreLib.dasm - System.Numerics.Matrix3x2:Equals(System.Numerics.Matrix3x2):ubyte:this (FullOpts) -18 (-8.26 % of base) : System.Private.CoreLib.dasm - System.Numerics.Matrix3x2+Impl:System.IEquatable.Equals(System.Numerics.Matrix3x2+Impl):ubyte:this (FullOpts) -18 (-8.18 % of base) : System.Private.CoreLib.dasm - System.Numerics.Matrix3x2+Impl:Equals(byref):ubyte:this (FullOpts) -5 (-8.06 % of base) : System.Private.CoreLib.dasm - System.Collections.Generic.GenericEqualityComparer`1[System.Numerics.Vector`1[float]]:Equals(System.Numerics.Vector`1[float],System.Numerics.Vector`1[float]):ubyte:this (FullOpts) -5 (-8.06 % of base) : System.Private.CoreLib.dasm - System.Collections.Generic.ObjectEqualityComparer`1[System.Numerics.Vector`1[float]]:Equals(System.Numerics.Vector`1[float],System.Numerics.Vector`1[float]):ubyte:this (FullOpts) -62 (-7.87 % of base) : System.Private.CoreLib.dasm - 
System.SpanHelpers:Contains[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],int):ubyte (FullOpts) -5 (-7.81 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[double]:Equals(System.Numerics.Vector`1[double]):ubyte:this (FullOpts) -5 (-7.81 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1[double]:Equals(System.Runtime.Intrinsics.Vector256`1[double]):ubyte:this (FullOpts) -15 (-7.81 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:IndexOfAnyExcept[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts) -62 (-7.53 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:IndexOf[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],int):int (FullOpts) -10 (-7.41 % of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:IndexOfAnyExcept[System.Numerics.Vector`1[float]](System.ReadOnlySpan`1[System.Numerics.Vector`1[float]],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float]):int (FullOpts) -10 (-7.41 % of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:IndexOfAnyExcept[System.Numerics.Vector`1[float]](System.Span`1[System.Numerics.Vector`1[float]],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float]):int (FullOpts) -10 (-7.41 % of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:LastIndexOfAnyExcept[System.Numerics.Vector`1[float]](System.Span`1[System.Numerics.Vector`1[float]],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float]):int (FullOpts) -6 (-7.32 % of base) : System.Private.CoreLib.dasm - System.Numerics.Vector3:Equals(System.Numerics.Vector3):ubyte:this (FullOpts) -10 (-7.30 % of base) : System.Private.CoreLib.dasm - System.SpanHelpers:LastIndexOfAnyExcept[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts) -10 (-7.25 % 
of base) : System.Private.CoreLib.dasm - System.SpanHelpers:IndexOfAnyExcept[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts) -10 (-7.14 % of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:ContainsAnyExcept[System.Numerics.Vector`1[float]](System.ReadOnlySpan`1[System.Numerics.Vector`1[float]],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float]):ubyte (FullOpts) -10 (-7.14 % of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:ContainsAnyExcept[System.Numerics.Vector`1[float]](System.Span`1[System.Numerics.Vector`1[float]],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float]):ubyte (FullOpts) -5 (-7.14 % of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:StartsWith[System.Numerics.Vector`1[float]](System.ReadOnlySpan`1[System.Numerics.Vector`1[float]],System.Numerics.Vector`1[float]):ubyte (FullOpts) 199 total methods with Code Size differences (199 improved, 0 regressed), 230442 unchanged. -------------------------------------------------------------------------------- ```
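
Each entry in the listing above pairs an absolute byte delta with a percentage of the base method size, so the base size can be estimated from the two numbers. The following is a minimal sketch, assuming the entry format shown in this report; `parse_entry` and `approx_base_size` are hypothetical helpers written for illustration, not part of the jit-diff tooling, and the result is only approximate because the percentage is rounded to two decimals.

```python
import re

# Matches the leading "<delta> (<percent> % of base)" part of an entry.
ENTRY = re.compile(r"(-?\d+) \((-?\d+\.\d+) % of base\)")


def parse_entry(text):
    """Return (delta_bytes, percent) parsed from one summary entry."""
    match = ENTRY.search(text)
    if match is None:
        raise ValueError("no size entry found")
    return int(match.group(1)), float(match.group(2))


def approx_base_size(delta, percent):
    """Estimate the base code size in bytes from a delta and its percentage."""
    return round(delta / (percent / 100.0))


entry = ("-6 (-10.53 % of base) : System.Private.CoreLib.dasm - "
         "System.Numerics.Vector4:Equals(System.Numerics.Vector4):ubyte:this (FullOpts)")
delta, percent = parse_entry(entry)
print(delta, percent, approx_base_size(delta, percent))
```

Because of the rounding, the estimate can be off by a byte for larger methods; the "Total bytes of code" line in a method's assembly listing remains the authoritative figure.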


xtqqczze commented 2 months ago

@MihaZupan Ah, didn't expect this to trigger from https://github.com/dotnet/runtime/pull/104944#issuecomment-2234237011.

MihuBot commented 2 months ago

Top method improvements

-203 (-10.75 % of base) - System.SpanHelpers:IndexOfAny[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int ```diff ; Assembly listing for method System.SpanHelpers:IndexOfAny[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts) ; Emitting BLENDED_CODE for X64 with AVX512 - Unix ; FullOpts code ; optimized code ; rbp based frame ; fully interruptible ; No PGO data ; 0 inlinees with PGO data; 0 single block inlinees; 39 inlinees without PGO data ; Final local variable assignments ; ; V00 arg0 [V00,T01] ( 15, 40 ) byref -> rdi single-def ; V01 arg1 [V01,T09] ( 19, 40 ) simd32 -> mm2 ld-addr-op single-def ; V02 arg2 [V02,T05] ( 19, 55 ) simd32 -> mm0 ld-addr-op single-def ; V03 arg3 [V03,T06] ( 19, 55 ) simd32 -> mm1 ld-addr-op single-def ; V04 arg4 [V04,T02] ( 7, 12 ) int -> rsi single-def ; V05 loc0 [V05,T04] ( 78,228 ) simd32 -> mm4 ld-addr-op ; V06 loc1 [V06,T00] ( 32, 69 ) int -> rax ;* V07 loc2 [V07 ] ( 0, 0 ) simd32 -> zero-ref ld-addr-op ;# V08 OutArgs [V08 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" ;* V09 tmp1 [V09 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V10 tmp2 [V10 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V11 tmp3 [V11 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V12 tmp4 [V12 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V13 tmp5 [V13 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V14 tmp6 [V14 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V15 tmp7 [V15 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V16 tmp8 [V16 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V17 tmp9 [V17 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V18 tmp10 [V18 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V19 tmp11 [V19 ] ( 0, 0 ) ubyte -> 
zero-ref "Inline return value spill temp" ;* V20 tmp12 [V20 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V21 tmp13 [V21 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V22 tmp14 [V22 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V23 tmp15 [V23 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V24 tmp16 [V24 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V25 tmp17 [V25 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V26 tmp18 [V26 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V27 tmp19 [V27 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V28 tmp20 [V28 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V29 tmp21 [V29 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V30 tmp22 [V30 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V31 tmp23 [V31 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V32 tmp24 [V32 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V33 tmp25 [V33 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V34 tmp26 [V34 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V35 tmp27 [V35 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V36 tmp28 [V36 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V37 tmp29 [V37 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V38 tmp30 [V38 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V39 tmp31 [V39 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V40 tmp32 [V40 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V41 tmp33 [V41 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V42 tmp34 [V42 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V43 tmp35 [V43 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V44 tmp36 [V44 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V45 tmp37 [V45 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V46 tmp38 [V46 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V47 tmp39 [V47 ] ( 0, 0 ) ubyte 
-> zero-ref "Inline return value spill temp" ;* V48 tmp40 [V48 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V49 tmp41 [V49 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V50 tmp42 [V50 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V51 tmp43 [V51 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V52 tmp44 [V52 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V53 tmp45 [V53 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V54 tmp46 [V54 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V55 tmp47 [V55 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V56 tmp48 [V56 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V57 tmp49 [V57 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V58 tmp50 [V58 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V59 tmp51 [V59 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V60 tmp52 [V60 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V61 tmp53 [V61 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V62 tmp54 [V62 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V63 tmp55 [V63 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V64 tmp56 [V64 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V65 tmp57 [V65 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V66 tmp58 [V66 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V67 tmp59 [V67 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V68 tmp60 [V68 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V69 tmp61 [V69 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V70 tmp62 [V70 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V71 tmp63 [V71 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V72 tmp64 [V72 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V73 tmp65 [V73 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V74 tmp66 [V74 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V75 tmp67 [V75 ] ( 0, 0 ) 
ubyte -> zero-ref "Inline return value spill temp" ;* V76 tmp68 [V76 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V77 tmp69 [V77 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V78 tmp70 [V78 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V79 tmp71 [V79 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V80 tmp72 [V80 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V81 tmp73 [V81 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V82 tmp74 [V82 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V83 tmp75 [V83 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V84 tmp76 [V84 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ;* V85 tmp77 [V85 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp" ;* V86 tmp78 [V86 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg" ; V87 tmp79 [V87,T03] ( 9, 9 ) int -> rax "Single return block return value" ; V88 cse0 [V88,T10] ( 16, 39 ) simd32 -> mm3 hoist multi-def "CSE #01: moderate" ; V89 cse1 [V89,T07] ( 16, 46.50) simd32 -> mm6 multi-def "CSE #03: moderate" ; V90 cse2 [V90,T08] ( 16, 46.50) simd32 -> mm7 multi-def "CSE #05: moderate" ; V91 cse3 [V91,T11] ( 4, 16 ) simd32 -> mm5 "CSE #02: moderate" ; V92 cse4 [V92,T12] ( 4, 16 ) simd32 -> mm5 "CSE #04: moderate" ; V93 cse5 [V93,T13] ( 4, 16 ) simd32 -> mm5 "CSE #06: moderate" ; V94 cse6 [V94,T14] ( 4, 16 ) simd32 -> mm5 "CSE #07: moderate" ; V95 cse7 [V95,T15] ( 4, 16 ) simd32 -> mm5 "CSE #08: moderate" ; V96 cse8 [V96,T16] ( 4, 16 ) simd32 -> mm5 "CSE #09: moderate" ; V97 cse9 [V97,T17] ( 4, 16 ) simd32 -> mm5 "CSE #10: moderate" ; V98 cse10 [V98,T18] ( 4, 16 ) simd32 -> mm5 "CSE #11: moderate" ; V99 cse11 [V99,T19] ( 4, 16 ) simd32 -> mm5 "CSE #16: moderate" ; V100 cse12 [V100,T20] ( 4, 2 ) simd32 -> mm5 "CSE #12: conservative" ; V101 cse13 [V101,T21] ( 4, 2 ) simd32 -> mm5 "CSE #13: conservative" ; V102 cse14 [V102,T22] ( 4, 2 ) simd32 -> mm5 "CSE #14: conservative" ; V103 cse15 [V103,T23] ( 4, 2 ) simd32 -> mm5 "CSE 
#15: conservative" ; ; Lcl frame size = 0 G_M13892_IG01: push rbp mov rbp, rsp vmovups ymm2, ymmword ptr [rbp+0x10] vmovups ymm0, ymmword ptr [rbp+0x30] vmovups ymm1, ymmword ptr [rbp+0x50] ;; size=19 bbWeight=1 PerfScore 13.25 G_M13892_IG02: xor eax, eax cmp esi, 8 jl G_M13892_IG19 ;; size=11 bbWeight=1 PerfScore 1.50 G_M13892_IG03: vcmpps ymm3, ymm2, ymm2, 0 align [0 bytes for IG04] ;; size=5 bbWeight=0.25 PerfScore 0.75 G_M13892_IG04: movsxd rcx, eax shl rcx, 5 vmovups ymm4, ymmword ptr [rdi+rcx] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm6, ymm5, ymm3, 17 vcmpps ymm7, ymm4, ymm2, 0 vorps ymm6, ymm6, ymm7 vpcmpeqd ymm7, ymm7, ymm7 - vpcmpeqd k1, ymm7, ymm6 - kortestb k1, k1 - jb G_M13892_IG27 + vptest ymm6, ymm7 + jb G_M13892_IG26 vcmpps ymm6, ymm0, ymm0, 0 vpternlogd ymm7, ymm5, ymm6, 17 vcmpps ymm8, ymm4, ymm0, 0 vorps ymm7, ymm7, ymm8 vpcmpeqd ymm8, ymm8, ymm8 - vpcmpeqd k1, ymm8, ymm7 - kortestb k1, k1 - jb G_M13892_IG27 + vptest ymm7, ymm8 + jb G_M13892_IG26 vcmpps ymm7, ymm1, ymm1, 0 vpternlogd ymm5, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm4, ymm5, ymm4 vpcmpeqd ymm5, ymm5, ymm5 - vpcmpeqd k1, ymm5, ymm4 - kortestb k1, k1 - jb G_M13892_IG27 + vptest ymm4, ymm5 + jb G_M13892_IG26 lea ecx, [rax+0x01] movsxd rcx, ecx shl rcx, 5 vmovups ymm4, ymmword ptr [rdi+rcx] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm8, ymm5, ymm3, 17 vcmpps ymm9, ymm4, ymm2, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb cl movzx rcx, cl vpternlogd ymm8, ymm5, ymm6, 17 vcmpps ymm9, ymm4, ymm0, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb dl movzx rdx, dl or ecx, edx vpternlogd ymm5, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm4, ymm5, ymm4 - vpcmpeqd k1, ymm9, ymm4 - kortestb k1, k1 + vptest ymm4, ymm9 setb dl movzx rdx, dl or ecx, edx je SHORT G_M13892_IG06 - ;; size=271 bbWeight=4 PerfScore 267.00 + ;; size=241 bbWeight=4 
PerfScore 351.00 G_M13892_IG05: inc eax - jmp G_M13892_IG27 - align [0 bytes for IG24] + jmp G_M13892_IG26 + align [0 bytes for IG23] ;; size=7 bbWeight=0.50 PerfScore 1.12 G_M13892_IG06: lea ecx, [rax+0x02] movsxd rcx, ecx shl rcx, 5 vmovups ymm4, ymmword ptr [rdi+rcx] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm8, ymm5, ymm3, 17 vcmpps ymm9, ymm4, ymm2, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb cl movzx rcx, cl vpternlogd ymm8, ymm5, ymm6, 17 vcmpps ymm9, ymm4, ymm0, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb dl movzx rdx, dl or ecx, edx vpternlogd ymm5, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm4, ymm5, ymm4 - vpcmpeqd k1, ymm9, ymm4 - kortestb k1, k1 + vptest ymm4, ymm9 setb dl movzx rdx, dl or ecx, edx je SHORT G_M13892_IG08 - ;; size=134 bbWeight=4 PerfScore 126.00 + ;; size=119 bbWeight=4 PerfScore 168.00 G_M13892_IG07: add eax, 2 - jmp G_M13892_IG27 + jmp G_M13892_IG26 ;; size=8 bbWeight=0.50 PerfScore 1.12 G_M13892_IG08: lea ecx, [rax+0x03] movsxd rcx, ecx shl rcx, 5 vmovups ymm4, ymmword ptr [rdi+rcx] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm8, ymm5, ymm3, 17 vcmpps ymm9, ymm4, ymm2, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb cl movzx rcx, cl vpternlogd ymm8, ymm5, ymm6, 17 vcmpps ymm9, ymm4, ymm0, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb dl movzx rdx, dl or ecx, edx vpternlogd ymm5, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm4, ymm5, ymm4 - vpcmpeqd k1, ymm9, ymm4 - kortestb k1, k1 + vptest ymm4, ymm9 setb dl movzx rdx, dl or ecx, edx je SHORT G_M13892_IG10 - ;; size=134 bbWeight=4 PerfScore 126.00 + ;; size=119 bbWeight=4 PerfScore 168.00 G_M13892_IG09: add eax, 3 - jmp G_M13892_IG27 + jmp G_M13892_IG26 ;; size=8 bbWeight=0.50 
PerfScore 1.12 G_M13892_IG10: lea ecx, [rax+0x04] movsxd rcx, ecx shl rcx, 5 vmovups ymm4, ymmword ptr [rdi+rcx] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm8, ymm5, ymm3, 17 vcmpps ymm9, ymm4, ymm2, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb cl movzx rcx, cl vpternlogd ymm8, ymm5, ymm6, 17 vcmpps ymm9, ymm4, ymm0, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb dl movzx rdx, dl or ecx, edx vpternlogd ymm5, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm4, ymm5, ymm4 - vpcmpeqd k1, ymm9, ymm4 - kortestb k1, k1 + vptest ymm4, ymm9 setb dl movzx rdx, dl or ecx, edx je SHORT G_M13892_IG12 - ;; size=134 bbWeight=4 PerfScore 126.00 + ;; size=119 bbWeight=4 PerfScore 168.00 G_M13892_IG11: add eax, 4 - jmp G_M13892_IG27 + jmp G_M13892_IG26 ;; size=8 bbWeight=0.50 PerfScore 1.12 G_M13892_IG12: lea ecx, [rax+0x05] movsxd rcx, ecx shl rcx, 5 vmovups ymm4, ymmword ptr [rdi+rcx] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm8, ymm5, ymm3, 17 vcmpps ymm9, ymm4, ymm2, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb cl movzx rcx, cl vpternlogd ymm8, ymm5, ymm6, 17 vcmpps ymm9, ymm4, ymm0, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb dl movzx rdx, dl or ecx, edx vpternlogd ymm5, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm4, ymm5, ymm4 - vpcmpeqd k1, ymm9, ymm4 - kortestb k1, k1 + vptest ymm4, ymm9 setb dl movzx rdx, dl or ecx, edx je SHORT G_M13892_IG14 - ;; size=134 bbWeight=4 PerfScore 126.00 + ;; size=119 bbWeight=4 PerfScore 168.00 G_M13892_IG13: add eax, 5 - jmp G_M13892_IG27 + jmp G_M13892_IG26 ;; size=8 bbWeight=0.50 PerfScore 1.12 G_M13892_IG14: lea ecx, [rax+0x06] movsxd rcx, ecx shl rcx, 5 vmovups ymm4, ymmword ptr [rdi+rcx] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm8, ymm5, 
ymm3, 17 vcmpps ymm9, ymm4, ymm2, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb cl movzx rcx, cl vpternlogd ymm8, ymm5, ymm6, 17 vcmpps ymm9, ymm4, ymm0, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb dl movzx rdx, dl or ecx, edx vpternlogd ymm5, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm4, ymm5, ymm4 - vpcmpeqd k1, ymm9, ymm4 - kortestb k1, k1 + vptest ymm4, ymm9 setb dl movzx rdx, dl or ecx, edx je SHORT G_M13892_IG16 - ;; size=134 bbWeight=4 PerfScore 126.00 + ;; size=119 bbWeight=4 PerfScore 168.00 G_M13892_IG15: add eax, 6 - jmp G_M13892_IG27 + jmp G_M13892_IG26 ;; size=8 bbWeight=0.50 PerfScore 1.12 G_M13892_IG16: lea ecx, [rax+0x07] movsxd rcx, ecx shl rcx, 5 vmovups ymm4, ymmword ptr [rdi+rcx] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm8, ymm5, ymm3, 17 vcmpps ymm9, ymm4, ymm2, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb cl movzx rcx, cl vpternlogd ymm6, ymm5, ymm6, 17 vcmpps ymm8, ymm4, ymm0, 0 vorps ymm6, ymm6, ymm8 - vpcmpeqd k1, ymm9, ymm6 - kortestb k1, k1 + vptest ymm6, ymm9 setb dl movzx rdx, dl or ecx, edx vpternlogd ymm7, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm4, ymm7, ymm4 - vpcmpeqd k1, ymm9, ymm4 - kortestb k1, k1 + vptest ymm4, ymm9 setb dl movzx rdx, dl or ecx, edx je SHORT G_M13892_IG18 - ;; size=129 bbWeight=4 PerfScore 124.00 + ;; size=114 bbWeight=4 PerfScore 166.00 G_M13892_IG17: add eax, 7 - jmp G_M13892_IG27 + jmp G_M13892_IG26 ;; size=8 bbWeight=0.50 PerfScore 1.12 G_M13892_IG18: add eax, 8 mov ecx, esi sub ecx, eax cmp ecx, 8 jge G_M13892_IG04 ;; size=16 bbWeight=4 PerfScore 8.00 G_M13892_IG19: mov ecx, esi sub ecx, eax cmp ecx, 4 - jl G_M13892_IG22 + jl G_M13892_IG21 movsxd rcx, eax shl rcx, 5 vmovups ymm4, ymmword ptr [rdi+rcx] vcmpps ymm3, ymm2, ymm2, 0 vcmpps ymm5, ymm4, ymm4, 0 vpternlogd 
ymm6, ymm5, ymm3, 17 vcmpps ymm7, ymm4, ymm2, 0 vorps ymm6, ymm6, ymm7 vpcmpeqd ymm7, ymm7, ymm7 - vpcmpeqd k1, ymm7, ymm6 - kortestb k1, k1 - jb G_M13892_IG27 + vptest ymm6, ymm7 + jb G_M13892_IG26 vcmpps ymm6, ymm0, ymm0, 0 vpternlogd ymm7, ymm5, ymm6, 17 vcmpps ymm8, ymm4, ymm0, 0 vorps ymm7, ymm7, ymm8 vpcmpeqd ymm8, ymm8, ymm8 - vpcmpeqd k1, ymm8, ymm7 - kortestb k1, k1 - jb G_M13892_IG27 + vptest ymm7, ymm8 + jb G_M13892_IG26 vcmpps ymm7, ymm1, ymm1, 0 vpternlogd ymm5, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm4, ymm5, ymm4 vpcmpeqd ymm5, ymm5, ymm5 - vpcmpeqd k1, ymm5, ymm4 - kortestb k1, k1 - jb G_M13892_IG27 + vptest ymm4, ymm5 + jb G_M13892_IG26 lea ecx, [rax+0x01] movsxd rcx, ecx shl rcx, 5 vmovups ymm4, ymmword ptr [rdi+rcx] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm8, ymm5, ymm3, 17 vcmpps ymm9, ymm4, ymm2, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 jb G_M13892_IG05 vpternlogd ymm8, ymm5, ymm6, 17 vcmpps ymm9, ymm4, ymm0, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 jb G_M13892_IG05 vpternlogd ymm5, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm4, ymm5, ymm4 vpcmpeqd ymm5, ymm5, ymm5 - ;; size=271 bbWeight=0.50 PerfScore 33.63 -G_M13892_IG20: - vpcmpeqd k1, ymm5, ymm4 - kortestb k1, k1 + vptest ymm4, ymm5 jb G_M13892_IG05 lea ecx, [rax+0x02] movsxd rcx, ecx + ;; size=263 bbWeight=0.50 PerfScore 45.75 +G_M13892_IG20: shl rcx, 5 vmovups ymm4, ymmword ptr [rdi+rcx] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm8, ymm5, ymm3, 17 vcmpps ymm9, ymm4, ymm2, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 jb G_M13892_IG07 vpternlogd ymm8, ymm5, ymm6, 17 vcmpps ymm9, ymm4, ymm0, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 jb G_M13892_IG07 vpternlogd ymm5, ymm5, ymm7, 17 vcmpps 
ymm4, ymm4, ymm1, 0 vorps ymm4, ymm5, ymm4 vpcmpeqd ymm5, ymm5, ymm5 - vpcmpeqd k1, ymm5, ymm4 - kortestb k1, k1 + vptest ymm4, ymm5 jb G_M13892_IG07 lea ecx, [rax+0x03] movsxd rcx, ecx shl rcx, 5 vmovups ymm4, ymmword ptr [rdi+rcx] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm3, ymm5, ymm3, 17 vcmpps ymm8, ymm4, ymm2, 0 vorps ymm3, ymm3, ymm8 vpcmpeqd ymm8, ymm8, ymm8 - vpcmpeqd k1, ymm8, ymm3 - kortestb k1, k1 + vptest ymm3, ymm8 jb G_M13892_IG09 vpternlogd ymm6, ymm5, ymm6, 17 vcmpps ymm3, ymm4, ymm0, 0 vorps ymm3, ymm6, ymm3 vpcmpeqd ymm6, ymm6, ymm6 - vpcmpeqd k1, ymm6, ymm3 - kortestb k1, k1 + vptest ymm3, ymm6 jb G_M13892_IG09 vpternlogd ymm7, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm3, ymm7, ymm4 vpcmpeqd ymm4, ymm4, ymm4 - vpcmpeqd k1, ymm4, ymm3 - kortestb k1, k1 - ;; size=272 bbWeight=0.50 PerfScore 30.50 -G_M13892_IG21: + vptest ymm3, ymm4 jb G_M13892_IG09 add eax, 4 - ;; size=9 bbWeight=0.50 PerfScore 0.62 -G_M13892_IG22: + ;; size=229 bbWeight=0.50 PerfScore 40.00 +G_M13892_IG21: cmp eax, esi - jge G_M13892_IG25 - ;; size=8 bbWeight=0.50 PerfScore 0.62 -G_M13892_IG23: + jge SHORT G_M13892_IG24 + ;; size=4 bbWeight=0.50 PerfScore 0.62 +G_M13892_IG22: vcmpps ymm3, ymm2, ymm2, 0 ;; size=5 bbWeight=0.25 PerfScore 0.75 -G_M13892_IG24: +G_M13892_IG23: movsxd rcx, eax shl rcx, 5 vmovups ymm4, ymmword ptr [rdi+rcx] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm6, ymm5, ymm3, 17 vcmpps ymm7, ymm4, ymm2, 0 vorps ymm6, ymm6, ymm7 vpcmpeqd ymm7, ymm7, ymm7 - vpcmpeqd k1, ymm7, ymm6 - kortestb k1, k1 - jb SHORT G_M13892_IG27 + vptest ymm6, ymm7 + jb SHORT G_M13892_IG26 vcmpps ymm6, ymm0, ymm0, 0 vpternlogd ymm6, ymm5, ymm6, 17 vcmpps ymm7, ymm4, ymm0, 0 vorps ymm6, ymm6, ymm7 vpcmpeqd ymm7, ymm7, ymm7 - vpcmpeqd k1, ymm7, ymm6 - kortestb k1, k1 - jb SHORT G_M13892_IG27 + vptest ymm6, ymm7 + jb SHORT G_M13892_IG26 vcmpps ymm7, ymm1, ymm1, 0 vpternlogd ymm5, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm4, ymm5, ymm4 vpcmpeqd ymm5, ymm5, ymm5 - vpcmpeqd k1, 
ymm5, ymm4 - kortestb k1, k1 - jb SHORT G_M13892_IG27 + vptest ymm4, ymm5 + jb SHORT G_M13892_IG26 inc eax cmp eax, esi - jl G_M13892_IG24 - ;; size=133 bbWeight=4 PerfScore 147.00 -G_M13892_IG25: + jl SHORT G_M13892_IG23 + ;; size=114 bbWeight=4 PerfScore 189.00 +G_M13892_IG24: mov eax, -1 ;; size=5 bbWeight=0.50 PerfScore 0.12 -G_M13892_IG26: +G_M13892_IG25: vzeroupper pop rbp ret ;; size=5 bbWeight=0.50 PerfScore 1.25 -G_M13892_IG27: +G_M13892_IG26: vzeroupper pop rbp ret ;; size=5 bbWeight=0.50 PerfScore 1.25 -; Total bytes of code 1889, prolog size 19, PerfScore 1268.13, instruction count 424, allocated bytes for code 1967 (MethodHash=c175c9bb) for method System.SpanHelpers:IndexOfAny[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts) +; Total bytes of code 1686, prolog size 19, PerfScore 1667.13, instruction count 385, allocated bytes for code 1686 (MethodHash=c175c9bb) for method System.SpanHelpers:IndexOfAny[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts) ```
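
The recurring change in the diff above replaces the pair `vpcmpeqd k1, ...` / `kortestb k1, k1` with a single `vptest` against an all-ones register, while the consumer of the carry flag (`jb` or `setb`) stays the same. The sketch below is a small Python model, not JIT code, of why both sequences should set CF identically in that situation. It assumes the documented flag behavior of the instructions (`vptest` sets CF when `(NOT first) AND second` is zero; `kortestb` sets CF when the OR of the low 8 mask bits is all ones), and every function name in it is hypothetical.

```python
import itertools

LANES = 8                 # a ymm register holds eight 32-bit lanes
ALL_ONES = 0xFFFFFFFF     # one lane with every bit set


def vpcmpeqd_mask(a, b):
    """Model of vpcmpeqd into a mask register: bit i is set when lane i matches."""
    mask = 0
    for i in range(LANES):
        if a[i] == b[i]:
            mask |= 1 << i
    return mask


def kortestb_cf(k1, k2):
    """Model of kortestb's carry flag: set when the OR of the low 8 bits is 0xFF."""
    return ((k1 | k2) & 0xFF) == 0xFF


def vptest_cf(first, second):
    """Model of vptest's carry flag: set when (NOT first) AND second is all zero."""
    return all(((~f & s) & ALL_ONES) == 0 for f, s in zip(first, second))


def old_sequence_cf(value):
    """vpcmpeqd k1, ones, value ; kortestb k1, k1"""
    ones = [ALL_ONES] * LANES
    k1 = vpcmpeqd_mask(ones, value)
    return kortestb_cf(k1, k1)


def new_sequence_cf(value):
    """vptest value, ones"""
    ones = [ALL_ONES] * LANES
    return vptest_cf(value, ones)


def sequences_agree():
    """Compare both models over every combination of a few lane patterns."""
    patterns = (0, 0x7FFFFFFF, ALL_ONES)
    return all(
        old_sequence_cf(list(v)) == new_sequence_cf(list(v))
        for v in itertools.product(patterns, repeat=LANES)
    )


print(sequences_agree())
```

In both models CF is set only when every lane of the tested value is all ones. The model covers flag equivalence only; it says nothing about cost, and the listing itself reports the PerfScore rising from 1268.13 to 1667.13 while the code size falls from 1889 to 1686 bytes.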
-198 (-10.55 % of base) - System.SpanHelpers:LastIndexOfAny[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int

```diff
; Assembly listing for method System.SpanHelpers:LastIndexOfAny[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX512 - Unix
; FullOpts code
; optimized code
; rbp based frame
; fully interruptible
; No PGO data
; 0 inlinees with PGO data; 0 single block inlinees; 39 inlinees without PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T01] ( 15, 38 ) byref -> rdi single-def
; V01 arg1 [V01,T08] ( 19, 41.50) simd32 -> mm2 ld-addr-op single-def
; V02 arg2 [V02,T04] ( 19, 49 ) simd32 -> mm0 ld-addr-op single-def
; V03 arg3 [V03,T05] ( 19, 49 ) simd32 -> mm1 ld-addr-op single-def
; V04 arg4 [V04,T00] ( 33, 64.50) int -> rsi
; V05 loc0 [V05,T03] ( 78,216 ) simd32 -> mm4 ld-addr-op
;* V06 loc1 [V06 ] ( 0, 0 ) simd32 -> zero-ref ld-addr-op
;# V07 OutArgs [V07 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;* V08 tmp1 [V08 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V09 tmp2 [V09 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V10 tmp3 [V10 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V11 tmp4 [V11 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V12 tmp5 [V12 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V13 tmp6 [V13 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V14 tmp7 [V14 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V15 tmp8 [V15 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V16 tmp9 [V16 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V17 tmp10 [V17 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V18 tmp11 [V18 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V19 tmp12 [V19 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V20 tmp13 [V20 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V21 tmp14 [V21 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V22 tmp15 [V22 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V23 tmp16 [V23 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V24 tmp17 [V24 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V25 tmp18 [V25 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V26 tmp19 [V26 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V27 tmp20 [V27 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V28 tmp21 [V28 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V29 tmp22 [V29 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V30 tmp23 [V30 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V31 tmp24 [V31 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V32 tmp25 [V32 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V33 tmp26 [V33 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V34 tmp27 [V34 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V35 tmp28 [V35 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V36 tmp29 [V36 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V37 tmp30 [V37 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V38 tmp31 [V38 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V39 tmp32 [V39 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V40 tmp33 [V40 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V41 tmp34 [V41 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V42 tmp35 [V42 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V43 tmp36 [V43 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V44 tmp37 [V44 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V45 tmp38 [V45 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V46 tmp39 [V46 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V47 tmp40 [V47 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V48 tmp41 [V48 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V49 tmp42 [V49 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V50 tmp43 [V50 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V51 tmp44 [V51 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V52 tmp45 [V52 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V53 tmp46 [V53 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V54 tmp47 [V54 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V55 tmp48 [V55 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V56 tmp49 [V56 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V57 tmp50 [V57 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V58 tmp51 [V58 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V59 tmp52 [V59 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V60 tmp53 [V60 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V61 tmp54 [V61 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V62 tmp55 [V62 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V63 tmp56 [V63 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V64 tmp57 [V64 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V65 tmp58 [V65 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V66 tmp59 [V66 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V67 tmp60 [V67 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V68 tmp61 [V68 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V69 tmp62 [V69 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V70 tmp63 [V70 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V71 tmp64 [V71 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V72 tmp65 [V72 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V73 tmp66 [V73 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V74 tmp67 [V74 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V75 tmp68 [V75 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V76 tmp69 [V76 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V77 tmp70 [V77 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V78 tmp71 [V78 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V79 tmp72 [V79 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V80 tmp73 [V80 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V81 tmp74 [V81 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V82 tmp75 [V82 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V83 tmp76 [V83 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V84 tmp77 [V84 ] ( 0, 0 ) ubyte -> zero-ref "Inline return value spill temp"
;* V85 tmp78 [V85 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
; V86 tmp79 [V86,T02] ( 9, 9 ) int -> rax "Single return block return value"
; V87 cse0 [V87,T09] ( 16, 38.75) simd32 -> mm3 hoist multi-def "CSE #01: aggressive"
; V88 cse1 [V88,T06] ( 16, 42.50) simd32 -> mm6 multi-def "CSE #03: aggressive"
; V89 cse2 [V89,T07] ( 16, 42.50) simd32 -> mm7 multi-def "CSE #05: aggressive"
; V90 cse3 [V90,T10] ( 4, 16 ) simd32 -> mm5 "CSE #02: aggressive"
; V91 cse4 [V91,T11] ( 4, 16 ) simd32 -> mm5 "CSE #04: aggressive"
; V92 cse5 [V92,T12] ( 4, 16 ) simd32 -> mm5 "CSE #06: aggressive"
; V93 cse6 [V93,T13] ( 4, 16 ) simd32 -> mm5 "CSE #07: aggressive"
; V94 cse7 [V94,T14] ( 4, 16 ) simd32 -> mm5 "CSE #08: aggressive"
; V95 cse8 [V95,T15] ( 4, 16 ) simd32 -> mm5 "CSE #09: aggressive"
; V96 cse9 [V96,T16] ( 4, 16 ) simd32 -> mm5 "CSE #10: aggressive"
; V97 cse10 [V97,T17] ( 4, 16 ) simd32 -> mm5 "CSE #11: aggressive"
; V98 cse11 [V98,T18] ( 4, 8 ) simd32 -> mm5 "CSE #16: moderate"
; V99 cse12 [V99,T19] ( 4, 2 ) simd32 -> mm5 "CSE #12: conservative"
; V100 cse13 [V100,T20] ( 4, 2 ) simd32 -> mm5 "CSE #13: conservative"
; V101 cse14 [V101,T21] ( 4, 2 ) simd32 -> mm5 "CSE #14: conservative"
; V102 cse15 [V102,T22] ( 4, 2 ) simd32 -> mm5 "CSE #15: conservative"
;
; Lcl frame size = 0 G_M12558_IG01: push rbp mov rbp, rsp vmovups ymm2, ymmword ptr [rbp+0x10] vmovups ymm0, ymmword ptr [rbp+0x30] vmovups ymm1, ymmword ptr [rbp+0x50] ;; size=19 bbWeight=1 PerfScore 13.25 G_M12558_IG02: cmp esi, 8 - jl G_M12558_IG22 + jl G_M12558_IG21 ;; size=9 bbWeight=1 PerfScore 1.25 G_M12558_IG03: vcmpps ymm3, ymm2, ymm2, 0 align [0 bytes for IG04] ;; size=5 bbWeight=0.25 PerfScore 0.75 G_M12558_IG04: add esi, -8 lea eax, [rsi+0x07] cdqe shl rax, 5 vmovups ymm4, ymmword ptr [rdi+rax] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm6, ymm5, ymm3, 17 vcmpps ymm7, ymm4, ymm2, 0 vorps ymm6, ymm6, ymm7 vpcmpeqd ymm7, ymm7, ymm7 - vpcmpeqd k1, ymm7, ymm6 - kortestb k1, k1 - jb G_M12558_IG07 + vptest ymm6, ymm7 + jb G_M12558_IG06 vcmpps ymm6, ymm0, ymm0, 0 vpternlogd ymm7, ymm5, ymm6, 17 vcmpps ymm8, ymm4, ymm0, 0 vorps ymm7, ymm7, ymm8 vpcmpeqd ymm8, ymm8, ymm8 - vpcmpeqd k1, ymm8, ymm7 - kortestb k1, k1 - jb G_M12558_IG07 + vptest ymm7, ymm8 + jb G_M12558_IG06 vcmpps ymm7, ymm1, ymm1, 0 vpternlogd ymm5, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm4, ymm5, ymm4 vpcmpeqd ymm5, ymm5, ymm5 - vpcmpeqd k1, ymm5, ymm4 - kortestb k1, k1 - jb G_M12558_IG07 + vptest ymm4, ymm5 + jb G_M12558_IG06 lea eax, [rsi+0x06] cdqe shl rax, 5 vmovups ymm4, ymmword ptr [rdi+rax] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm8, ymm5, ymm3, 17 vcmpps ymm9, ymm4, ymm2, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb al movzx rax, al vpternlogd ymm8, ymm5, ymm6, 17 vcmpps ymm9, ymm4, ymm0, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb cl movzx rcx, cl or eax, ecx vpternlogd ymm5, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm4, ymm5, ymm4 - vpcmpeqd k1, ymm9, ymm4 - kortestb k1, k1 + vptest ymm4, ymm9 setb cl movzx rcx, cl or eax, ecx - ;; size=273 bbWeight=4 PerfScore 266.00 + je SHORT G_M12558_IG07 + ;; size=245 bbWeight=4 
PerfScore 354.00 G_M12558_IG05: - je SHORT G_M12558_IG08 - ;; size=2 bbWeight=4 PerfScore 4.00 -G_M12558_IG06: lea eax, [rsi+0x06] - jmp G_M12558_IG20 - align [0 bytes for IG25] + jmp G_M12558_IG19 + align [0 bytes for IG23] ;; size=8 bbWeight=0.50 PerfScore 1.25 -G_M12558_IG07: +G_M12558_IG06: lea eax, [rsi+0x07] - jmp G_M12558_IG20 + jmp G_M12558_IG19 ;; size=8 bbWeight=0.50 PerfScore 1.25 -G_M12558_IG08: +G_M12558_IG07: lea eax, [rsi+0x05] cdqe shl rax, 5 vmovups ymm4, ymmword ptr [rdi+rax] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm8, ymm5, ymm3, 17 vcmpps ymm9, ymm4, ymm2, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb al movzx rax, al vpternlogd ymm8, ymm5, ymm6, 17 vcmpps ymm9, ymm4, ymm0, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb cl movzx rcx, cl or eax, ecx vpternlogd ymm5, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm4, ymm5, ymm4 - vpcmpeqd k1, ymm9, ymm4 - kortestb k1, k1 + vptest ymm4, ymm9 setb cl movzx rcx, cl or eax, ecx - je SHORT G_M12558_IG10 - ;; size=133 bbWeight=4 PerfScore 126.00 -G_M12558_IG09: + je SHORT G_M12558_IG09 + ;; size=118 bbWeight=4 PerfScore 168.00 +G_M12558_IG08: lea eax, [rsi+0x05] - jmp G_M12558_IG20 + jmp G_M12558_IG19 ;; size=8 bbWeight=0.50 PerfScore 1.25 -G_M12558_IG10: +G_M12558_IG09: lea eax, [rsi+0x04] cdqe shl rax, 5 vmovups ymm4, ymmword ptr [rdi+rax] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm8, ymm5, ymm3, 17 vcmpps ymm9, ymm4, ymm2, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb al movzx rax, al vpternlogd ymm8, ymm5, ymm6, 17 vcmpps ymm9, ymm4, ymm0, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb cl movzx rcx, cl or eax, ecx vpternlogd ymm5, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm4, ymm5, ymm4 - vpcmpeqd k1, 
ymm9, ymm4 - kortestb k1, k1 + vptest ymm4, ymm9 setb cl movzx rcx, cl or eax, ecx - je SHORT G_M12558_IG12 - ;; size=133 bbWeight=4 PerfScore 126.00 -G_M12558_IG11: + je SHORT G_M12558_IG11 + ;; size=118 bbWeight=4 PerfScore 168.00 +G_M12558_IG10: lea eax, [rsi+0x04] - jmp G_M12558_IG20 + jmp G_M12558_IG19 ;; size=8 bbWeight=0.50 PerfScore 1.25 -G_M12558_IG12: +G_M12558_IG11: lea eax, [rsi+0x03] cdqe shl rax, 5 vmovups ymm4, ymmword ptr [rdi+rax] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm8, ymm5, ymm3, 17 vcmpps ymm9, ymm4, ymm2, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb al movzx rax, al vpternlogd ymm8, ymm5, ymm6, 17 vcmpps ymm9, ymm4, ymm0, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb cl movzx rcx, cl or eax, ecx vpternlogd ymm5, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm4, ymm5, ymm4 - vpcmpeqd k1, ymm9, ymm4 - kortestb k1, k1 + vptest ymm4, ymm9 setb cl movzx rcx, cl or eax, ecx - je SHORT G_M12558_IG14 - ;; size=133 bbWeight=4 PerfScore 126.00 -G_M12558_IG13: + je SHORT G_M12558_IG13 + ;; size=118 bbWeight=4 PerfScore 168.00 +G_M12558_IG12: lea eax, [rsi+0x03] - jmp G_M12558_IG20 + jmp G_M12558_IG19 ;; size=8 bbWeight=0.50 PerfScore 1.25 -G_M12558_IG14: +G_M12558_IG13: lea eax, [rsi+0x02] cdqe shl rax, 5 vmovups ymm4, ymmword ptr [rdi+rax] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm8, ymm5, ymm3, 17 vcmpps ymm9, ymm4, ymm2, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb al movzx rax, al vpternlogd ymm8, ymm5, ymm6, 17 vcmpps ymm9, ymm4, ymm0, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb cl movzx rcx, cl or eax, ecx vpternlogd ymm5, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm4, ymm5, ymm4 - vpcmpeqd k1, ymm9, ymm4 - kortestb k1, k1 + vptest ymm4, 
ymm9 setb cl movzx rcx, cl or eax, ecx - je SHORT G_M12558_IG16 - ;; size=133 bbWeight=4 PerfScore 126.00 -G_M12558_IG15: + je SHORT G_M12558_IG15 + ;; size=118 bbWeight=4 PerfScore 168.00 +G_M12558_IG14: lea eax, [rsi+0x02] - jmp G_M12558_IG20 + jmp G_M12558_IG19 ;; size=8 bbWeight=0.50 PerfScore 1.25 -G_M12558_IG16: +G_M12558_IG15: lea eax, [rsi+0x01] cdqe shl rax, 5 vmovups ymm4, ymmword ptr [rdi+rax] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm8, ymm5, ymm3, 17 vcmpps ymm9, ymm4, ymm2, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb al movzx rax, al vpternlogd ymm8, ymm5, ymm6, 17 vcmpps ymm9, ymm4, ymm0, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb cl movzx rcx, cl or eax, ecx vpternlogd ymm5, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm4, ymm5, ymm4 - vpcmpeqd k1, ymm9, ymm4 - kortestb k1, k1 + vptest ymm4, ymm9 setb cl movzx rcx, cl or eax, ecx - je SHORT G_M12558_IG18 - ;; size=133 bbWeight=4 PerfScore 126.00 -G_M12558_IG17: + je SHORT G_M12558_IG17 + ;; size=118 bbWeight=4 PerfScore 168.00 +G_M12558_IG16: lea eax, [rsi+0x01] - jmp G_M12558_IG20 - ;; size=8 bbWeight=0.50 PerfScore 1.25 -G_M12558_IG18: + jmp SHORT G_M12558_IG19 + ;; size=5 bbWeight=0.50 PerfScore 1.25 +G_M12558_IG17: movsxd rax, esi shl rax, 5 vmovups ymm4, ymmword ptr [rdi+rax] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm8, ymm5, ymm3, 17 vcmpps ymm9, ymm4, ymm2, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 + vptest ymm8, ymm9 setb al movzx rax, al vpternlogd ymm6, ymm5, ymm6, 17 vcmpps ymm8, ymm4, ymm0, 0 vorps ymm6, ymm6, ymm8 - vpcmpeqd k1, ymm9, ymm6 - kortestb k1, k1 + vptest ymm6, ymm9 setb cl movzx rcx, cl or eax, ecx vpternlogd ymm7, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm4, ymm7, ymm4 - vpcmpeqd k1, ymm9, ymm4 - kortestb k1, k1 + vptest ymm4, ymm9 setb cl movzx rcx, cl or 
eax, ecx - je SHORT G_M12558_IG21 - ;; size=126 bbWeight=4 PerfScore 122.00 -G_M12558_IG19: + je SHORT G_M12558_IG20 + ;; size=111 bbWeight=4 PerfScore 164.00 +G_M12558_IG18: mov eax, esi ;; size=2 bbWeight=0.50 PerfScore 0.12 -G_M12558_IG20: +G_M12558_IG19: vzeroupper pop rbp ret ;; size=5 bbWeight=0.50 PerfScore 1.25 -G_M12558_IG21: +G_M12558_IG20: cmp esi, 8 jge G_M12558_IG04 ;; size=9 bbWeight=4 PerfScore 5.00 -G_M12558_IG22: +G_M12558_IG21: cmp esi, 4 - jl G_M12558_IG25 + jl G_M12558_IG23 add esi, -4 lea eax, [rsi+0x03] cdqe shl rax, 5 vmovups ymm4, ymmword ptr [rdi+rax] vcmpps ymm3, ymm2, ymm2, 0 vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm6, ymm5, ymm3, 17 vcmpps ymm7, ymm4, ymm2, 0 vorps ymm6, ymm6, ymm7 vpcmpeqd ymm7, ymm7, ymm7 - vpcmpeqd k1, ymm7, ymm6 - kortestb k1, k1 - jb G_M12558_IG13 + vptest ymm6, ymm7 + jb G_M12558_IG12 vcmpps ymm6, ymm0, ymm0, 0 vpternlogd ymm7, ymm5, ymm6, 17 vcmpps ymm8, ymm4, ymm0, 0 vorps ymm7, ymm7, ymm8 vpcmpeqd ymm8, ymm8, ymm8 - vpcmpeqd k1, ymm8, ymm7 - kortestb k1, k1 - jb G_M12558_IG13 + vptest ymm7, ymm8 + jb G_M12558_IG12 vcmpps ymm7, ymm1, ymm1, 0 vpternlogd ymm5, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm4, ymm5, ymm4 vpcmpeqd ymm5, ymm5, ymm5 - vpcmpeqd k1, ymm5, ymm4 - kortestb k1, k1 - jb G_M12558_IG13 + vptest ymm4, ymm5 + jb G_M12558_IG12 lea eax, [rsi+0x02] cdqe shl rax, 5 vmovups ymm4, ymmword ptr [rdi+rax] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm8, ymm5, ymm3, 17 vcmpps ymm9, ymm4, ymm2, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 - jb G_M12558_IG15 + vptest ymm8, ymm9 + jb G_M12558_IG14 vpternlogd ymm8, ymm5, ymm6, 17 vcmpps ymm9, ymm4, ymm0, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 - jb G_M12558_IG15 + vptest ymm8, ymm9 + jb G_M12558_IG14 vpternlogd ymm5, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm4, ymm5, ymm4 - ;; size=267 bbWeight=0.50 PerfScore 33.50 -G_M12558_IG23: vpcmpeqd ymm5, 
ymm5, ymm5 - vpcmpeqd k1, ymm5, ymm4 - kortestb k1, k1 - jb G_M12558_IG15 + vptest ymm4, ymm5 + jb G_M12558_IG14 lea eax, [rsi+0x01] + ;; size=260 bbWeight=0.50 PerfScore 45.75 +G_M12558_IG22: cdqe shl rax, 5 vmovups ymm4, ymmword ptr [rdi+rax] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm8, ymm5, ymm3, 17 vcmpps ymm9, ymm4, ymm2, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 - jb G_M12558_IG17 + vptest ymm8, ymm9 + jb G_M12558_IG16 vpternlogd ymm8, ymm5, ymm6, 17 vcmpps ymm9, ymm4, ymm0, 0 vorps ymm8, ymm8, ymm9 vpcmpeqd ymm9, ymm9, ymm9 - vpcmpeqd k1, ymm9, ymm8 - kortestb k1, k1 - jb G_M12558_IG17 + vptest ymm8, ymm9 + jb G_M12558_IG16 vpternlogd ymm5, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm4, ymm5, ymm4 vpcmpeqd ymm5, ymm5, ymm5 - vpcmpeqd k1, ymm5, ymm4 - kortestb k1, k1 - jb G_M12558_IG17 + vptest ymm4, ymm5 + jb G_M12558_IG16 movsxd rax, esi shl rax, 5 vmovups ymm4, ymmword ptr [rdi+rax] vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm3, ymm5, ymm3, 17 vcmpps ymm8, ymm4, ymm2, 0 vorps ymm3, ymm3, ymm8 vpcmpeqd ymm8, ymm8, ymm8 - vpcmpeqd k1, ymm8, ymm3 - kortestb k1, k1 - jb G_M12558_IG19 + vptest ymm3, ymm8 + jb G_M12558_IG18 vpternlogd ymm6, ymm5, ymm6, 17 vcmpps ymm3, ymm4, ymm0, 0 vorps ymm3, ymm6, ymm3 vpcmpeqd ymm6, ymm6, ymm6 - vpcmpeqd k1, ymm6, ymm3 - kortestb k1, k1 - jb G_M12558_IG19 + vptest ymm3, ymm6 + jb G_M12558_IG18 vpternlogd ymm7, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm3, ymm7, ymm4 vpcmpeqd ymm4, ymm4, ymm4 - vpcmpeqd k1, ymm4, ymm3 - kortestb k1, k1 - ;; size=272 bbWeight=0.50 PerfScore 30.50 -G_M12558_IG24: - jb G_M12558_IG19 - ;; size=6 bbWeight=0.50 PerfScore 0.50 -G_M12558_IG25: + vptest ymm3, ymm4 + jb G_M12558_IG18 + ;; size=225 bbWeight=0.50 PerfScore 39.75 +G_M12558_IG23: test esi, esi - jg SHORT G_M12558_IG28 + jg SHORT G_M12558_IG26 ;; size=4 bbWeight=4 PerfScore 5.00 -G_M12558_IG26: +G_M12558_IG24: mov eax, -1 ;; size=5 bbWeight=0.50 PerfScore 0.12 -G_M12558_IG27: 
+G_M12558_IG25: vzeroupper pop rbp ret ;; size=5 bbWeight=0.50 PerfScore 1.25 -G_M12558_IG28: +G_M12558_IG26: dec esi movsxd rax, esi shl rax, 5 vmovups ymm4, ymmword ptr [rdi+rax] vcmpps ymm3, ymm2, ymm2, 0 vcmpps ymm5, ymm4, ymm4, 0 vpternlogd ymm3, ymm5, ymm3, 17 vcmpps ymm6, ymm4, ymm2, 0 vorps ymm3, ymm3, ymm6 vpcmpeqd ymm6, ymm6, ymm6 - vpcmpeqd k1, ymm6, ymm3 - kortestb k1, k1 - jb G_M12558_IG19 + vptest ymm3, ymm6 + jb G_M12558_IG18 vcmpps ymm6, ymm0, ymm0, 0 vpternlogd ymm3, ymm5, ymm6, 17 vcmpps ymm6, ymm4, ymm0, 0 vorps ymm3, ymm3, ymm6 vpcmpeqd ymm6, ymm6, ymm6 - vpcmpeqd k1, ymm6, ymm3 - kortestb k1, k1 - jb G_M12558_IG19 + vptest ymm3, ymm6 + jb G_M12558_IG18 vcmpps ymm7, ymm1, ymm1, 0 vpternlogd ymm3, ymm5, ymm7, 17 vcmpps ymm4, ymm4, ymm1, 0 vorps ymm3, ymm3, ymm4 vpcmpeqd ymm4, ymm4, ymm4 - vpcmpeqd k1, ymm4, ymm3 - kortestb k1, k1 - jb G_M12558_IG19 - jmp G_M12558_IG25 - ;; size=147 bbWeight=2 PerfScore 81.00 + vptest ymm3, ymm4 + jb G_M12558_IG18 + jmp G_M12558_IG23 + ;; size=132 bbWeight=2 PerfScore 102.00 -; Total bytes of code 1877, prolog size 19, PerfScore 1204.25, instruction count 419, allocated bytes for code 1955 (MethodHash=f075cef1) for method System.SpanHelpers:LastIndexOfAny[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts) +; Total bytes of code 1679, prolog size 19, PerfScore 1582.25, instruction count 380, allocated bytes for code 1679 (MethodHash=f075cef1) for method System.SpanHelpers:LastIndexOfAny[System.Numerics.Vector`1[float]](byref,System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],System.Numerics.Vector`1[float],int):int (FullOpts) ```
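
The recurring change in these listings replaces the pair `vpcmpeqd k1, ymmA, ymmB` / `kortestb k1, k1` with a single `vptest ymmB, ymmA`, where `ymmA` has just been set to all ones by `vpcmpeqd ymmA, ymmA, ymmA`. The following is a rough Python model of how each sequence sets the carry flag, written only to illustrate why the following `jb` or `setb` should behave the same. The function names are hypothetical, and the flag rules are a simplified reading of the instruction semantics, not JIT code.

```python
import random

MASK32 = 0xFFFFFFFF
ALL_ONES = [MASK32] * 8  # a 256-bit register viewed as 8 dword lanes


def cf_cmpeq_kortest(value, all_ones=ALL_ONES):
    """Model of: vpcmpeqd k1, all_ones, value ; kortestb k1, k1 -> CF."""
    k1 = 0
    for lane, (a, b) in enumerate(zip(all_ones, value)):
        if a == b:
            k1 |= 1 << lane  # one mask bit per equal dword lane
    # kortestb sets CF when the OR of the low 8 mask bits is all ones
    return ((k1 | k1) & 0xFF) == 0xFF


def cf_vptest(value, all_ones=ALL_ONES):
    """Model of: vptest value, all_ones -> CF."""
    # vptest sets CF when (second AND NOT first) is zero across the register
    return all(((~v) & o & MASK32) == 0 for v, o in zip(value, all_ones))


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(1000):
        # Comparison results are per-lane masks: all ones or all zeros
        v = [rng.choice([0, MASK32]) for _ in range(8)]
        assert cf_cmpeq_kortest(v) == cf_vptest(v)
    print("both sequences agree on CF")
```

Under this model, CF is set in both cases only when every lane of the tested vector is all ones, which is the condition the `jb` branches and `setb` instructions in the listings consume.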

Larger list of diffs: https://gist.github.com/MihuBot/19da0c8a13430beaae661d050a25685a

MihuBot commented 2 months ago

@xtqqczze