dotnet / runtimelab

This repo is for experimentation and exploring new ideas that may or may not make it into the main dotnet/runtime repo.
MIT License

[NativeAOT-LLVM] Zero shadow locals using a single `memset` #2498

Closed: SingleAccretion closed this 5 months ago

SingleAccretion commented 5 months ago

Group the shadow locals together so that they can be zeroed with a single memset. This gives up on the "ideal" layout in the presence of align-8 locals, but that shouldn't be a big deal.

The diffs are mixed because of an upstream LLVM issue (https://github.com/llvm/llvm-project/issues/79692):

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 3220731
Total bytes of diff: 3209366
Total bytes of delta: -11365 (-0.35% of base)
Average relative delta: -0.70%
    diff is an improvement
    average relative diff is an improvement

Top method regressions (percentages):
         128 (30.26% of base) : 1017.dasm - S_P_CoreLib_System_Buffers_SharedArrayPool_1<Int32>__InitializeTlsBucketsAndTrimming
         128 (30.26% of base) : 1020.dasm - S_P_CoreLib_System_Buffers_SharedArrayPool_1<UInt8>__InitializeTlsBucketsAndTrimming
         128 (30.26% of base) : 1019.dasm - S_P_CoreLib_System_Buffers_SharedArrayPool_1<Char>__InitializeTlsBucketsAndTrimming
         252 (15.59% of base) : 1417.dasm - S_P_TypeLoader_Internal_Runtime_TypeLoader_TypeLoaderEnvironment__GVMLookupForSlotWorker
          35 (10.87% of base) : 1983.dasm - S_P_TypeLoader_Internal_TypeSystem_TypeSystemContext_GenericTypeInstanceKey__Equals_0
          14 (10.45% of base) : 1676.dasm - S_P_CoreLib_Internal_Metadata_NativeFormat_NamespaceDefinitionHandle__Equals
          14 (10.45% of base) : 1838.dasm - S_P_CoreLib_Internal_Metadata_NativeFormat_ConstantStringValueHandle__Equals
          14 (10.45% of base) : 1832.dasm - S_P_CoreLib_Internal_Metadata_NativeFormat_ScopeDefinitionHandle__Equals
          14 (10.45% of base) : 1840.dasm - S_P_CoreLib_Internal_Metadata_NativeFormat_MethodSignatureHandle__Equals
          14 (10.45% of base) : 1842.dasm - S_P_CoreLib_Internal_Metadata_NativeFormat_ConstantStringArrayHandle__Equals
          14 (10.45% of base) : 1083.dasm - S_P_CoreLib_Internal_Metadata_NativeFormat_GenericParameterHandle__Equals
          14 (10.45% of base) : 1085.dasm - S_P_CoreLib_Internal_Metadata_NativeFormat_TypeDefinitionHandle__Equals
           7 ( 8.43% of base) : 2278.dasm - Single__ToString
           7 ( 8.43% of base) : 2277.dasm - Double__ToString
          12 ( 8.00% of base) : 2137.dasm - S_P_CoreLib_System_Reflection_CustomAttributeData__get_AttributeType
          12 ( 6.70% of base) : 1390.dasm - S_P_CoreLib_System_BadImageFormatException__SetMessageField
          32 ( 6.37% of base) : 1985.dasm - S_P_TypeLoader_System_Collections_Generic_LowLevelDictionary_2<S_P_TypeLoader_Internal_TypeSystem_TypeSystemContext_GenericTypeInstanceKey__System___Canon>__Find
          11 ( 6.36% of base) : 2553.dasm - S_P_TypeLoader_Internal_TypeSystem_ThrowHelper_Format__OwningModule_0
          10 ( 5.21% of base) : 1817.dasm - S_P_CoreLib_System_Globalization_CalendarData___c___GetCalendarInfo_b__34_0
          17 ( 5.04% of base) : 1246.dasm - S_P_CoreLib_System_Globalization_GlobalizationMode__LoadAppLocalIcu

Top method improvements (percentages):
         -68 (-17.48% of base) : 1207.dasm - S_P_Reflection_Execution_Internal_Reflection_Execution_MethodInvokers_MethodInvokerWithMethodInvokeInfo__CreateMethodInvoker
         -70 (-16.28% of base) : 2452.dasm - S_P_CoreLib_System_Reflection_Runtime_MethodInfos_RuntimePlainConstructorInfo_1<S_P_CoreLib_System_Reflection_Runtime_MethodInfos_NativeFormat_NativeFormatMethodCommon>__get_MetadataDefinitionMethod
         -95 (-15.91% of base) : 2130.dasm - S_P_CoreLib_System_Reflection_Runtime_MethodInfos_NativeFormat_NativeFormatMethodCommon__GetGenericTypeParametersWithSpecifiedOwningMethod
        -142 (-15.24% of base) : 1893.dasm - S_P_CoreLib_System_Reflection_Runtime_General_NativeFormatMetadataReaderExtensions__IsCustomAttributeOfType
        -122 (-14.93% of base) : 2125.dasm - S_P_CoreLib_System_Reflection_Runtime_MethodInfos_NativeFormat_NativeFormatMethodCommon__get_QualifiedMethodSignature
         -86 (-14.43% of base) : 1434.dasm - S_P_CoreLib_System_Number__FormatDecimal
         -69 (-13.27% of base) : 1398.dasm - S_P_CoreLib_System_Reflection_Runtime_MethodInfos_NativeFormat_NativeFormatMethodCommon__get_RuntimeMethodCommonOfUninstantiatedMethod
         -69 (-13.27% of base) : 2019.dasm - S_P_TypeLoader_Internal_Runtime_TypeLoader_MetadataNameExtensions__GetFullName_5
         -92 (-13.20% of base) : 1876.dasm - String__SplitInternal
        -110 (-12.72% of base) : 1670.dasm - S_P_CoreLib_System_Collections_Concurrent_ConcurrentUnifierW_2_Container<S_P_CoreLib_System_Reflection_Runtime_TypeInfos_NativeFormat_NativeFormatRuntimeNamedTypeInfo_UnificationKey__System___Canon>__Add
         -64 (-12.62% of base) : 1035.dasm - S_P_CoreLib_System_Reflection_Runtime_TypeInfos_NativeFormat_NativeFormatRuntimeNamedTypeInfo__get_TypeRefDefOrSpecsForDirectlyImplementedInterfaces
         -89 (-12.21% of base) : 2084.dasm - S_P_CoreLib_System_Collections_Concurrent_ConcurrentUnifierW_2_Container<IntPtr__System___Canon>__Add
         -84 (-12.12% of base) : 1640.dasm - S_P_TypeLoader_Internal_TypeSystem_NoMetadata_RuntimeMethodDesc__GetCanonMethodTarget
         -99 (-11.90% of base) : 2300.dasm - S_P_CoreLib_System_Collections_Concurrent_ConcurrentUnifierW_2_Container<S_P_CoreLib_System_Reflection_Runtime_Assemblies_NativeFormat_NativeFormatRuntimeAssembly_RuntimeAssemblyKey__System___Canon>__Add
        -127 (-11.87% of base) : 1593.dasm - S_P_CoreLib_System_Number___FormatUInt64_g__FormatUInt64Slow_47_0
        -101 (-11.62% of base) : 2537.dasm - S_P_CoreLib_System_Collections_Concurrent_ConcurrentUnifierW_2_Container<S_P_CoreLib_System_Reflection_Runtime_TypeInfos_NativeFormat_NativeFormatRuntimeGenericParameterTypeInfoForTypes_UnificationKey__System___Canon>__Add
         -99 (-11.45% of base) : 1234.dasm - S_P_CoreLib_System_Reflection_Runtime_General_NamespaceChain___ctor
        -136 (-11.30% of base) : 1386.dasm - S_P_CoreLib_System_Number___FormatInt64_g__FormatInt64Slow_45_0
         -81 (-11.01% of base) : 2348.dasm - S_P_CoreLib_System_TimeZoneInfo__PopulateDisplayName
         -84 (-10.97% of base) : 1343.dasm - S_P_CoreLib_System_Enum__TryFormatPrimitiveDefault<Int8__UInt8>

1595 total methods with Code Size differences (844 improved, 751 regressed)

Overall it is still an improvement. More importantly, debug code sees nice wins:

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 16135112
Total bytes of diff: 15664183
Total bytes of delta: -470929 (-2.92% of base)
Average relative delta: -3.56%
    diff is an improvement
    average relative diff is an improvement

Top method regressions (percentages):
          69 ( 6.21% of base) : 2291.dasm - S_P_CoreLib_System_Reflection_Runtime_General_NativeFormatMetadataReaderExtensions__ToArray_3
          32 ( 4.01% of base) : 4489.dasm - S_P_CoreLib_Interop_Sys__GetEnv
          32 ( 4.01% of base) : 1857.dasm - S_P_CoreLib_Interop_Globalization__LoadICUData
          32 ( 4.01% of base) : 2760.dasm - S_P_CoreLib_Interop_Sys__LoadLibrary
          58 ( 3.74% of base) : 5740.dasm - S_P_CoreLib_System_Runtime_CompilerServices_CastCache__CreateCastCache
          16 ( 3.69% of base) : 3026.dasm - HelloWasm_Program__TestVirtualUnwindStackNoPopOnNestedUnwindingFault
          12 ( 3.68% of base) : 2429.dasm - HelloWasm_Program___TestClippingUncontainedNestedDispatchIntraFrame_g__TestClippingUncontainedNestedDispatchIntrarame_NestedThrow_273_1
          19 ( 3.68% of base) : 2403.dasm - HelloWasm_Program__TestDeepUncontainedNestedDispatchSingleFrame
          53 ( 3.63% of base) : 1687.dasm - S_P_CoreLib_System_Globalization_CultureInfo__GetCultureInfo_0
          20 ( 3.50% of base) : 2409.dasm - HelloWasm_Program__TestExactUncontainedNestedDispatchSingleFrame
          31 ( 3.40% of base) : 6284.dasm - S_P_CoreLib_Interop_Sys__OpenDir
          31 ( 3.40% of base) : 2251.dasm - S_P_CoreLib_Interop_Sys__Unlink
          32 ( 3.38% of base) : 2387.dasm - HelloWasm_Program__TestFilter
          20 ( 3.33% of base) : 4390.dasm - HelloWasm_Program_DerivedCatches_1<System___Canon>__GvmInFilter<System___Canon>
          12 ( 3.31% of base) : 1828.dasm - S_P_CoreLib_System_Threading_WaitSubsystem__NewHandle
          15 ( 3.30% of base) : 4759.dasm - S_P_CoreLib_System_Reflection_SignatureTypeExtensions__TryMakeByRefType
          15 ( 3.30% of base) : 4758.dasm - S_P_CoreLib_System_Reflection_SignatureTypeExtensions__TryMakePointerType
          15 ( 3.30% of base) : 4761.dasm - S_P_CoreLib_System_Reflection_SignatureTypeExtensions__TryMakeArrayType
           7 ( 3.27% of base) : 5931.dasm - S_P_CoreLib_System_Runtime_RuntimeImports__RhpGetTickCount64
           7 ( 3.27% of base) : 5151.dasm - S_P_CoreLib_Interop_Sys__GetTimestamp

Top method improvements (percentages):
        -810 (-34.72% of base) : 3539.dasm - S_P_CoreLib_System_Reflection_Runtime_MethodInfos_NativeFormat_NativeFormatMethodCommon__get_RuntimeMethodCommonOfUninstantiatedMethod
        -570 (-32.95% of base) : 5942.dasm - S_P_CoreLib_Internal_Metadata_NativeFormat_MethodHandle__GetMethod
        -440 (-32.74% of base) : 1326.dasm - S_P_TypeLoader_Internal_Runtime_TypeLoader_MetadataNameExtensions__GetFullName_13
        -497 (-32.68% of base) : 5688.dasm - S_P_CoreLib_System_Reflection_Runtime_General_NativeFormatMetadataReaderExtensions__ParseMethodSignature
        -520 (-32.22% of base) : 5941.dasm - S_P_CoreLib_Internal_Metadata_NativeFormat_MethodSignatureHandle__GetMethodSignature
        -343 (-31.91% of base) : 3588.dasm - S_P_CoreLib_System_Reflection_Runtime_General_NativeFormatMetadataReaderExtensions__IsConstructor
        -426 (-31.77% of base) : 3520.dasm - S_P_CoreLib_System_Reflection_Runtime_MethodInfos_NativeFormat_NativeFormatMethodCommon__get_GenericParameterCount
        -490 (-31.45% of base) : 5934.dasm - S_P_CoreLib_Internal_Metadata_NativeFormat_GenericParameterHandle__GetGenericParameter
        -496 (-31.39% of base) : 6085.dasm - S_P_CoreLib_Internal_Metadata_NativeFormat_ScopeReferenceHandle__GetScopeReference
        -504 (-30.98% of base) : 1872.dasm - S_P_StackTraceMetadata_Internal_StackTraceMetadata_MethodNameFormatter_SigTypeContext__FromMethod_0
        -534 (-30.72% of base) : 2675.dasm - S_P_CoreLib_System_Reflection_Runtime_TypeInfos_NativeFormat_NativeFormatRuntimeNamedTypeInfo__get_IsGenericTypeDefinition
        -607 (-30.53% of base) : 4049.dasm - S_P_CoreLib_System_Reflection_Runtime_TypeInfos_NativeFormat_NativeFormatRuntimeNamedTypeInfo__GetGenericTypeDefinition
        -442 (-30.13% of base) : 5655.dasm - S_P_CoreLib_Internal_Metadata_NativeFormat_PropertyHandle__GetProperty
        -412 (-30.12% of base) : 3766.dasm - S_P_CoreLib_Internal_Metadata_NativeFormat_FieldHandle__GetField
        -442 (-30.07% of base) : 3838.dasm - S_P_CoreLib_Internal_Metadata_NativeFormat_NamespaceDefinitionHandle__GetNamespaceDefinition
        -412 (-30.05% of base) : 1655.dasm - S_P_CoreLib_Internal_Metadata_NativeFormat_EventHandle__GetEvent
        -492 (-29.91% of base) : 6009.dasm - S_P_Reflection_Execution_Internal_NativeFormat_NativeHashtable__EnumerateAllEntries
         -11 (-29.73% of base) : 1825.dasm - S_P_CoreLib_System_Threading_WaitSubsystem_WaitableObject__Wait$F1_Finally
         -11 (-29.73% of base) : 5872.dasm - S_P_CoreLib_System_Threading_WaitSubsystem__SetEvent_0$F1_Finally
         -11 (-29.73% of base) : 4454.dasm - S_P_CoreLib_System_Threading_SyncTable__SetHashCode$F1_Finally

5320 total methods with Code Size differences (4438 improved, 882 regressed)

For reference, here are the diffs with a workaround for that LLVM issue:

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 3220731
Total bytes of diff: 3206861
Total bytes of delta: -13870 (-0.43% of base)
Average relative delta: -1.36%
    diff is an improvement
    average relative diff is an improvement

Top method regressions (percentages):
          14 (10.45% of base) : 1069.dasm - S_P_CoreLib_Internal_Metadata_NativeFormat_GenericParameterHandle__Equals
          14 (10.45% of base) : 1071.dasm - S_P_CoreLib_Internal_Metadata_NativeFormat_TypeDefinitionHandle__Equals
          14 (10.45% of base) : 1665.dasm - S_P_CoreLib_Internal_Metadata_NativeFormat_ConstantStringArrayHandle__Equals
          14 (10.45% of base) : 1538.dasm - S_P_CoreLib_Internal_Metadata_NativeFormat_NamespaceDefinitionHandle__Equals
          14 (10.45% of base) : 1658.dasm - S_P_CoreLib_Internal_Metadata_NativeFormat_ScopeDefinitionHandle__Equals
          14 (10.45% of base) : 1663.dasm - S_P_CoreLib_Internal_Metadata_NativeFormat_MethodSignatureHandle__Equals
          14 (10.45% of base) : 1661.dasm - S_P_CoreLib_Internal_Metadata_NativeFormat_ConstantStringValueHandle__Equals
           7 ( 8.43% of base) : 2036.dasm - Double__ToString
           7 ( 8.43% of base) : 2037.dasm - Single__ToString
          39 ( 8.14% of base) : 2046.dasm - S_P_CoreLib_System_Text_DecoderExceptionFallbackBuffer__Throw
          12 ( 8.00% of base) : 1922.dasm - S_P_CoreLib_System_Reflection_CustomAttributeData__get_AttributeType
          22 ( 6.83% of base) : 1791.dasm - S_P_TypeLoader_Internal_TypeSystem_TypeSystemContext_GenericTypeInstanceKey__Equals_0
          10 ( 5.43% of base) : 1836.dasm - S_P_CoreLib_System_TimeZoneInfo__TZif_ToInt32
           7 ( 5.22% of base) : 2068.dasm - S_P_CoreLib_System_Reflection_Runtime_MethodInfos_CustomMethodInvoker__CreateInstance
          10 ( 5.21% of base) : 1647.dasm - S_P_CoreLib_System_Globalization_CalendarData___c___GetCalendarInfo_b__34_0
           9 ( 5.20% of base) : 2252.dasm - S_P_TypeLoader_Internal_TypeSystem_ThrowHelper_Format__OwningModule_0
           6 ( 4.92% of base) : 1182.dasm - S_P_CoreLib_System_Text_Encoding__GetCharCountWithFallback
           6 ( 4.76% of base) : 2137.dasm - S_P_CoreLib_System_IO_Path__IsPathFullyQualified
          24 ( 4.42% of base) : 1502.dasm - S_P_TypeLoader_Internal_TypeSystem_InstantiatedMethod__get_NameAndSignature
          24 ( 4.42% of base) : 1704.dasm - S_P_TypeLoader_Internal_TypeSystem_MethodForInstantiatedType__get_NameAndSignature

Top method improvements (percentages):
        -117 (-25.27% of base) : 1652.dasm - S_P_Reflection_Execution_Internal_Reflection_Execution_ExecutionEnvironmentImplementation__GetEnumInfo
         -96 (-18.46% of base) : 1312.dasm - S_P_CoreLib_System_Reflection_Runtime_MethodInfos_NativeFormat_NativeFormatMethodCommon__get_RuntimeMethodCommonOfUninstantiatedMethod
         -68 (-17.48% of base) : 1170.dasm - S_P_Reflection_Execution_Internal_Reflection_Execution_MethodInvokers_MethodInvokerWithMethodInvokeInfo__CreateMethodInvoker
         -70 (-16.28% of base) : 2175.dasm - S_P_CoreLib_System_Reflection_Runtime_MethodInfos_RuntimePlainConstructorInfo_1<S_P_CoreLib_System_Reflection_Runtime_MethodInfos_NativeFormat_NativeFormatMethodCommon>__get_MetadataDefinitionMethod
         -95 (-15.91% of base) : 1915.dasm - S_P_CoreLib_System_Reflection_Runtime_MethodInfos_NativeFormat_NativeFormatMethodCommon__GetGenericTypeParametersWithSpecifiedOwningMethod
        -142 (-15.24% of base) : 1711.dasm - S_P_CoreLib_System_Reflection_Runtime_General_NativeFormatMetadataReaderExtensions__IsCustomAttributeOfType
        -122 (-14.93% of base) : 1911.dasm - S_P_CoreLib_System_Reflection_Runtime_MethodInfos_NativeFormat_NativeFormatMethodCommon__get_QualifiedMethodSignature
         -86 (-14.43% of base) : 1342.dasm - S_P_CoreLib_System_Number__FormatDecimal
         -70 (-13.46% of base) : 1821.dasm - S_P_TypeLoader_Internal_Runtime_TypeLoader_MetadataNameExtensions__GetFullName_5
         -92 (-13.20% of base) : 1695.dasm - String__SplitInternal
        -110 (-12.72% of base) : 1533.dasm - S_P_CoreLib_System_Collections_Concurrent_ConcurrentUnifierW_2_Container<S_P_CoreLib_System_Reflection_Runtime_TypeInfos_NativeFormat_NativeFormatRuntimeNamedTypeInfo_UnificationKey__System___Canon>__Add
         -64 (-12.62% of base) : 1026.dasm - S_P_CoreLib_System_Reflection_Runtime_TypeInfos_NativeFormat_NativeFormatRuntimeNamedTypeInfo__get_TypeRefDefOrSpecsForDirectlyImplementedInterfaces
         -89 (-12.21% of base) : 1876.dasm - S_P_CoreLib_System_Collections_Concurrent_ConcurrentUnifierW_2_Container<IntPtr__System___Canon>__Add
         -84 (-12.12% of base) : 1508.dasm - S_P_TypeLoader_Internal_TypeSystem_NoMetadata_RuntimeMethodDesc__GetCanonMethodTarget
         -99 (-11.90% of base) : 2055.dasm - S_P_CoreLib_System_Collections_Concurrent_ConcurrentUnifierW_2_Container<S_P_CoreLib_System_Reflection_Runtime_Assemblies_NativeFormat_NativeFormatRuntimeAssembly_RuntimeAssemblyKey__System___Canon>__Add
        -127 (-11.87% of base) : 1471.dasm - S_P_CoreLib_System_Number___FormatUInt64_g__FormatUInt64Slow_47_0
         -87 (-11.82% of base) : 2097.dasm - S_P_CoreLib_System_TimeZoneInfo__PopulateDisplayName
         -87 (-11.80% of base) : 2259.dasm - S_P_TypeLoader_Internal_Runtime_TypeLoader_TypeBuilder__RegisterGenericTypesAndMethods
        -101 (-11.62% of base) : 2238.dasm - S_P_CoreLib_System_Collections_Concurrent_ConcurrentUnifierW_2_Container<S_P_CoreLib_System_Reflection_Runtime_TypeInfos_NativeFormat_NativeFormatRuntimeGenericParameterTypeInfoForTypes_UnificationKey__System___Canon>__Add
         -99 (-11.45% of base) : 1196.dasm - S_P_CoreLib_System_Reflection_Runtime_General_NamespaceChain___ctor

1290 total methods with Code Size differences (891 improved, 399 regressed)

The workaround could be applied to all memsets/memcpys, though, so it doesn't make sense to add it just for the prolog zeroing.

SingleAccretion commented 5 months ago

@dotnet/nativeaot-llvm