dotnet / runtimelab

This repo is for experimentation and exploring new ideas that may or may not make it into the main dotnet/runtime repo.

[NativeAOT-LLVM] Implement implicit shadow tail calls #2501

Closed SingleAccretion closed 5 months ago

SingleAccretion commented 5 months ago

Closes #2213.

Calls in a tail position in methods without implicitly live shadow state can be tail-called, i.e. called on the caller's shadow frame. This is both a code size and a performance optimization; the latter because the GC needs to do less work.
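
To illustrate the idea, here is a toy Python model (not the NativeAOT-LLVM implementation). It assumes each method reserves a fixed number of shadow stack slots, and it reads "no implicitly live shadow state" as "nothing in the caller's shadow frame has to survive the tail call". The function name `shadow_extent_at_leaf` and the frame sizes are made up for the example.

```python
def shadow_extent_at_leaf(frames, shadow_tail_calls):
    """Return how many shadow stack slots are in use while the last method runs.

    `frames` is a list of (frame_size, has_live_shadow_state) pairs, one per
    method in a chain where every method calls the next one in tail position.
    This is a hypothetical model, not the compiler's actual data structure.
    """
    base = 0  # shadow stack offset at which the current method's frame starts
    for size, has_live_state in frames[:-1]:
        if shadow_tail_calls and not has_live_state:
            # Shadow tail call: the callee is placed on the caller's frame,
            # so the frame base does not move.
            continue
        # Ordinary call: the callee's frame starts after the caller's frame.
        base += size
    leaf_size, _ = frames[-1]
    return base + leaf_size


# Made-up chain of four methods; only the third one has state that must
# stay live across its tail-position call.
chain = [(2, False), (3, False), (1, True), (4, False)]

print(shadow_extent_at_leaf(chain, shadow_tail_calls=False))  # prints 10
print(shadow_extent_at_leaf(chain, shadow_tail_calls=True))   # prints 5
```

In this model, a GC that happens in the last method of the chain has 5 slots to scan instead of 10, which is the "GC needs to do less work" part; the code size win comes from not having to set up a new shadow frame for the callee.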

Diffs:

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 3214419
Total bytes of diff: 3197845
Total bytes of delta: -16574 (-0.52% of base)
Average relative delta: -3.99%
    diff is an improvement
    average relative diff is an improvement

Top method regressions (percentages):
           4 ( 5.06% of base) : 1832.dasm - S_P_CoreLib_System_Reflection_Runtime_Assemblies_NativeFormat_NativeFormatRuntimeAssembly__Equals
           2 ( 4.76% of base) : 3477.dasm - S_P_CoreLib_Interop___GetExceptionForIoErrno_g__ParentDirectoryExists_18_0
          12 ( 4.04% of base) : 2372.dasm - S_P_CoreLib_System_Number__Int32ToHexStr
          12 ( 3.45% of base) : 1022.dasm - S_P_CoreLib_Internal_Runtime_CompilerHelpers_StartupCodeHelpers__RunModuleInitializers
          14 ( 2.60% of base) : 2375.dasm - S_P_CoreLib_System_Number__UInt32ToDecStr_NoSmallNumberCheck
           8 ( 2.56% of base) : 2377.dasm - S_P_CoreLib_System_Number__TryInt32ToHexStr<Char>
          14 ( 2.53% of base) : 2371.dasm - S_P_CoreLib_System_Number__UInt32ToDecStr_0
          14 ( 1.83% of base) : 2370.dasm - S_P_CoreLib_System_Number__NegativeInt32ToDecStr
          12 ( 1.82% of base) : 1034.dasm - S_P_CoreLib_System_Runtime_EH__DispatchException
           2 ( 1.65% of base) : 2783.dasm - S_P_CoreLib_System_RuntimeType__MakeArrayType_0
           2 ( 1.63% of base) : 1262.dasm - S_P_CoreLib_System_Globalization_PersianCalendar__GetDaysInYear
          14 ( 1.61% of base) : 2376.dasm - S_P_CoreLib_System_Number__TryUInt32ToDecStr_0<Char>
           3 ( 1.49% of base) : 2345.dasm - S_P_CoreLib_System_ReadOnlySpan_1<Char>__ToString
           4 ( 1.42% of base) : 3063.dasm - S_P_TypeLoader_Internal_Runtime_TypeLoader_LowLevelListExtensions__Expand<Bool>
           4 ( 1.37% of base) : 3352.dasm - S_P_CoreLib_System_Decimal__ToString
           2 ( 0.97% of base) : 3473.dasm - S_P_CoreLib_System_Reflection_TypeNameParser__StartAssemblyName
           3 ( 0.89% of base) : 1696.dasm - S_P_TypeLoader_Internal_TypeSystem_ExceptionTypeNameFormatter__GetTypeNamespace
           3 ( 0.89% of base) : 1694.dasm - S_P_TypeLoader_Internal_TypeSystem_ExceptionTypeNameFormatter__GetTypeName
           2 ( 0.76% of base) : 2756.dasm - S_P_CoreLib_System_Comparison_1<S_P_StackTraceMetadata_Internal_StackTraceMetadata_StackTraceMetadata_PerModuleMethodNameResolver_StackTraceData>__InvokeOpenInstanceThunk
           4 ( 0.72% of base) : 3254.dasm - S_P_CoreLib_System_Globalization_NumberFormatInfo___GetInstance_g__GetProviderNonNull_59_0

Top method improvements (percentages):
         -22 (-26.51% of base) : 2925.dasm - S_P_CoreLib_System_Reflection_SignatureConstructedGenericType__GetGenericArguments
         -17 (-18.09% of base) : 1488.dasm - S_P_CoreLib_System_Collections_Generic_RandomizedStringEqualityComparer_OrdinalIgnoreCaseComparer__Equals
         -22 (-17.60% of base) : 3019.dasm - S_P_TypeLoader_Internal_TypeSystem_TypeSystemContext_ArrayTypeKey_ArrayTypeKeyHashtable__GetValueHashCode
         -15 (-16.30% of base) : 3447.dasm - S_P_CoreLib_Internal_Runtime_Augments_RuntimeAugments__GetArrayRankOrMinusOneForSzArray
          -7 (-15.91% of base) : 2113.dasm - S_P_CoreLib_System_Collections_Generic_Dictionary_2<System___Canon__IntPtr>__System_Collections_IEnumerable_GetEnumerator
          -7 (-15.91% of base) : 3730.dasm - S_P_CoreLib_System_Collections_Generic_NonRandomizedStringEqualityComparer_OrdinalIgnoreCaseComparer__GetHashCode
          -7 (-15.91% of base) : 3327.dasm - S_P_CoreLib_System_Collections_Generic_Dictionary_2<S_P_CoreLib_System_Collections_Generic_KeyValuePair_2<System___Canon__System___Canon>__System___Canon>__System_Collections_IEnumerable_GetEnumerator
          -7 (-15.91% of base) : 1141.dasm - S_P_CoreLib_System_Collections_Generic_Dictionary_2<System___Canon__System___Canon>__System_Collections_IEnumerable_GetEnumerator
          -7 (-15.91% of base) : 3744.dasm - S_P_CoreLib_System_Collections_Generic_NonRandomizedStringEqualityComparer_OrdinalComparer__GetHashCode
          -7 (-15.91% of base) : 1368.dasm - S_P_CoreLib_System_Runtime_CompilerServices_ConditionalWeakTable_2<System___Canon__System___Canon>__System_Collections_IEnumerable_GetEnumerator
          -7 (-15.91% of base) : 2472.dasm - S_P_TypeLoader_Internal_Runtime_TypeLoader_TypeLoaderEnvironment_DynamicGenericMethodsHashtable__GetValueHashCode
          -7 (-14.00% of base) : 3261.dasm - S_P_CoreLib_System_Environment__GetEnvironmentVariable
          -3 (-13.64% of base) : 3881.dasm - S_P_CoreLib_System_Reflection_Runtime_TypeInfos_RuntimeArrayTypeInfo__get_SyntheticConstructors_d__18__System_Collections_IEnumerable_GetEnumerator
          -3 (-13.64% of base) : 2548.dasm - S_P_CoreLib_System_Reflection_Runtime_General_NativeFormatMetadataReaderExtensions__AsEnumerable_d__33__System_Collections_IEnumerable_GetEnumerator
          -3 (-13.64% of base) : 2420.dasm - S_P_CoreLib_System_Security_SecurityException__ToString
          -3 (-13.64% of base) : 1189.dasm - S_P_CoreLib_System_Reflection_Runtime_CustomAttributes_RuntimeCustomAttributeData__GetCustomAttributes_d__16__System_Collections_IEnumerable_GetEnumerator
          -3 (-13.64% of base) : 2482.dasm - S_P_CoreLib_System_Reflection_Runtime_General_NativeFormatMetadataReaderExtensions__GetTransitiveNamespaces_d__28__System_Collections_IEnumerable_GetEnumerator
          -3 (-13.64% of base) : 4362.dasm - S_P_CoreLib_System_Reflection_Runtime_FieldInfos_RuntimeFieldInfo__get_CustomAttributes_d__2__System_Collections_IEnumerable_GetEnumerator
          -3 (-13.64% of base) : 1203.dasm - S_P_CoreLib_System_Reflection_Runtime_PropertyInfos_RuntimePropertyInfo__get_SetMethod
          -3 (-13.64% of base) : 4045.dasm - S_P_CoreLib_System_Runtime_Loader_LibraryNameVariation__DetermineLibraryNameVariations_d__5__System_Collections_IEnumerable_GetEnumerator

3465 total methods with Code Size differences (3421 improved, 44 regressed)
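
The summary figures can be cross-checked with a few lines of Python. The totals are copied from the output above; the helper `approx_base_size` is a hypothetical name introduced here, and the per-method base sizes it recovers are only approximate because the percentages are rounded to two decimals.

```python
base_bytes = 3214419
diff_bytes = 3197845

# Total delta and its size relative to the base.
delta = diff_bytes - base_bytes
relative = 100.0 * delta / base_bytes
print(delta, f"{relative:.2f}%")  # prints: -16574 -0.52%

# Improved plus regressed methods should add up to the reported total.
improved, regressed = 3421, 44
print(improved + regressed)  # prints: 3465


def approx_base_size(delta_bytes, percent_of_base):
    """Recover a method's approximate base size from its delta and percentage."""
    return round(100.0 * delta_bytes / percent_of_base)


# Largest improvement in the list: -22 bytes at -26.51% of base.
print(approx_base_size(-22, -26.51))  # prints: 83
# The many -7 byte entries at -15.91% of base.
print(approx_base_size(-7, -15.91))   # prints: 44
```

The recovered sizes suggest that the largest relative improvements are in very small methods, where saving a handful of bytes is a large fraction of the total.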

SingleAccretion commented 5 months ago

@dotnet/nativeaot-llvm

SingleAccretion commented 5 months ago

> Can we mark the call as LLVM tailcc convention?

No, it's not supported for WASM. On other platforms, this convention is used for "must-tail" tail calls, i.e. tail calls that must be performed as tail calls for correctness. In .NET parlance, those are calls with the `tail.` prefix (in Jit parlance, "explicit" tail calls).

WASM has an extension for tail calls which enables this "must-tail" functionality, but it's not part of the default set.

You can also mark a call in LLVM as simply "tail", which is more akin to the implicit tail calls in the Jit. But that too requires the tail call extension, and it would result in an actual, physical tail call, something we would need stronger checking for (and, ideally, something integrated with the frontend).

yowl commented 5 months ago

Reading https://llvm.org/docs/CodeGenerator.html#tail-call-optimization, there appears to be some support in LLVM; is it the constraints we don't meet?

SingleAccretion commented 5 months ago

> there appears to be some support in LLVM; is it the constraints we don't meet?

We do meet them (or can check for them). The problem is the "The ‘tail-call’ target attribute is enabled" constraint: support for it is spotty at best at the moment:

[image]

And it's not a critical feature to have outside of `tail.` support.