[NativeAOT-LLVM] Add a couple dereferenceability attributes

SingleAccretion commented 3 months ago

For the shadow stack, return buffer and implicit byrefs.

Additionally, annotate the whole funclet tree with optnone/optsize, not just the root function.

The diffs are modest and come from making more use of the WASM addressing modes (constant offsets folded into the load/store offset immediate):

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 3059418
Total bytes of diff: 3058843
Total bytes of delta: -575 (-0.02% of base)
Average relative delta: -1.26%
    diff is an improvement
    average relative diff is an improvement

Top method improvements (percentages):
         -92 (-5.47% of base) : 1005.dasm - S_P_StackTraceMetadata_Internal_StackTraceMetadata_MethodNameFormatter__EmitMethodParameters_0
         -62 (-3.21% of base) : 1011.dasm - System.Reflection.Runtime.General.NativeFormatMetadataReaderExtensions__TryParseConstantEnumArray
         -10 (-3.12% of base) : 1028.dasm - S_P_StackTraceMetadata_Internal_StackTraceMetadata_MethodNameFormatter_SigTypeContext__FromMethod
         -12 (-1.98% of base) : 1014.dasm - S_P_TypeLoader_Internal_Runtime_TypeLoader_TypeBuilderState__GetFieldGCLayout
         -28 (-1.92% of base) : 1024.dasm - S_P_TypeLoader_Internal_Runtime_TypeLoader_TypeLoaderEnvironment__TryGetMethodInvokeMetadataFromInvokeMap
         -13 (-1.65% of base) : 1012.dasm - System.Reflection.Runtime.MethodInfos.RuntimeMethodHelpers__GetRuntimeParameters<System.Reflection.Runtime.MethodInfos.NativeFormat.NativeFormatMethodCommon>
         -19 (-1.63% of base) : 1010.dasm - S_P_TypeLoader_Internal_Runtime_TypeLoader_TemplateLocator__TryGetTypeTemplate_Internal
         -21 (-1.52% of base) : 1002.dasm - S_P_TypeLoader_Internal_Runtime_TypeLoader_TypeLoaderEnvironment__TryGetMetadataForNamedType
         -20 (-1.46% of base) : 1008.dasm - S_P_TypeLoader_Internal_Runtime_TypeLoader_TemplateLocator__TryGetGenericMethodTemplate_Internal
         -21 (-1.42% of base) : 1009.dasm - S_P_TypeLoader_Internal_Runtime_TypeLoader_TypeLoaderEnvironment__TryLookupExactMethodPointer
         -19 (-1.35% of base) : 1001.dasm - S_P_StackTraceMetadata_Internal_StackTraceMetadata_StackTraceMetadata__GetMethodNameFromStartAddressIfAvailable
         -21 (-1.32% of base) : 1004.dasm - S_P_TypeLoader_Internal_Runtime_TypeLoader_TypeLoaderEnvironment__TryGetFieldAccessMetadataFromFieldAccessMap
         -12 (-1.18% of base) : 1019.dasm - System.TimeZoneInfo_AdjustmentRule__ValidateAdjustmentRule
         -15 (-1.16% of base) : 1003.dasm - S_P_TypeLoader_Internal_Runtime_TypeLoader_TypeLoaderEnvironment__TryGetStaticGenericTypeForComponents
         -21 (-1.14% of base) : 1006.dasm - S_P_TypeLoader_Internal_Runtime_TypeLoader_TypeLoaderEnvironment__ResolveInterfaceGenericVirtualMethodSlot
         -55 (-1.12% of base) : 1025.dasm - System.Reflection.CustomAttributeTypedArgument__ToString_0
         -33 (-1.11% of base) : 1020.dasm - System.Reflection.Runtime.Assemblies.NativeFormat.NativeFormatRuntimeAssembly__CreateCaseInsensitiveTypeDictionary
          -6 (-1.05% of base) : 1013.dasm - Internal.Metadata.NativeFormat.MetadataReader__GetNamespaceDefinition
         -21 (-0.97% of base) : 1007.dasm - S_P_TypeLoader_Internal_Runtime_TypeLoader_TypeLoaderEnvironment__ResolveGenericVirtualMethodTarget_0
         -22 (-0.76% of base) : 1030.dasm - S_P_TypeLoader_Internal_Runtime_TypeLoader_TypeLoaderEnvironment__TryGetStaticGenericMethodComponents

31 total methods with Code Size differences (31 improved, 0 regressed)

- 41 04                      |       i32.const 4
- 6a                         |       i32.add
- 28 02 00                   |       i32.load 2 0
+ 28 02 04                   |       i32.load 2 4
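
For readers unfamiliar with why this fold needs extra information: `i32.add` wraps modulo 2^32, while the offset immediate of a WASM load is added to the base without wrapping, so the two forms only agree when `base + 4` cannot overflow. The snippet below is a minimal model of that rule, not the backend's actual logic; `MEMORY_SIZE` and all function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF
MEMORY_SIZE = 64 * 1024  # hypothetical: a single 64 KiB WASM page


def i32_add(a, b):
    """Model of WASM i32.add: the result wraps modulo 2**32."""
    return (a + b) & MASK32


def effective_address(base, offset):
    """Model of a WASM memory access: unsigned base plus the offset
    immediate, computed without wrapping."""
    return (base & MASK32) + offset


def load_traps(base, offset, size=4, memory_size=MEMORY_SIZE):
    """An access traps when it reaches past the end of linear memory."""
    return effective_address(base, offset) + size > memory_size


def before(base):
    # i32.const 4 ; i32.add ; i32.load align=2 offset=0
    return effective_address(i32_add(base, 4), 0)


def after(base):
    # i32.load align=2 offset=4
    return effective_address(base, 4)


# For an ordinary in-bounds pointer both forms address the same bytes.
print(before(16), after(16))

# Near the top of the address space they differ: the explicit add wraps
# around to a low address, while the folded form overflows and traps.
print(before(0xFFFFFFFE), after(0xFFFFFFFE))
```

Knowing that the bytes behind the pointer are dereferenceable rules out the second case, which is what lets the constant move into the immediate.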

SingleAccretion commented 3 months ago

@dotnet/nativeaot-llvm

SingleAccretion commented 3 months ago

Looks good. Do the implicit byrefs cover some uses of class/struct fields?

So in the implicit byref case, what we have is a struct argument that's accessed indirectly. Usually we can already prove this access 'in-bounds' with LEAs, but it is still beneficial to add the annotation, because otherwise LLVM can struggle to optimally lower stores like:

*implicitByref = some-struct-value;
--> lowered by LLVM as:
*implicitByref = some-struct-value[0];
*(implicitByref + 8) = some-struct-value[1];

due to https://github.com/llvm/llvm-project/issues/79692.
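
As a rough illustration of the effect on the split store above, here is a toy lowering that folds a chunk's offset into the store immediate only when the chunk lies within a known-dereferenceable byte range. This is a sketch under simplifying assumptions, not what LLVM or the linked issue actually implement: `lower_struct_store` and its parameters are hypothetical, the struct size is assumed to be a multiple of the chunk size, and operand pushes are omitted.

```python
def lower_struct_store(struct_size, chunk_size=8, deref_bytes=0):
    """Return WASM-like pseudo-instructions that store a struct chunk by chunk.

    deref_bytes is the number of bytes behind the destination pointer that
    are known to be dereferenceable (0 means nothing is known).
    Pushing the address and value operands is omitted for brevity.
    """
    instrs = []
    for offset in range(0, struct_size, chunk_size):
        chunk_in_bounds = offset + chunk_size <= deref_bytes
        if offset == 0 or chunk_in_bounds:
            # No address arithmetic needed: use the offset immediate.
            instrs.append(f"i64.store offset={offset}")
        else:
            # Fall back to an explicit (wrapping) add on the address.
            instrs += [f"i32.const {offset}", "i32.add", "i64.store offset=0"]
    return instrs


# 16-byte struct, destination annotated as dereferenceable for 16 bytes.
print(lower_struct_store(16, deref_bytes=16))

# Same store with no dereferenceability information.
print(lower_struct_store(16))
```

The second call emits two extra instructions for the chunk at offset 8, the same shape as the `i32.const` / `i32.add` pair removed in the diff earlier in this thread.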

It also theoretically enables some more advanced optimizations, like speculative stores. However, I did not see that in the diffs.

dotnet / runtimelab

[NativeAOT-LLVM] Add a couple dereferenceability attributes #2537