Detect safe/unsafe usage of GetCustomAttribute and similar methods

vitek-karas commented 4 years ago

If --used-attrs-only is specified linker will remove any attribute type (and usage) which is not referenced from some method body. This creates potentially problematic code patterns mainly because reflection APIs like GetCustomAttribute<T> will by default return any attributes of the specified type T and any derived type of T. In this case linker will correctly keep the T, but it will remove all of the derived attributes, which is potentially not what the calling code expects to happen.

We should implement pattern recognition for the calls to GetCustomAttribute<T> (and similar APIs as there are multiple APIs like this in the reflection) and detect cases where:

The T is determined and it's a sealed type -> Safe
Anything else -> Unsafe

The unsafe reporting should be done only if --used-attrs-only is specified. Without this, linker will by default keep all attribute types which are used anywhere, so the potentially problematic callsites are guaranteed to work correctly.

Sample test case showing how linker removes derived attributes:

    [SetupLinkerArgument ("--used-attrs-only", "true")]
    class UnusedDerivedAttributeType
    {
        static void Main ()
        {
            var tmp = new Bar ();
            var str = typeof (BaseAttribute).ToString ();
        }

        [Kept]
        [KeptMember (".ctor()")]
        [Derived]
        class Bar
        {
        }

        [Kept]
        [KeptBaseType (typeof (Attribute))]
        class BaseAttribute : Attribute
        {
        }

        class DerivedAttribute : BaseAttribute
        {
        }
    }

vitek-karas commented 4 years ago

There are 3 possible solutions with different level of "works by default" versus "size".

1. Just report safe/unsafe usage

This would only detect the already safe cases, that is calls to GetCustomAttributes (or similar APIs) with a sealed attribute type. Everything else would be reported as unsafe but nothing would be done about it. In the above example calling

typeof(Bar).GetCustomAttributes<DerivedAttribute>(); // Recognized as safe, nothing to do
typeof(Bar).GetCustomAttributes<BaseAttribute>(); // Reported as unsafe - warning

Potentially smallest size, but certain use cases are invalid and user would have to handle them manually.

2. Mark only detected used attributes

Linker would try to determine the type/method/field/property on which the GetCustomAttributes is called. If successful, it would look at the actual attribute instances on it and also look into it's base classes (if necessary as per the inherited parameter). It would only preserve the attributes which are present and are derived from the type of the attribute being asked for.

In the example above:

typeof(Bar).GetCustomAttributes<DerivedAttribute>(); // Recognized as safe, nothing to do
typeof(Bar).GetCustomAttributes<BaseAttribute>(); // Reported as safe, linker would determine that DerivedAttribute is used and it would mark it for inclusion

Type t = SomeCall(); // Linker can't analyze what t actually is
t.GetCustomAttributes<BaseAttribute>(); // Reported as unsafe, warning
t.GetCustomAttributes<DerivedAttribute>(); // Recognized as safe

Slightly larger size, but if linker can detect the items it is provably safe. If linker can't detect, still left to the user to fix. Almost all use cases in CoreFX/ASP.NET would not be analyzable as the item on which the attributes are applied is dynamic (linker can't figure it out).

3. mark all derived attribute types

Linker would simply mark the detected attribute type and all its derived types without additional analysis.

So in the above example

typeof(Bar).GetCustomAttributes<DerivedAttribute>(); // Recognized as safe, nothing to do
typeof(Bar).GetCustomAttributes<BaseAttribute>(); // Reported as safe, linker would mark DerivedAttribute

Type t = SomeCall(); // Linker can't analyze what t actually is
t.GetCustomAttributes<BaseAttribute>(); // Reported as safe, linker would mark the DerivedAttribute (without knowing if it's actually used on the type or no)
t.GetCustomAttributes<DerivedAttribute>(); // Recognized as safe

This would solve all cases where the type of the attribute is hardcoded (almost all use cases in CoreFX/ASP.NET are like that), regardless of the item the attributes are applied to. Downside is possible increase in size (hopefully not too much).

Proposed solution

I would personally prefer solution 3 - mark derived attribute types. It has the best chance of making the app work without issues. The size increase should be very small (in CoreFX/ASP.NET only a handful of attributes are not sealed and even less actually have derived types).

The solution could be improved by:

Not marking any additional attributes if the call is for GetCustomAttributes<Attribute>() and instead reporting this as unsafe. This is basically the same as GetCustomAttributes() which is inherently unsafe when linker trims unused attributes.
When marking derived attribute types, only mark those which are actually used as attribute instances anywhere. This might help with size a little bit by avoiding attributes which are not used at all. That said I don't think the size saving would be interesting (as noted, there are very few attributes with an inheritance hierarchy).

vitek-karas commented 4 years ago

/cc @MichalStrehovsky

vitek-karas commented 4 years ago

Discussed the proposed solutions with several people and the outcome is that we need more investigation.

So the solution for this issues should be basically # 1 from above - only detect the unsafe cases. That is detect cases where the attribute type is recognized and it's sealed and treat those as safe, and report everything else as unsafe.

The more advanced cases of class hierarchies in attribute type or being able to detect the item on which the attribute is applied are out of scope for this issue.

MichalStrehovsky commented 4 years ago

Is it possible to quickly find out how much --used-attrs-only is actually saving on non-Unity workloads? The pull request that introduced it (#238) doesn't unfortunately have data on that and it's quite possible Unity does have different needs than .NET Core.

The option introduces quite a few problems for general .NET code and there might still be a bug tail (#910). The problems might be worth the hassle if they save us, say, 2%, but if this only gets us 0.5%, maybe we can invest the time to find savings that introduce less complexity elsewhere?

Having fewer APIs we need to call unsafe is nice (so are size savings though...).

MichalStrehovsky commented 4 years ago

(Might be worth separating out the nullable annotations that are generally useless ballast and could be sweeped separately.)

MichalStrehovsky commented 4 years ago

E.g. linker is probably stripping DefaultDllImportSearchPathsAttribute and a bunch of others I found with a quick search. Those are very subtle bugs to chase after, and very fragile lists of types to keep in sync.

marek-safar commented 4 years ago

I consider --used-attrs-only to be an experimental feature which is disabled by default and needs more work to be production-ready (perhaps we should make that more explicit). Having said that the attributes explosion in the recent C# compiler version makes this feature very appealing because in small apps the metadata savings are noticeable.

vitek-karas commented 4 years ago

Good point about all this being only necessary when --used-attrs-only is on - the pattern detection logic should simply consider GetCustomAttributes (and related) safe when the feature is not enabled.

MichalStrehovsky commented 4 years ago

This issue is marked .NET 5 - are we going to expose an option to enable --used-attrs-only or enable it by default?

If not, I would suggest scoping this out and consider all attribute APIs linker safe.

marek-safar commented 4 years ago

We need a solution for .net5 to get rid of all the infrastructure attributes (compiler, linker, debugger, designer, interop, etc.) as they increase the size significantly. We could do that under a new switch or by some other means but it's needed.

MichalStrehovsky commented 4 years ago

I had a look at how much is this saving for a default empty Blazor app. We're looking at most at 3%.

The 3% is going to get smaller as we fix correctness issues. This is currently stripping many attributes that affect performance, security, correctness, and diagnosability of the resulting app.

In the performance category: IntrinsicAttribute (that tells codegen something can be done more efficiently at runtime), NonVersionableAttribute (that allows crossgen do otherwise fragile inlining, such as inlining Object:.ctor method), TypeDependencyAttribute
In the security category: UnsafeValueTypeAttribute that tells the codegen to emit extra stack checks, DefaultDllImportSearchPathsAttribute that prevents DLL planting attacks
In the correctness category: ClassInterfaceAttribute that has meaning to COM
In the diagnosability category: all sorts of attributes and debug views. I don't think we're okay with making trimmed apps hard to debug. Mono WASM could opt into killing debuggability, but I don't think .NET Core in general would be okay with that choice.

mrvoorhe commented 4 years ago

We need a solution for .net5 to get rid of all the infrastructure attributes (compiler, linker, debugger, designer, interop, etc.)

We use that new ProcessSpecialLinkTimeAttribute method (I forget the exact name) + ShouldMarkCustomAttribute method to explicitly not mark CustomAttribites that are intended for the linker. You could use the same two methods to avoid marking attributes intended for the compiler, linker, etc.

Side note, I thought we had an option to handle the debugger attributes?

mrvoorhe commented 4 years ago

Is it possible to quickly find out how much --used-attrs-only is actually saving on non-Unity workloads? The pull request that introduced it (#238) doesn't unfortunately have data on that and it's quite possible Unity does have different needs than .NET Core.

@MichalStrehovsky We have code size tests I could run to get you some data. Some of the tests do very basic usages of the mono class libraries. Those are more an indicator of how much size can be removed from the class libraries. We have others that are more representative of a real unity project.

If you'd like me to collect some data on the impact of --unity-attrs-only let me know.

MichalStrehovsky commented 4 years ago

If you'd like me to collect some data on the impact of --unity-attrs-only let me know.

@mrvoorhe Thanks! If you see more savings than 3% somewhere, I would be definitely interested in that.

You could use the same two methods to avoid marking attributes intended for the compiler, linker, etc.

The thing that makes me squeamish about that approach is that we would be hardcoding lists of attribute types interesting to both runtime and crossgen. That's a very fragile list to keep up to date with those components. How are we going to make sure that when new attributes are introduced to either of those places, linker will get updated? How do we even make sure we have a full list to start with? I identified the above attributes by diffing CoreLib with and without the parameter and using tribal knowledge to identify those that look wrong. There is no central place in either the runtime or crossgen that has all interesting attributes. This just sounds like a bug farm.

The attributes I identified above pretty much all result in very subtle, but pretty significant bugs - the attributes not being present are unlikely to break tests because we don't have good test infrastructure to test for those kinds of things. E.g. we don't re-run perf tests or the CoreCLR test tree after linker to find out that Intrinsic/NonVersionable attribute is missing. We could, but it's all non-trivial infrastructure investments and explodes the test matrix (the NonVersionable one actually requires running perf tests with linker, and ReadyToRun, and tiered compilation disabled to identify, which is a crazy but still supported combination).

MichalStrehovsky commented 4 years ago

I think it might be more palatable to have a --used-safe-attrs-only option that removes unused attributes that we put on a whitelist (e.g. CLSCompliant, the nullable junk, etc.). I'll try to see (next week or so) how close that one gets us to what --used-attrs-only can do once all the known bugs with it are fixed.

mrvoorhe commented 4 years ago

How are we going to make sure that when new attributes are introduced to either of those places, linker will get updated?

To clarify, I was referring to the scenario of explicitly removing certain attributes regardless of whether or not --used-attrs-only is used. If you forgot to update the linker to remove some new attribute, worst case would be that you missed a size optimization opportunity.

Back to --used-attrs-only, I agree that one is more problematic to maintain. There is a whitelist of a few attributes today. I personally think it's worth supporting. You are right it will require some effort to bring it to release quality. And you are right that it will take some effort to maintain going forward as new attributes are introduced.

What if you guys added a test in .NET Core that scans all of the attribute types and then asserts that they are somehow annotated? What if every attribute type defined in .NET Core had to have some sort of attribute on it

[AttributeConsumer(AttributeConsumers.Compiler)] [AttributeConsumer(AttributeConsumers.Linker)] [AttributeConsumer(AttributeConsumers.Runtime)] 
[AttributeConsumer(AttributeConsumers.Debugger)] 
[AttributeConsumer(AttributeConsumers.Any)]

The test would force all attributes to be annotated. Could someone make a mistake labeling one? Sure, but someone could make a mistake with any piece of code.

I think a test like that could help make this a manageable effort. As a bonus the linker wouldn't have to maintain an explicit list either.

mrvoorhe commented 4 years ago

I've got some data. First some context so that you can better understand the labeling I'm going to use. We have 3 Managed Stripping Levels (Low / Medium / High). Low is the default. It toggles on a conservative set of options inside the linker and not all user assemblies are linked with Low. Medium uses the same linker options but all assemblies are linked. High links all assemblies and turns on many features inside monolinker and in our UnityLinker to try and get the size down further. For example, we only enable --used-attrs-only during High. If you are interested, a more detailed write up is available here https://docs.unity3d.com/Manual/ManagedCodeStripping.html

Note that all of this data comes from using the mono class library assemblies that we use. These class libraries are a couple years behind mono/master I believe.

First some data for a simple example.

    class SimpleProgram
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Hello, Unity!");
        }
    }

SimpleProgram - Low Managed Stripping Level + --used-attrs-only enabled.

---------------------------
Differences
---------------------------
mscorlib.dll (WIN)
  Expected: 1585152
  Actual: 1562624
  Diff: -22528 (-1.4212%)
---------------------------
Total (WIN)
  Expected: 1588736
  Actual: 1566208
  Diff: -22528 (-1.418%)

IL2CPP Version of SimpleProgram - Low Managed Stripping Level + --used-attrs-only enabled. Generally IL2CPP builds follow the same size trend as the managed dlls.

Differences
---------------------------
native/Data/Metadata/global-metadata.dat (WIN)
  Expected: 1192304
  Actual: 1161356
  Diff: -30948 (-2.5956%)
native/SimpleProgram.exe (WIN)
  Expected: 3803912
  Actual: 3749712
  Diff: -54200 (-1.4248%)

And for context, here is SimpleProgram Low Managed Stripping Level vs High Managed Stripping Level

---------------------------
Differences
---------------------------
mscorlib.dll (WIN)
  Expected: 1585152
  Actual: 1424384
  Diff: -160768 (-10.1421%)
---------------------------
Total (WIN)
  Expected: 1588736
  Actual: 1427968
  Diff: -160768 (-10.1192%)

Now onto a unity example. This is a very basic unity project with just a few things in a scene. AddCollider - Low Managed Stripping Level + Used Attrs Only

Total (WIN)
  Expected: 2612736
  Actual: 2523648
  Diff: -89088 (-3.4098%)

Full breakdown of all assemblies https://gist.github.com/mrvoorhe/be315913ff5581a638ab0ac7824db069

And lastly a project that is representative of a game. UnityProgramWithXboxLiveFunctionality - Low Managed Stripping Level + Used Attrs Only

Total (WIN)
  Expected: 6018048
  Actual: 5867520
  Diff: -150528 (-2.5013%)

Full breakdown of all assemblies https://gist.github.com/mrvoorhe/14a602f52e258d3ba172871e82f85880

mrvoorhe commented 4 years ago

My take away from that data is that the class library savings will be in the 1-2% range. And once you add in a meaningfully size amount of user code you are looking at another 1-2% (maybe higher depending on the users coding patterns). And it's worth noting none of these projects, nor our class libraries, are making heavy, or any use, of newer compiler generated attributes that @marek-safar mentions.

I believe --used-attrs-only is the single largest general purpose CodeOptimizations feature monolinker supports.

radekdoulik commented 4 years ago

FYI, numbers for savings done by current RemoveAttributes step in Xamarin.Android, which does remove the known attributes. Measured by apkdiff.

XF test
  -      93,184 Assemblies -0.65% (of 14,273,664)
SmartHotel360
  -     103,936 Assemblies -0.58% (of 17,784,696)

Savings with PR https://github.com/xamarin/xamarin-android/pull/4107/commits/72990e4365b44f4a6cf8dd62fd269f281d3ceecf

XF test
  -     107,520 Assemblies -0.75% (of 14,273,664)
SmartHotel360
  -     134,232 Assemblies -0.75% (of 17,784,696)

@marek-safar says, this new feature is going to replace RemoveAttributes step. This step does pretty much what @MichalStrehovsky described in https://github.com/mono/linker/issues/952#issuecomment-603111822

dotnet / linker