Open vitek-karas opened 4 years ago
There are 3 possible solutions with different level of "works by default" versus "size".
This would only detect the already safe cases, that is calls to GetCustomAttributes
(or similar APIs) with a sealed attribute type. Everything else would be reported as unsafe but nothing would be done about it.
In the above example calling
typeof(Bar).GetCustomAttributes<DerivedAttribute>(); // Recognized as safe, nothing to do
typeof(Bar).GetCustomAttributes<BaseAttribute>(); // Reported as unsafe - warning
Potentially smallest size, but certain use cases are invalid and user would have to handle them manually.
Linker would try to determine the type/method/field/property on which the GetCustomAttributes
is called. If successful, it would look at the actual attribute instances on it and also look into it's base classes (if necessary as per the inherited
parameter). It would only preserve the attributes which are present and are derived from the type of the attribute being asked for.
In the example above:
typeof(Bar).GetCustomAttributes<DerivedAttribute>(); // Recognized as safe, nothing to do
typeof(Bar).GetCustomAttributes<BaseAttribute>(); // Reported as safe, linker would determine that DerivedAttribute is used and it would mark it for inclusion
Type t = SomeCall(); // Linker can't analyze what t actually is
t.GetCustomAttributes<BaseAttribute>(); // Reported as unsafe, warning
t.GetCustomAttributes<DerivedAttribute>(); // Recognized as safe
Slightly larger size, but if linker can detect the items it is provably safe. If linker can't detect, still left to the user to fix. Almost all use cases in CoreFX/ASP.NET would not be analyzable as the item on which the attributes are applied is dynamic (linker can't figure it out).
Linker would simply mark the detected attribute type and all its derived types without additional analysis.
So in the above example
typeof(Bar).GetCustomAttributes<DerivedAttribute>(); // Recognized as safe, nothing to do
typeof(Bar).GetCustomAttributes<BaseAttribute>(); // Reported as safe, linker would mark DerivedAttribute
Type t = SomeCall(); // Linker can't analyze what t actually is
t.GetCustomAttributes<BaseAttribute>(); // Reported as safe, linker would mark the DerivedAttribute (without knowing if it's actually used on the type or no)
t.GetCustomAttributes<DerivedAttribute>(); // Recognized as safe
This would solve all cases where the type of the attribute is hardcoded (almost all use cases in CoreFX/ASP.NET are like that), regardless of the item the attributes are applied to. Downside is possible increase in size (hopefully not too much).
I would personally prefer solution 3 - mark derived attribute types. It has the best chance of making the app work without issues. The size increase should be very small (in CoreFX/ASP.NET only a handful of attributes are not sealed and even less actually have derived types).
The solution could be improved by:
GetCustomAttributes<Attribute>()
and instead reporting this as unsafe. This is basically the same as GetCustomAttributes()
which is inherently unsafe when linker trims unused attributes./cc @MichalStrehovsky
Discussed the proposed solutions with several people and the outcome is that we need more investigation.
So the solution for this issues should be basically # 1 from above - only detect the unsafe cases. That is detect cases where the attribute type is recognized and it's sealed and treat those as safe, and report everything else as unsafe.
The more advanced cases of class hierarchies in attribute type or being able to detect the item on which the attribute is applied are out of scope for this issue.
Is it possible to quickly find out how much --used-attrs-only
is actually saving on non-Unity workloads? The pull request that introduced it (#238) doesn't unfortunately have data on that and it's quite possible Unity does have different needs than .NET Core.
The option introduces quite a few problems for general .NET code and there might still be a bug tail (#910). The problems might be worth the hassle if they save us, say, 2%, but if this only gets us 0.5%, maybe we can invest the time to find savings that introduce less complexity elsewhere?
Having fewer APIs we need to call unsafe is nice (so are size savings though...).
(Might be worth separating out the nullable annotations that are generally useless ballast and could be sweeped separately.)
E.g. linker is probably stripping DefaultDllImportSearchPathsAttribute and a bunch of others I found with a quick search. Those are very subtle bugs to chase after, and very fragile lists of types to keep in sync.
I consider --used-attrs-only
to be an experimental feature which is disabled by default and needs more work to be production-ready (perhaps we should make that more explicit). Having said that the attributes explosion in the recent C# compiler version makes this feature very appealing because in small apps the metadata savings are noticeable.
Good point about all this being only necessary when --used-attrs-only
is on - the pattern detection logic should simply consider GetCustomAttributes
(and related) safe when the feature is not enabled.
This issue is marked .NET 5 - are we going to expose an option to enable --used-attrs-only
or enable it by default?
If not, I would suggest scoping this out and consider all attribute APIs linker safe.
We need a solution for .net5 to get rid of all the infrastructure attributes (compiler, linker, debugger, designer, interop, etc.) as they increase the size significantly. We could do that under a new switch or by some other means but it's needed.
I had a look at how much is this saving for a default empty Blazor app. We're looking at most at 3%.
The 3% is going to get smaller as we fix correctness issues. This is currently stripping many attributes that affect performance, security, correctness, and diagnosability of the resulting app.
IntrinsicAttribute
(that tells codegen something can be done more efficiently at runtime), NonVersionableAttribute
(that allows crossgen do otherwise fragile inlining, such as inlining Object:.ctor
method), TypeDependencyAttribute
UnsafeValueTypeAttribute
that tells the codegen to emit extra stack checks, DefaultDllImportSearchPathsAttribute
that prevents DLL planting attacksClassInterfaceAttribute
that has meaning to COMWe need a solution for .net5 to get rid of all the infrastructure attributes (compiler, linker, debugger, designer, interop, etc.)
We use that new ProcessSpecialLinkTimeAttribute
method (I forget the exact name) + ShouldMarkCustomAttribute method to explicitly not mark CustomAttribites that are intended for the linker. You could use the same two methods to avoid marking attributes intended for the compiler, linker, etc.
Side note, I thought we had an option to handle the debugger attributes?
Is it possible to quickly find out how much --used-attrs-only is actually saving on non-Unity workloads? The pull request that introduced it (#238) doesn't unfortunately have data on that and it's quite possible Unity does have different needs than .NET Core.
@MichalStrehovsky We have code size tests I could run to get you some data. Some of the tests do very basic usages of the mono class libraries. Those are more an indicator of how much size can be removed from the class libraries. We have others that are more representative of a real unity project.
If you'd like me to collect some data on the impact of --unity-attrs-only
let me know.
If you'd like me to collect some data on the impact of
--unity-attrs-only
let me know.
@mrvoorhe Thanks! If you see more savings than 3% somewhere, I would be definitely interested in that.
You could use the same two methods to avoid marking attributes intended for the compiler, linker, etc.
The thing that makes me squeamish about that approach is that we would be hardcoding lists of attribute types interesting to both runtime and crossgen. That's a very fragile list to keep up to date with those components. How are we going to make sure that when new attributes are introduced to either of those places, linker will get updated? How do we even make sure we have a full list to start with? I identified the above attributes by diffing CoreLib with and without the parameter and using tribal knowledge to identify those that look wrong. There is no central place in either the runtime or crossgen that has all interesting attributes. This just sounds like a bug farm.
The attributes I identified above pretty much all result in very subtle, but pretty significant bugs - the attributes not being present are unlikely to break tests because we don't have good test infrastructure to test for those kinds of things. E.g. we don't re-run perf tests or the CoreCLR test tree after linker to find out that Intrinsic
/NonVersionable
attribute is missing. We could, but it's all non-trivial infrastructure investments and explodes the test matrix (the NonVersionable
one actually requires running perf tests with linker, and ReadyToRun, and tiered compilation disabled to identify, which is a crazy but still supported combination).
I think it might be more palatable to have a --used-safe-attrs-only option that removes unused attributes that we put on a whitelist (e.g. CLSCompliant, the nullable junk, etc.). I'll try to see (next week or so) how close that one gets us to what --used-attrs-only
can do once all the known bugs with it are fixed.
How are we going to make sure that when new attributes are introduced to either of those places, linker will get updated?
To clarify, I was referring to the scenario of explicitly removing certain attributes regardless of whether or not --used-attrs-only
is used. If you forgot to update the linker to remove some new attribute, worst case would be that you missed a size optimization opportunity.
Back to --used-attrs-only
, I agree that one is more problematic to maintain. There is a whitelist of a few attributes today. I personally think it's worth supporting. You are right it will require some effort to bring it to release quality. And you are right that it will take some effort to maintain going forward as new attributes are introduced.
What if you guys added a test in .NET Core that scans all of the attribute types and then asserts that they are somehow annotated? What if every attribute type defined in .NET Core had to have some sort of attribute on it
[AttributeConsumer(AttributeConsumers.Compiler)] [AttributeConsumer(AttributeConsumers.Linker)] [AttributeConsumer(AttributeConsumers.Runtime)]
[AttributeConsumer(AttributeConsumers.Debugger)]
[AttributeConsumer(AttributeConsumers.Any)]
The test would force all attributes to be annotated. Could someone make a mistake labeling one? Sure, but someone could make a mistake with any piece of code.
I think a test like that could help make this a manageable effort. As a bonus the linker wouldn't have to maintain an explicit list either.
I've got some data. First some context so that you can better understand the labeling I'm going to use. We have 3 Managed Stripping Levels (Low / Medium / High). Low is the default. It toggles on a conservative set of options inside the linker and not all user assemblies are linked with Low. Medium uses the same linker options but all assemblies are linked. High links all assemblies and turns on many features inside monolinker and in our UnityLinker to try and get the size down further. For example, we only enable --used-attrs-only
during High
. If you are interested, a more detailed write up is available here https://docs.unity3d.com/Manual/ManagedCodeStripping.html
Note that all of this data comes from using the mono class library assemblies that we use. These class libraries are a couple years behind mono/master
I believe.
First some data for a simple example.
class SimpleProgram
{
static void Main(string[] args)
{
Console.WriteLine("Hello, Unity!");
}
}
SimpleProgram - Low Managed Stripping Level + --used-attrs-only
enabled.
---------------------------
Differences
---------------------------
mscorlib.dll (WIN)
Expected: 1585152
Actual: 1562624
Diff: -22528 (-1.4212%)
---------------------------
Total (WIN)
Expected: 1588736
Actual: 1566208
Diff: -22528 (-1.418%)
IL2CPP Version of SimpleProgram - Low Managed Stripping Level + --used-attrs-only
enabled. Generally IL2CPP builds follow the same size trend as the managed dlls.
Differences
---------------------------
native/Data/Metadata/global-metadata.dat (WIN)
Expected: 1192304
Actual: 1161356
Diff: -30948 (-2.5956%)
native/SimpleProgram.exe (WIN)
Expected: 3803912
Actual: 3749712
Diff: -54200 (-1.4248%)
And for context, here is SimpleProgram Low Managed Stripping Level vs High Managed Stripping Level
---------------------------
Differences
---------------------------
mscorlib.dll (WIN)
Expected: 1585152
Actual: 1424384
Diff: -160768 (-10.1421%)
---------------------------
Total (WIN)
Expected: 1588736
Actual: 1427968
Diff: -160768 (-10.1192%)
Now onto a unity example. This is a very basic unity project with just a few things in a scene. AddCollider - Low Managed Stripping Level + Used Attrs Only
Total (WIN)
Expected: 2612736
Actual: 2523648
Diff: -89088 (-3.4098%)
Full breakdown of all assemblies https://gist.github.com/mrvoorhe/be315913ff5581a638ab0ac7824db069
And lastly a project that is representative of a game. UnityProgramWithXboxLiveFunctionality - Low Managed Stripping Level + Used Attrs Only
Total (WIN)
Expected: 6018048
Actual: 5867520
Diff: -150528 (-2.5013%)
Full breakdown of all assemblies https://gist.github.com/mrvoorhe/14a602f52e258d3ba172871e82f85880
My take away from that data is that the class library savings will be in the 1-2% range. And once you add in a meaningfully size amount of user code you are looking at another 1-2% (maybe higher depending on the users coding patterns). And it's worth noting none of these projects, nor our class libraries, are making heavy, or any use, of newer compiler generated attributes that @marek-safar mentions.
I believe --used-attrs-only
is the single largest general purpose CodeOptimizations
feature monolinker supports.
FYI, numbers for savings done by current RemoveAttributes
step in Xamarin.Android, which does remove the known attributes. Measured by apkdiff.
XF test
- 93,184 Assemblies -0.65% (of 14,273,664)
SmartHotel360
- 103,936 Assemblies -0.58% (of 17,784,696)
Savings with PR https://github.com/xamarin/xamarin-android/pull/4107/commits/72990e4365b44f4a6cf8dd62fd269f281d3ceecf
XF test
- 107,520 Assemblies -0.75% (of 14,273,664)
SmartHotel360
- 134,232 Assemblies -0.75% (of 17,784,696)
@marek-safar says, this new feature is going to replace RemoveAttributes
step. This step does pretty much what @MichalStrehovsky described in https://github.com/mono/linker/issues/952#issuecomment-603111822
If
--used-attrs-only
is specified linker will remove any attribute type (and usage) which is not referenced from some method body. This creates potentially problematic code patterns mainly because reflection APIs likeGetCustomAttribute<T>
will by default return any attributes of the specified typeT
and any derived type ofT
. In this case linker will correctly keep theT
, but it will remove all of the derived attributes, which is potentially not what the calling code expects to happen.We should implement pattern recognition for the calls to
GetCustomAttribute<T>
(and similar APIs as there are multiple APIs like this in the reflection) and detect cases where:The unsafe reporting should be done only if
--used-attrs-only
is specified. Without this, linker will by default keep all attribute types which are used anywhere, so the potentially problematic callsites are guaranteed to work correctly.Sample test case showing how linker removes derived attributes: