dotnet / linker

388 stars 127 forks source link

Handling TypeForwardedFromAttribute while trimming #1838

Open eerhardt opened 3 years ago

eerhardt commented 3 years ago

The TypeForwardedFromAttribute allows for serializers (i.e. BinaryFormatter and XmlSerializer) to write Type names using a "stable" assembly name - one that won't change from version-to-version even if the Type moves assemblies.

These attributes typically point to facade assemblies.

The issue is that trimming will remove the facade assemblies. Thus code in XmlSerializer, which tries loading that assembly, is going to fail.

If the trimmer sees a TypeForwardedFromAttribute to a facade assembly, it is going to need to preserve that facade assembly after trimming.

I tried to find other attributes that have this same issue, and came up with 2 other instances of attributes taking assembly names:

  1. InternalsVisibleToAttribute
  2. DynamicDependencyAttribute

We may need to handle DynamicDependency similar to TypeForwardedFrom - if the attribute is referencing an assembly, that assembly can't be trimmed. My initial thoughts for InternalsVisibleTo is that it shouldn't matter if the assembly is trimmed.

If this pattern exists for more places than these attributes, we may have to add a new annotation that tells the trimmer the string is an "assembly name". Similarly to how [DynamicallyAccessedMembersAttribute] on a string tells the trimmer the string is a Type name.

cc @ericstj @safern

marek-safar commented 3 years ago

I'm not sure this is a generic pattern, it looks more like a serialization implementation detail. How does the serialization discover the new type which has the attribute on?

I don't think we want the same handling for DynamicDependency, if it points to type forwarder we should probably warn though.

eerhardt commented 3 years ago

How does the serialization discover the new type which has the attribute on?

There isn't a "new type". This attribute allows for code (as far as I can tell, serialization code) to look at a type and decide which assembly name it should use when writing the full Type name out to the pay load.

So for example:

https://github.com/dotnet/runtime/blob/8a52f1e948b6f22f418817ec1068f07b8dae2aa5/src/libraries/System.Private.CoreLib/src/System/Double.cs#L26-L27

When serializing the Type information for a System.Double, it won't use the string:

System.Double, System.Private.CoreLib, Version=6.0.0.0

instead it will use:

System.Double, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089

marek-safar commented 3 years ago

Base on the name I thought the attribute is used during deserialization not serialization.

I think one way to fix it would be to write a custom linker step that would be used when serialization is involved. If I recall correctly @vitek-karas said there will be a special linker step for serialization support so this logic could be included there.

Note: I find it very odd that seeming randomly picked types serialize special way than the rest of the framework.

MichalStrehovsky commented 3 years ago

TypeForwardedFrom is our license to move types around the class libraries - without that, moving a type is potentially a breaking change. E.g. just last month: https://github.com/dotnet/runtime/issues/47113#issuecomment-762256639

I'm not sure if we want to tie it in with our builtin serializers. The guidance I gave out in that issue for Newtonsoft would be invalid with trimming if this only kicks in when using the builtin serializers.

I think we would want to have infrastructure in illink to be able to preserve individual type forwards, so that we can surgically keep just the forwarders that are needed (instead of the current model where we need to keep the entire assembly).

MichalStrehovsky commented 3 years ago

These small assemblies that don't have much in them currently cost more than they should - I filed #1846 to do something about it.

marek-safar commented 3 years ago

TypeForwardedFrom is our license to move types around the class libraries

What I mean is that it's weird if someone builds .net6 targetting app with serialization that it stores bool value as mscorlib assembly reference but at the same time it stores e.g. TimeSpan as System.Private.Corelib assembly reference.

The only scenario where I can imagine this to be useful is when netcore app serializes output and .net framework apps reads it and hope it works. This is a very unlikely scenario for size-sensitive workloads and I don't have a good sense if it matters at all for trimmed apps.

The guidance I gave out in that issue for Newtonsoft would be invalid with trimming if this only kicks in when using the builtin serializers.

What is our recommendation for making a custom serializer trimming safe?

MichalStrehovsky commented 3 years ago

AFAIK the BCL team uses this attribute also when moving types between .NET Core assemblies. Some of these have compat guarantees that go all the way to .NET Framework. Some only for NetCore.

What is our recommendation for making a custom serializer trimming safe?

This is not isolated to serializers and TypeForwardedFrom. If I do Type.GetType("System.Object, System.Runtime"), I would expect this to work at runtime. AFAIK illink will currently resolve this to System.Object in CoreLib, drop the System.Runtime façade and this code will break at runtime.

We want our customers to refer to types using their contractual identities, not implementation identities. The contracts are our license to move types around.

ericstj commented 3 years ago

The attribute dates back before .NETCore and it's use was not random. It was specifically applied to every type that was moved in .NETFramework. It's omitted for many types in CoreLib because serializers can instead omit the assembly name here to indicate it's from the core assembly. https://github.com/dotnet/runtime/issues/23440 https://github.com/dotnet/runtime/issues/21433. I can't recall if it was done, maybe @ViktorHofer remembers, but I think we decided that we only needed to add TypeForwardedFrom to types in corelib that weren't in mscorlib on desktop, for everything else we could omit the assembly name.

In .NETCore the attribute was selectively added back in many places based on customer feedback, data collection, and design-time scenarios (designers use the same attribute on reference assemblies to generate multi-targetable source files). It wasn't universal, but it most certainly was not random.

You might be surprised how many places this crops up. Shared data sources often need to be used by both .NETCore and .NETFramework clients (eg: databases); data doesn't have a TargetFramework nor package manager to help it here.

ViktorHofer commented 3 years ago

You can find the dirty secret of how BinaryFormatter handles this here: https://github.com/dotnet/runtime/blob/1d9e50cb4735df46d3de0cee5791e97295eaf588/src/libraries/System.Runtime.Serialization.Formatters/src/System/Runtime/Serialization/Formatters/Binary/Converter.cs#L34-L39.

vitek-karas commented 3 years ago

With recent changes linker should:

So the Type.GetType("MyType, facade") will work correctly (the facade and the type forwarder should be kept).

So the question is what needs to happen due to TypeForwardedFromAttribute? For "normal" apps nothing, since normal type forwarders will be kept and things will work. The problem is deserializers which read type names from input and use Type.GetType to resolve them. Those will not work by default anyway, since there's nothing which would tell the linker to preserve a given type. I think the only potential problem is if somebody tries to make this work by either adding roots or dynamic dependencies. And if those point to the implementation assembly instead of the facade used by the deserializer the type will be preserved, but the deserializer won't be able to get it since the type forwarder would be removed.

I don't know how important is it to fix this - and if so, how to fix it. We could definitely handle TypeForwardedFrom attribute and preserve the type forwarder it points to for all types which the linker keeps. But it feels like unnecessary bloat since most apps should not ever use that.

MichalStrehovsky commented 3 years ago

We would probably have to special case TypeForwardedFromAttribute in illinker.

I think the important bit is:

We could probably come up with a heuristic that figures out whether code uses the attribute by just not marking the type until there's a reference from code. If there's a reference from code, we would also keep the attribute instances. It's not perfect.

Reflection-based serializers (that are most likely to hit this), are already on hard mode, so we might also just keep whatever is illink doing right now and wait for feedback.

marek-safar commented 3 years ago

What about removing the attributes as they are very unlikely used directly and update the DiscoverSerialization heuristics to keep them when serialization is there including the facades?

ViktorHofer commented 3 years ago

cc'ing @stephentoub @danmose who were part of the discussion of adding TypeForwardedFrom attributes back.