dotnet / linker

387 stars 127 forks source link

Linker doesn't trim IEnumerable interface in console app even if it's not used #3189

Open vonzshik opened 1 year ago

vonzshik commented 1 year ago
internal class Program
{
    static void Main(string[] args)
    {
        var t = new CustomClass();
    }

    private sealed class CustomClass : IEnumerable
    {
        public IEnumerator GetEnumerator()
        {
            return new CustomEnumerator();
        }
    }

    private sealed class CustomEnumerator : IEnumerator
    {
        public object Current => throw new NotImplementedException();

        public bool MoveNext()
        {
            throw new NotImplementedException();
        }

        public void Reset()
        {
            throw new NotImplementedException();
        }
    }
}

image

Npgsql implements that interface for NpgsqlDataReader (because NpgsqlDataReader inherits from DbDataReader, inheriting IEnumerable). Our implementation of GetEnumerator returns System.Data.Common.DbEnumerator, which touches quite a bit of IDataReader methods, and some of them are not AOT friendly (mostly related to size).

vitek-karas commented 1 year ago

IEnumerable is rooted by the runtime: https://github.com/dotnet/runtime/blob/b71ae7b9f1789e0b7c5d944c5433c6e5f1a70abd/src/coreclr/vm/corelib.h#L447 (this header file is converted into a descriptor XML which is embedded into CoreLib)

IEnumerable<> is also rooted by the runtime: https://github.com/dotnet/runtime/blob/b71ae7b9f1789e0b7c5d944c5433c6e5f1a70abd/src/coreclr/System.Private.CoreLib/src/ILLink/ILLink.Descriptors.Shared.xml#L14

My understanding is that the runtime's implementation of Array depends on specific order of interfaces on that type, and thus they can't be trimmed.

For NativeAOT we might be able to do better though. @MichalStrehovsky - any ideas? I checked and in the dependency graph the presence of the interface vtable (which is what really counts in NativeAOT case) is first required by the System.String type. But i don't know the little details of what we can and cannot optimize yet in the compiler.

NinoFloris commented 1 year ago

Does this mean that once an interface type is rooted all its implementations will be as well?

vonzshik commented 1 year ago

Does this mean that once an interface type is rooted all its implementations will be as well?

I wonder whether there is a difference between the runtime and user code rooting an interface, but this repro does suggest that if a specific method from the interface is used, then all of the implementations of that method are going to be left out (even if it's easily proven that it's never going to get called):

static void Main(string[] args)
{
    ICustomInterface t = new CustomClass();
    t.Test();
    CustomClass2 t2 = new CustomClass2();
}

interface ICustomInterface
{
    void Test();
}

private sealed class CustomClass : ICustomInterface
{
    public void Test()
    {

    }
}

private sealed class CustomClass2 : ICustomInterface
{
    public void Test()
    {

    }
}

image

NinoFloris commented 1 year ago

Probably difficult to prove more complex cases where the other implementation is passed around as an ienumerable value. Would be very easy to thwart any tracing once we're dealing with those virtual methods.

vonzshik commented 1 year ago

The previous repro is also funny in a way that if I not cast CustomClass to ICustomInterface then Linker will be able to remove the interface completely. image This makes me think that if there is even a single place where you pass an interface and a specific method is used, then every single implementation of that specific method will never get trimmed.

vitek-karas commented 1 year ago

Linker is probably not as clever as you think :-) The problem with interfaces is that it's really hard to prove if a given implementation type is never casted to the interface type (in a general case). So linker has a simple rule:

Also of note, per the type system rules, when a type implements an interface, it has to implement all of the abstract members of that interface. This sometimes leads linker into keeping the method, but removing its body (replaced with a simple throw). That said - it probably doesn't happen in your example.

Just to give you an example of a piece code which is really hard to track:

interface IUtil { }

class MyUtil : IUtil {}

class Test
{
    object _instance;
    object _value;

    public void One() { _instance = new MyUtil() }
    public void Two() { ((IUtil)_value).DoSomething();
}

For this to be solvable linker would have to be able to model full data flow of values across fields. Possible, but very hard (it's gets even weirder if you consider that fields can be accessed from multiple threads at once). The complexity of linker would be really high, and consequently its performance would probably be very bad.

A grand example of this is the ToString method. (it acts like an interface method where all objects implement that interface). There's is basically always a call to object.ToString in the program somewhere. And now - how do you prove that a given type's instance never makes it to that point in the code? Only with full data flow model - which is very costly.

Right now linker implements only very few and very simple cross-method analysis - almost all analysis it does is within a single method's body - where it models data flow for some pieces of data (typically tied to reflection). This allows linker to get by with almost a single pass over all method bodies (it does two in reality because of the one cross-method thing). For a full data flow, it would have to do many passes. That gets slow really quickly, and while linker is a publish tool, it still needs to run in reasonable amount of time. It's also unclear how much memory would this need - in some cases linker already consumes a lot of memory, full data flow tracking might lead to real memory problems.

MichalStrehovsky commented 1 year ago

For NativeAOT we might be able to do better though. @MichalStrehovsky - any ideas?

For a foundational interface like IEnumerable, there's a good chance that something in CoreLib is going to use this. Even in a hello world, we need to keep a lot more than what is in user's main. For IEnumerable, just this weekend I was looking at this piece of code that uses IEnumerable even in a program that doesn't do anything in Main:

https://github.com/dotnet/runtime/blob/a272954f5c88544467112d944f662ef81b3d7487/src/libraries/System.Private.CoreLib/src/System/Resources/ResourceSet.cs#L193-L197

This code is responsible for e.g. reading the exception message for OutOfMemoryException that is preallocated before Main runs. It's probably not the only use of IEnumerable in an empty app. IEnumerable can probably never be trimmed. Same as IDisposable and other foundational interfaces.

NinoFloris commented 9 months ago

I've been playing around with this but I can't replicate the requirements as described by @vitek-karas

using System;
using System.Diagnostics.CodeAnalysis;
using System.Linq;

class Program
{
    static readonly string IMyListAqn = $"IMyList`1, {typeof(Program).Assembly.FullName}";

    public static void Main()
    {
        Console.WriteLine(TrimmableGetType(IMyListAqn));

        var objectList = new MyList<object>();
        // IMyList<object> instantiation is kept, never used, maybe a __Canon opt?
        Console.WriteLine("IMyList<object> kept: " + TrimmableCheckInterfaces(objectList.GetType()));

        var intList = new MyList<int>();
        // Nope IMyList<int> instantiation is kept too, also never used.
        Console.WriteLine("IMyList<int> kept: " + TrimmableCheckInterfaces(objectList.GetType()));

        // Throws 'not enough metadata' as expected.
        var stringList = TrimmableGetType($"MyList`1[[System.String, System.Private.CoreLib]], {typeof(Program).Assembly.FullName}");
        Console.WriteLine(stringList);
        Console.WriteLine("IMyList<string> kept: " + TrimmableCheckInterfaces(stringList!));
    }

    [UnconditionalSuppressMessage("Trimming", "IL2070")]
    static bool TrimmableCheckInterfaces(Type type) =>
        type.GetInterfaces().Any(x => x.IsConstructedGenericType && x.GetGenericTypeDefinition() == TrimmableGetType(IMyListAqn));

    [UnconditionalSuppressMessage("Trimming", "IL2057")]
    static Type? TrimmableGetType(string str) => Type.GetType(str);
}

interface IMyList<T>
{
    void Add(T item);
    T this[int index] { get; }
}

class MyList<T> : IMyList<T>
{
    void IMyList<T>.Add(T item) => throw new NotImplementedException();
    T IMyList<T>.this[int index] => throw new NotImplementedException();
}

I've explicitly defined a new interface here to see if it's based on usage, it doesn't seem to be the case.

So when are interface types trimmed?

vitek-karas commented 9 months ago

I played with this a little bit @NinoFloris and the behavior is different between the trimmer (illink) and NativeAOT (ilc). The trimmer actually does remove the interface:

image

But NativeAOT doesn't:

image

But it doesn't keep any of the methods. As for the reason, I must admit I don't know. @MichalStrehovsky might understand this a bit better - this is the dependency chain reported for it:

image
MichalStrehovsky commented 9 months ago

But it doesn't keep any of the methods. As for the reason, I must admit I don't know. @MichalStrehovsky might understand this a bit better - this is the dependency chain reported for it:

Interface lists are not trimmed in native AOT right now; it's tracked in this issue: https://github.com/dotnet/runtime/issues/66716#issuecomment-1072039623. We do trim the method implementations but the type itself is kept. The cost of the type is in tens to hundreds of bytes so one needs to have many for trimming them to bring a meaningful difference (it hasn't risen up high enough in terms of priority, given the complexity).