dotnet / linker

388 stars 126 forks source link

IL1012: Fatal error in IL Linker: IsMethodNeededByInstantiatedTypeDueToPreservedScope(...) throws InvalidOperationException() #3112

Closed evgenyvalavin closed 1 year ago

evgenyvalavin commented 1 year ago

Trying to build Android .NET 7 project on the VS 17.4

ILLink : error IL1012: IL Trimmer has encountered an unexpected error. Please report the issue at https://github.com/dotnet/linker/issues [C:\agent_work\22\s\***\Android-NET_7.csproj] Fatal error in IL Linker Unhandled exception. System.InvalidOperationException: Collection was modified; enumeration operation may not execute. at Mono.Linker.Steps.MarkStep.IsMethodNeededByInstantiatedTypeDueToPreservedScope(MethodDefinition method) at Mono.Linker.Steps.MarkStep.GetRequiredMethodsForInstantiatedType(TypeDefinition type)+MoveNext() at Mono.Linker.Steps.MarkStep.MarkRequirementsForInstantiatedTypes(TypeDefinition type) at Mono.Linker.Steps.MarkStep.ProcessMethod(MethodDefinition method, DependencyInfo& reason, MessageOrigin& origin) at Mono.Linker.Steps.MarkStep.ProcessQueue() at Mono.Linker.Steps.MarkStep.ProcessPrimaryQueue() at Mono.Linker.Steps.MarkStep.Process() at Mono.Linker.Steps.MarkStep.Process(LinkContext context) at Mono.Linker.Pipeline.ProcessStep(LinkContext context, IStep step) at Mono.Linker.Pipeline.Process(LinkContext context) at Mono.Linker.Driver.Run(ILogger customLogger) at Mono.Linker.Driver.Main(String[] args)

marek-safar commented 1 year ago

Do you have sample we could use to reproduce it?

/cc @vitek-karas

vitek-karas commented 1 year ago

/cc @jtschuster

evgenyvalavin commented 1 year ago

Linker_Sample.zip

Build in Release configuration to reproduce the error

@marek-safar @jtschuster

vitek-karas commented 1 year ago

Thanks a lot for the repro. Some observations:

This fails with 7.0 linker, but works with Release build of main (8.0 linker). With Debug build of 8.0 linker this asserts here https://github.com/dotnet/linker/blob/9c993bf401377209cd7b4e84b60deab5e4f421a2/src/linker/Linker/Annotations.cs#L506 with callstack:

illink.dll!Mono.Linker.AnnotationStore.AddPreservedMethod(Mono.Cecil.IMemberDefinition definition, Mono.Cecil.MethodDefinition method) Line 506 C# illink.dll!Mono.Linker.AnnotationStore.AddPreservedMethod(Mono.Cecil.TypeDefinition type, Mono.Cecil.MethodDefinition method) Line 476 C# Microsoft.Android.Sdk.ILLink.dll!MonoDroid.Tuner.MarkJavaObjects.PreserveMethod(Mono.Cecil.TypeDefinition type, Mono.Cecil.MethodDefinition method) Line 154 C# Microsoft.Android.Sdk.ILLink.dll!MonoDroid.Tuner.MarkJavaObjects.PreserveIntPtrConstructor(Mono.Cecil.TypeDefinition type) Line 112 C# Microsoft.Android.Sdk.ILLink.dll!MonoDroid.Tuner.MarkJavaObjects.PreserveJavaObjectImplementation(Mono.Cecil.TypeDefinition type) Line 61 C# Microsoft.Android.Sdk.ILLink.dll!MonoDroid.Tuner.MarkJavaObjects.ProcessType(Mono.Cecil.TypeDefinition type) Line 46 C# illink.dll!Mono.Linker.Steps.MarkStep.MarkType(Mono.Cecil.TypeReference reference, Mono.Linker.DependencyInfo reason, Mono.Linker.MessageOrigin? origin) Line 1961 C#

vitek-karas commented 1 year ago

@sbomer for the marking and delay marking logic from custom steps. The assert is happening because the custom step calls AddPreservedMethod twice for the exact same thing: Type: Com.Airbnb.Lottie.ICancellableInvoker Method: System.Void Com.Airbnb.Lottie.ICancellableInvoker::.ctor(System.IntPtr,Android.Runtime.JniHandleOwnership)

But first time the type is not marked yet, while second time it already is - but it's pending preserved methods have not been marked yet (I think).

I don't know if this is the root cause of the 7.0 failure though.

evgenyvalavin commented 1 year ago

There was no issue with linker on .NET 6 btw

vitek-karas commented 1 year ago

The 7.0 failure seems to be a different issue. Debug build of 7.0 fails on the same assert as 8.0, but the failure in Release happens much later on. Also on a completely different method.

It fails when it's doing IsMethodNeededByInstantiatedTypeDueToPreservedScope for method System.Boolean Android.Views.View::get_HasNestedScrollingParent().

The problem is that IsMethodNeededByInstantiatedTypeDueToPreservedScope calls IsMethodNeededByTypeDueToPreservedScope which will call Annotations.GetBaseMethods on the base method - if this happens to be the first time anything asks for TypeMapInfo on the assembly the base method is from, this will process the type map info and may potentially change the base maps for methods from other assemblies. In this case this happens to be the base map for the method which is currently being processed by IsMethodNeededByInstantiatedTypeDueToPreservedScope and thus the failure.

vitek-karas commented 1 year ago

In 8.0 this code looks somewhat different and there's only IsMethodNeededByTypeDueToPreservedScope - but it calls itself recursively. In theory the same problem can probably occur in 8.0 as well. I didn't dig into the details on 8.0 to see if this can really happen or if it won't in reality due to the methods/types involved.

Definitely worth looking into even in 8.0 to make sure this is not a problem.

vitek-karas commented 1 year ago

In theory it might be possible to work around this by introducing fake references to specific things earlier on to change the order in which linker processes methods and thus avoiding the case described above. So far I wasn't able to figure out exactly what to do to make that happen though.

sbomer commented 1 year ago

From an initial look over the code I don't understand how TypeMapInfo on the base method could modify the base map for the overriding method - it is supposed to walk up the type hierarchy, not down. Are there circular assembly references?

vitek-karas commented 1 year ago

These are my notes from the debugging session (already forgot the details :wink:)

First call

ProcessMethod - {System.Boolean MvvmCross.DroidX.RecyclerView.MvxGuardedGridLayoutManager::SupportsPredictiveItemAnimations()}

Leads to AnnotationStore.GetBaseMethods for that method

Which goes into TypeMapInfo EnsureProcessed for {Xamarin.AndroidX.RecyclerView, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null}

MapType - type = {AndroidX.RecyclerView.Widget.RecyclerView}

Which will look for base method of {System.Boolean AndroidX.Core.View.INestedScrollingChild::get_HasNestedScrollingParent()} (one of the interfaces implemented by the above view)

Which is @base = {System.Boolean Android.Views.View::get_HasNestedScrollingParent()}

And it ends up adding this pair to the map.

Second call

ProcessMethod - {System.Void Android.Views.View::.ctor(Android.Content.Context)} Whichs leads to looking at {Android.Views.View} and all of its methods (if they need to be kept for instantiation)

This will call the failure point: IsMethodNeededByTypeDueToPreservedScope for {System.Boolean AndroidX.Core.View.INestedScrollingChild::get_HasNestedScrollingParent()}

This calls TypeMapInfo for assembly {Xamarin.AndroidX.Core, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null}

Which ends up processing type {AndroidX.Core.Widget.NestedScrollView} And this ends up adding the same pair into the map again.

jtschuster commented 1 year ago

I've reproduced the issue, and the scenario looks like this:

Types; Base type B has method M. Derived type D1 derives from B, and implements interface I1 with methods from B. Derived type D2 derives from B, and implements interface I2 with methods from B.

Assembly A1 has B. Assembly A2 has D1. Assembly A3 has I1 and D2. I2 can be in any assembly.

In the trimmer: D1 is marked instantiated. -> B is marked instantiated (as a base type) -> D1's interfaces are marked, and the implementation methods are marked -> B.M() is marked -> I.M() is added as a base method for B.M()

Later, B.M() is passed to IsMethodNeededByInstantiatedTypeDueToPreservedScope() -> Base methods of B.M() are iterated over -> I.M() is one of the base methods, and is passed to IsMethodNeedeByTypeDueToPreservedScope() -> Base methods of I.M() are required, triggering the entire assembly A3 to be mapped. -> D2 is mapped -> I2.M() is added as a base method of B.M() -- While base methods of B.M() is being iterated over

Are there circular assembly references?

There is a psuedo-circular assembly reference with the way we build out TypeMapInfo A1 (psuedo-)refers to A3 (when A2 is referenced) because B.M() impls I1.M() A3 refers to A1 because D2 derives from B.

In main the same exact situation doesn't cause issues, and https://github.com/dotnet/linker/pull/3094 fixes this repro for .Net 7, but I don't think a similar situation is guaranteed to never happen. Since base methods can provide interface members, the list of base methods of a type is dependent on which assemblies have been processed by TypeMapInfo at the time GetBaseMethods is called.

vitek-karas commented 1 year ago

So if I read this right, we can't know the answer to IsMethodNeededByInstantiatedTypeDueToPreservedScope before we process the entire app... so another example of the "we need to delay processing of this". I really wish we had the dependency analysis from AOT instead...

I think the next thing to figure out is if we can fix this by accepting the fact that we don't know everything at that point - cache the list of bases (to avoid the exception), do the work anyway... is there a case where we would get something wrong if we did that? Can we have a more conservative behavior in such case (like marking more if we don't know - we do this in some other places as well)?

jtschuster commented 1 year ago

I think the next thing to figure out is if we can fix this by accepting the fact that we don't know everything at that point - cache the list of bases (to avoid the exception), do the work anyway... is there a case where we would get something wrong if we did that?

Since we keep a list of the types with interfaces and process them until we reach a steady state, we should be okay doing that. However, it looks like we don't take into account situations where a base type provides the interface method, so we'll need to add some code to handle that case in ProcessMarkedTypesWithInterfaces. We only have pretty basic tests for that case.

Another thing we could try is to build the entire TypeMapInfo at the start of the linker. That would avoid this problem entirely but might have a significant speed perf impact.

jeromelaban commented 1 year ago

I'm also experiencing this issue, in relation with changes to a net6.0-android library building on 7.0.100 SDK. I'm getting this issue now because I'm adding some internal methods/properties to an android bindings assembly.

vallgrenerik commented 1 year ago

Any update on this?
I'm currently using this work around:

<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|AnyCPU'">
    <RunAOTCompilation>False</RunAOTCompilation>
    <PublishTrimmed>False</PublishTrimmed>
</PropertyGroup>
vitek-karas commented 1 year ago

The bug should be fixed in 7.0.300 which should ship in roughly a month. You can grab a nightly build here https://github.com/dotnet/installer#table.

It would be very helpful if you could try the nightly and let us know if it fixes the problem for your specific application.

vallgrenerik commented 1 year ago

The bug should be fixed in 7.0.300 which should ship in roughly a month. You can grab a nightly build here https://github.com/dotnet/installer#table.

It would be very helpful if you could try the nightly and let us know if it fixes the problem for your specific application.

Hi @vitek-karas! Just tried the nighly (7.0.300-preview.23117.19) and it ran successfully! Awesome, thank you!

vallgrenerik commented 1 year ago

@vitek-karas any down-sides of using this in production? 😅

vitek-karas commented 1 year ago

It's not an official release, so the biggest downside is that it's unsupported and it's not officially signed. So I can't recommend it, but if it works for you...

evgenyvalavin commented 1 year ago

Any news when the fix will be officially released?

vitek-karas commented 1 year ago

The fix is in 7.0.304 which you can download from https://dotnet.microsoft.com/en-us/download/dotnet/7.0.