dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.03k stars 4.68k forks source link

Framework trimming inlines CoreCLR's CoreLib into other inbox libraries #79513

Closed VSadov closed 1 year ago

VSadov commented 1 year ago

I will disable System.Data.DataSetExtensions.Tests for now to unblock CI, but perhaps the tests can be fixed

---- System.NotSupportedException : 'System.Data.DataRowExtensions+UnboxT1[System.Nullable1[System.Int32]].NullableFieldSystem.Int32' is missing native code.

ghost commented 1 year ago

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas See info in area-owners.md if you want to be subscribed.

Issue Details
I will disable System.Data.DataSetExtensions.Tests for now to unblock CI, but perhaps the tests can be fixed
Author: VSadov
Assignees: -
Labels: `untriaged`, `area-NativeAOT-coreclr`
Milestone: -
VSadov commented 1 year ago

Error message System.TypeInitializationException : A type initializer threw an exception. To determine which type, inspect the InnerException's StackTrace property. ---- System.NotSupportedException : 'System.Data.DataRowExtensions+UnboxT1[System.Nullable1[System.Int32]].NullableFieldSystem.Int32' is missing native code. MethodInfo.MakeGenericMethod() is not compatible with AOT compilation. Inspect and fix AOT related warnings that were generated when the app was published. For more information see https://aka.ms/nativeaot-compatibility

Stack trace at System.Runtime.CompilerServices.ClassConstructorRunner.EnsureClassConstructorRun(StaticClassConstructionContext) + 0x163 at System.Runtime.CompilerServices.ClassConstructorRunner.CheckStaticClassConstructionReturnGCStaticBase(StaticClassConstructionContext, Object) + 0xd at System.Data.DataRowExtensions.Field[T](DataRow, String) + 0x16 at MonoTests.System.Data.DataRowExtensionsTest.Field_T_DBNullFieldValue() + 0x83 at System.Data.DataSetExtensions.Tests!+0x809144 at System.Reflection.DynamicInvokeInfo.Invoke(Object, IntPtr, Object[], BinderBundle, Boolean) + 0xc4 ----- Inner Stack Trace ----- at System.Reflection.Runtime.MethodInfos.RuntimeNamedMethodInfo1.GetUncachedMethodInvoker(RuntimeTypeInfo[], MemberInfo) + 0x2f at System.Reflection.Runtime.MethodInfos.RuntimeNamedMethodInfo1.MakeGenericMethod(Type[]) + 0x18e at System.Data.DataRowExtensions.UnboxT1.<Create>g__CreateWhenDynamicCodeSupported|1_0() + 0x6d at System.Data.DataRowExtensions.UnboxT1.Create() + 0x9 at System.Data.DataRowExtensions.UnboxT`1..cctor() + 0x1e at System.Runtime.CompilerServices.ClassConstructorRunner.EnsureClassConstructorRun(StaticClassConstructionContext*) + 0xc4

MichalStrehovsky commented 1 year ago

DataRowExtensions was fixed up to be AOT safe so this is very suspicious:

https://github.com/dotnet/runtime/blob/cfdf0f67485ae41f4454a687b1e01ccb32e5e799/src/libraries/System.Data.Common/src/System/Data/DataRowExtensions.cs#L149-L156

We were not even supposed to compile CreateWhenDynamicCodeSupported.

MichalStrehovsky commented 1 year ago

Ok, this is a pretty bad product issue that affects more than just NativeAOT. I guess it's a happy accident we found it. The thing that surfaced it was https://github.com/dotnet/runtime/commit/5435aa8f7292b14de3e48c652bcd735d7e0fdf3d that changed how we run NativeAOT testing - we started building all of CoreCLR. I couldn't repro the issue locally because I don't like building junk I don't need - building all of CoreCLR is what triggers the problem though.

https://github.com/dotnet/runtime/blob/7e10f81f6cd52ed14208373ac63b9ebd8296a62d/eng/Subsets.props#L106

If we build clr.runtime subset, we no longer build libraries against NativeAOT's corelib, but use CoreCLR's corelib instead. This is not desirable - we want to use NativeAOT's so that we run APICompat with it. Another thing this changes is what corelib we use when trimming the framework as part of framework build. Guess what RuntimeFeature.IsDynamicCodeSupported is on CoreCLR's corelib. The illink that we use during library build turns the above snippet into:

        if (typeof(T).get_IsValueType() && default(T) == null)
    {
        if (1 == 0)
        {
        }
        return CreateWhenDynamicCodeSupported();
    }

We could make the test pass again by switching back to using NativeAOT's corelib during libraries build.

But here's why it's a product issue: all shipping libraries come from a build that uses CoreCLR corelib. I think this also includes platforms targeted only by Mono - e.g. the System.Data.Common in this test is not RID-specific and we probably don't have an e.g. iOS specific build.

The worst part of this: we don't detect any of this in our testing because the testing always uses the "expected" corelib (Mono's on Mono platforms, NativeAOT's in NativeAOT testing).

Basically, framework build is inlining arbitrary parts of CoreLib, but we don't know whether the implementations couldn't actually differ.

Cc @dotnet/ilc-contrib @vitek-karas @sbomer

jkotas commented 1 year ago

cc @ViktorHofer

jkotas commented 1 year ago

This sounds like we run runtime-pack trimming during runtime-neutral part of the libraries build that is not correct. Instead, it needs to run during runtime-specific part of the build when we are building runtime packs (similar to where R2R compilation runs for CoreCLR).

ViktorHofer commented 1 year ago

But here's why it's a product issue: all shipping libraries come from a build that uses CoreCLR corelib. I think this also includes platforms targeted only by Mono - e.g. the System.Data.Common in this test is not RID-specific and we probably don't have an e.g. iOS specific build.

All shipping libraries (packages, etc.) build against CoreLib's reference assembly which is expected to be runtime neutral. cc @akoeplinger. How does the runtime variant of CoreLib come into play here?

Another thing this changes is what corelib we use when trimming the framework as part of framework build. Guess what RuntimeFeature.IsDynamicCodeSupported is on CoreCLR's corelib. The illink that we use during library build turns the above snippet into:

When you say "framework build", I assume you refer to the shared framework packages build (which lives in src/installer/pkg/sfx)?

This sounds like we run runtime-pack trimming during runtime-neutral part of the libraries build that is not correct.

AFAIK the libraries build doesn't do any kind of runtime-pack trimming, neither do we invoke crossgen2. The only trimming that happens is inside illink.targets that runs as part of a library's individual build.

MichalStrehovsky commented 1 year ago

All shipping libraries (packages, etc.) build against CoreLib's reference assembly which is expected to be runtime neutral. cc @akoeplinger. How does the runtime variant of CoreLib come into play here?

When we run trimming on the framework, we use the implementation assembly. The ref assembly wouldn't work for trimming.

ViktorHofer commented 1 year ago

That's understood. But as I said, AFAIK we don't do any runtime-pack trimming during the libraries build. I.e. our out-of-band libraries packages like System.Formats.Cbor that don't ship inside the shared framework don't get trimmed in any special way aside from what is happening in illink.targets. And correct me if I'm wrong, the library-mode trimming in illink.targets doesn't receive CoreLib as an input.

jkotas commented 1 year ago

we don't do any runtime-pack trimming during the libraries build.

The targets for runtime-pack trimming are under src/libraries: https://github.com/dotnet/runtime/blob/2a31a5bf6eb3a9283a798fa4c6e549de2d4f987b/src/libraries/sfx.proj#L41-L47

I took a closer look. I do not think we have a product issue for native AOT here (I have not checked Mono). The native ILCompiler package gets copy of assemblies before runtime-pack trimming. For example, System.Data.Common assembly that ends up in ILCompiler package comes from artifacts\obj\System.Data.Common\Release\net8.0\System.Data.Common.dll. The copy from trimmed runtime pack is at artifacts\bin\ILLinkTrimAssembly\net8.0-windows-Release-x64\trimmed-runtimepack\System.Data.Common.dll and it is not used for ILCompiler package. @MichalStrehovsky Do you agree that there is no product issue here? If you agree, this is only a problem with test infrastructure composing the product in its own way.

ViktorHofer commented 1 year ago

Yes, what you link to is just a validation step that doesn't impact the build outputs. That's why I didn't mention it. We have the same step in oob.proj as well.

jkotas commented 1 year ago

Ok, I can see that now - but that invalidates my theory. I was looking in a wrong place.

And correct me if I'm wrong, the library-mode trimming in illink.targets doesn't receive CoreLib as an input.

The library-mode trimming in illink.targets does receive CoreLib as an input and that's the problem. Here is the command line for library-mode trimming of System.Data.Common. Notice that it has -d "C:\runtime\artifacts\bin\coreclr\windows.x64.Release\IL as input.

C:\Program Files\dotnet\dotnet.exe "C:\Users\jkotas\.nuget\packages\microsoft.net.illink.tasks\7.0.100-1.22606.1\tools\net7.0\illink.dll" -a "C:\runtime\artifacts\obj\System.Data.Common\Release\net8.0\PreTrim/System.Data.Common.dll" library
       -out "C:\runtime\artifacts\obj\System.Data.Common\Release\net8.0"
       --warnaserror-  --ignore-link-attributes true --skip-unresolved true --trim-mode skip --action skip --action link System.Data.Common -b true --nowarn IL2008;IL2009;IL2012;IL2025;IL2026;IL2035;IL2050;IL2032;IL2055;IL2057;IL2058;IL2059;IL2060;IL2061;IL2062;IL2063;IL2064;IL2065;IL2066;IL2067;IL2068;IL2069;IL2070;IL2071;IL2072;IL2073;IL2074;IL2075;IL2076;IL2077;IL2078;IL2079;IL2080;IL2081;IL2082;IL2083;IL2084;IL2085;IL2086;IL2087;IL2088;IL2089;IL2090;IL2091;IL2121
        -d "C:\runtime\artifacts\bin\System.Collections\Release\net8.0"
        -d "C:\runtime\artifacts\bin\System.Collections.NonGeneric\Release\net8.0"
        -d "C:\runtime\artifacts\bin\System.ComponentModel.TypeConverter\Release\net8.0"
        -d "C:\runtime\artifacts\bin\coreclr\windows.x64.Release\IL"
        -d "C:\runtime\artifacts\bin\System.Private.Uri\Release\net8.0"
        -d "C:\runtime\artifacts\bin\System.Runtime\Release\net8.0"
        -d "C:\runtime\artifacts\bin\microsoft.netcore.app.ref\ref\net8.0"
MichalStrehovsky commented 1 year ago

I double checked the framework we ship in 7.0 runtime.win-x64.microsoft.dotnet.ilcompiler and we did inline CoreCLR's CoreLib in it. It is a product issue.

    if (typeof(T)!.IsValueType && default(T) == null)
    {
        if (1 == 0)
        {
        }
        return typeof(UnboxT<T>)!.GetMethod("NullableField", BindingFlags.Static | BindingFlags.NonPublic)!.MakeGenericMethod(Nullable.GetUnderlyingType(typeof(T))).CreateDelegate<Func<object, T>>();
    }

(This is the previous version of the method that doesn't have the CreateWhenDynamicCodeSupported local function extracted, but the problem is the same - we wanted to take the if (1 == 0) codepath.)

ViktorHofer commented 1 year ago

The library-mode trimming in illink.targets does receive CoreLib as an input and that's the problem. Here is the command line for library-mode trimming of System.Data.Common. Notice that it has -d "C:\runtime\artifacts\bin\coreclr\windows.x64.Release\IL as input.

Great find @jkotas 👍

We do have a set of projects that directly reference CoreLib, i.e. System.Data.Common: https://github.com/dotnet/runtime/blob/7edc1f674bcece24485a435f5d01cddb2574b0f2/src/libraries/System.Data.Common/src/System.Data.Common.csproj#L311-L316 ILLink then receives these references here: https://github.com/dotnet/runtime/blob/c635ae2426d07bfdfec426ac25d1e62aadcadf73/eng/illink.targets#L277

During compilation, even though we have a ProjectReference pointing to the CoreLib/src project, we build against CoreLib/ref. But that ILLink invocation doesn't honor that as it uses the ReferencePath msbuild item instead of ReferencePathWithRefAssemblies. I see that ILLink already receives reference assemblies as an input which makes me wonder if we could use CoreLib/ref here instead.