dotnet / runtimelab

This repo is for experimentation and exploring new ideas that may or may not make it into the main dotnet/runtime repo.
MIT License

[NativeAOT-LLVM, WASM] Disable Reflection, IL scanner, RuntimeError: memory access out of bounds #1992

Open Kanawanagasaki opened 2 years ago

Kanawanagasaki commented 2 years ago

Hi, I was experimenting with compiling a dotnet console application to WebAssembly. When I disabled reflection by adding <IlcDisableReflection>true</IlcDisableReflection> to the .csproj, I got an ILCompiler.ScannerFailedException:

ILCompiler.ScannerFailedException: VTable of type 'System.Runtime.InteropServices.DllImportSearchPath' not computed by the IL scanner. You can work around by running the compilation with scanner disabled.
     at ILCompiler.ILScanResults.ScannedVTableProvider.GetSlice(TypeDesc type)
     at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)
     at ILCompiler.DependencyAnalysis.ConstructedEETypeNode.ComputeNonRelocationBasedDependencies(NodeFactory factory)
     at ILCompiler.DependencyAnalysis.ObjectNode.GetStaticDependencies(NodeFactory factory)
     at ILCompiler.DependencyAnalysisFramework.DependencyAnalyzer`2.GetStaticDependenciesImpl(DependencyNodeCore`1 node)
     at ILCompiler.DependencyAnalysisFramework.DependencyAnalyzer`2.GetStaticDependencies(DependencyNodeCore`1 node)
     at ILCompiler.DependencyAnalysisFramework.DependencyAnalyzer`2.ProcessMarkStack()
     at ILCompiler.DependencyAnalysisFramework.DependencyAnalyzer`2.ComputeMarkedNodes()
     at ILCompiler.LLVMCodegenCompilation.CompileInternal(String outputFile, ObjectDumper dumper)
     at ILCompiler.Compilation.ILCompiler.ICompilation.Compile(String outputFile, ObjectDumper dumper)
     at ILCompiler.Program.Run(String[] args)
     at ILCompiler.Program.Main(String[] args)

The exception message suggests disabling the scanner, but I don't know how. While trying various properties in the .csproj, I found that with the <InvariantGlobalization>true</InvariantGlobalization> property the console application compiles successfully. But when I then run it with node or in the browser, it throws RuntimeError: memory access out of bounds:

RuntimeError: memory access out of bounds
    at __GenericLookupFromType_S_P_CoreLib_System_Collections_Generic_Dictionary_2<System___Canon__System___Canon>_TypeHandle_TKey_System___Canon (wasm://wasm/00d14eda:wasm-function[546]:0x66411)
    at S_P_CoreLib_System_Collections_Generic_Dictionary_2<System___Canon__System___Canon>___ctor_2 (wasm://wasm/00d14eda:wasm-function[542]:0x65e6b)
    at S_P_CoreLib_System_Collections_Generic_Dictionary_2<System___Canon__System___Canon>___ctor (wasm://wasm/00d14eda:wasm-function[1873]:0x136a15)
    at S_P_CoreLib_System_AppContext__SetData (wasm://wasm/00d14eda:wasm-function[285]:0x30ac7)
    at Internal_CompilerGenerated__Module___SetAppContextSwitches (wasm://wasm/00d14eda:wasm-function[1555]:0x12739c)
    at StartupCodeMain (wasm://wasm/00d14eda:wasm-function[159]:0x11d8c)
    at __managed__Main (wasm://wasm/00d14eda:wasm-function[1503]:0x124425)
    at main (wasm://wasm/00d14eda:wasm-function[4036]:0x1c7350)
    at C:\Users\Kanawanagasaki\Desktop\NativeAOTLLVMTest\bin\Debug\net7.0\browser-wasm\native\NativeAOTLLVMTest.js:977:22
    at callMain (C:\Users\Kanawanagasaki\Desktop\NativeAOTLLVMTest\bin\Debug\net7.0\browser-wasm\native\NativeAOTLLVMTest.js:5253:15)

It would be very handy if the console application ran without reflection, because it drastically reduces the size of the .wasm file from 18MB to 3.5MB.

Steps to reproduce:

  1. mkdir NativeAOTLLVMTest
  2. cd NativeAOTLLVMTest
  3. dotnet new console
  4. dotnet new nugetconfig
  5. in nuget.config add
    <add key="dotnet-experimental" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-experimental/nuget/v3/index.json" />
  6. in NativeAOTLLVMTest.csproj add
    <ItemGroup>
    <PackageReference Include="Microsoft.DotNet.ILCompiler.LLVM; runtime.win-x64.Microsoft.DotNet.ILCompiler.LLVM" Version="7.0.0-*" />
    </ItemGroup>
  7. in NativeAOTLLVMTest.csproj add
    <IlcDisableReflection>true</IlcDisableReflection>
  8. dotnet publish -r browser-wasm -c Debug /p:TargetArchitecture=wasm /p:PlatformTarget=AnyCPU /p:MSBuildEnableWorkloadResolver=false --self-contained
    (the native code generator will fail with ILCompiler.ScannerFailedException)
  9. in NativeAOTLLVMTest.csproj add
    <InvariantGlobalization>true</InvariantGlobalization>
  10. dotnet publish -r browser-wasm -c Debug /p:TargetArchitecture=wasm /p:PlatformTarget=AnyCPU /p:MSBuildEnableWorkloadResolver=false --self-contained
  11. cd .\bin\Debug\net7.0\browser-wasm\native
  12. node .\NativeAOTLLVMTest.js
    (node.js will throw RuntimeError: memory access out of bounds)

yowl commented 2 years ago

@MichalStrehovsky do you have any idea about this off the top of your head? If not I'll do some investigation. Thanks

MichalStrehovsky commented 2 years ago

To disable the scanner, add <ItemGroup><IlcArg Include="--noscan" /></ItemGroup> to your csproj.

I know these failure modes, but the underlying issue is always something else.

To find the root cause, run the dependency graph viewer: https://github.com/dotnet/corert/tree/master/src/ILCompiler.DependencyAnalysisFramework/ILCompiler-DependencyGraph-Viewer and compile as usual.

The viewer will listen to events coming from the compiler.

Once you hit the failure, the graph viewer will show two dependency graphs. The first one is from the scanner, the second from the compiler.

The compiler one will have a node for DllImportSearchPath. The scanner one won't. The task is to figure out what triggered this dependency in the compiler graph and why the scanner graph didn't come up with it.

yowl commented 2 years ago

Thanks @MichalStrehovsky, the fix for the scanner difference is merged, but for the generic dictionary problem, I don't understand enough about the difference with reflection free mode. I can trace the logic with reflection on, through the generic dictionary lookup helper which makes sense. The code in question is this call:

https://github.com/dotnet/runtimelab/blob/e8180d24c4a45296fbfd8ad0b18ab103a1726edb/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/Dictionary.cs#L57

With reflection on the generic lookup looks like this:

define i8* @"__GenericLookupFromType_S_P_CoreLib_System_Collections_Generic_Dictionary_2<System___Canon__System___Canon>_TypeHandle_S_P_CoreLib_System_Collections_Generic_EqualityComparer_1<TKey_System___Canon>"(i8* %0, i8* %1) {
genericHelper:
  %slotGep = getelementptr i8, i8* %1, i32 32
  %slotGepPtrPtr = bitcast i8* %slotGep to i8**
  %dictGep = load i8*, i8** %slotGepPtrPtr, align 4
  %retGep = getelementptr i8, i8* %dictGep, i32 0
  %ptrPtr = bitcast i8* %retGep to i8**
  %typeNodeGep = load i8*, i8** %ptrPtr, align 4
  ret i8* %typeNodeGep
}

The starting method table symbol for this (the Dictionary<T1,T2>) looks like

@"__MethodTable_S_P_CoreLib_System_Collections_Generic_Dictionary_2<String__Object>___SYMBOL" = constant i32* bitcast (i8* getelementptr (i8, i8* bitcast ([27 x i32*]* @"__MethodTable_S_P_CoreLib_System_Collections_Generic_Dictionary_2<String__Object>___REALBASE" to i8*), i32 12) to i32*)

So we have 12 + 32 = 44, which gets us to

__GenericDict_S_P_CoreLib_System_Collections_Generic_Dictionary_2<String__Object>

Add another offset (0) and get

__MethodTable_S_P_CoreLib_System_Collections_Generic_EqualityComparer_1<String>

which looks good, as we pass that as the generic context to Default. Of course this has all been working for some time, so no surprise here.

For Reflection free mode, the method tables are much smaller and the symbols are different, and things go wrong quite quickly.

The look up is

define i8* @"__GenericLookupFromType_S_P_CoreLib_System_Collections_Generic_Dictionary_2<System___Canon__System___Canon>_TypeHandle_TKey_System___Canon"(i8* %0, i8* %1) {
genericHelper:
  %slotGep = getelementptr i8, i8* %1, i32 32
  %slotGepPtrPtr = bitcast i8* %slotGep to i8**
  %dictGep = load i8*, i8** %slotGepPtrPtr, align 4
  %retGep = getelementptr i8, i8* %dictGep, i32 16
  %ptrPtr = bitcast i8* %retGep to i8**
  %typeNodeGep = load i8*, i8** %ptrPtr, align 4
  ret i8* %typeNodeGep
}

So similar, just a different second offset. We start with

__MethodTable_S_P_CoreLib_System_Collections_Generic_Dictionary_2<String__Object>___SYMBOL

again, but in reflection free mode there is no REALBASE symbol:

@"__MethodTable_S_P_CoreLib_System_Collections_Generic_Dictionary_2<String__Object>___SYMBOL" = constant i32* bitcast ([8 x i32*]* @"__MethodTable_S_P_CoreLib_System_Collections_Generic_Dictionary_2<String__Object>" to i32*)

This immediately looks wrong: the LLVM code loads from offset 32, but the symbol is only 32 bytes long (8 pointer-sized slots of 4 bytes each), so offset 32 is past the end.

@"__MethodTable_S_P_CoreLib_System_Collections_Generic_Dictionary_2<String__Object>" = global [8 x i32*] [
i32* inttoptr (i32 -1541406720 to i32*), 
i32* inttoptr (i32 44 to i32*), 
i32* bitcast ([10 x i32*]* @__MethodTable_Object to i32*), 
i32* null, 
i32* inttoptr (i32 -1113384406 to i32*), 
i32* bitcast ([2 x i32*]* @__typemanager_indirection to i32*), 
i32* bitcast ([6 x i32*]* @__MethodTable_S_P_CoreLib_System_Collections_Generic_Dictionary_2 to i32*), 
i32* bitcast ([3 x i32*]* @__GenericInstance_String_Object to i32*)]
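
The bounds arithmetic for the two cases can be sketched in a few lines of Python. This is only an illustration of the reasoning, not compiler code: it assumes 4-byte pointers (wasm32, matching the `align 4` loads), takes the slot counts (27 and 8) and offsets (12 and 32) from the IR quoted in this comment, and uses a hypothetical helper name, `load_in_bounds`.

```python
POINTER_SIZE = 4  # assumption: wasm32, so each i32* slot is 4 bytes


def load_in_bounds(slot_count, symbol_offset, load_offset, pointer_size=POINTER_SIZE):
    """Check whether a pointer-sized load stays inside the method table array.

    slot_count    -- number of pointer slots in the backing array
    symbol_offset -- byte offset of the ___SYMBOL from the start of the array
    load_offset   -- byte offset the generic lookup helper adds to the symbol
    """
    total_bytes = slot_count * pointer_size
    start = symbol_offset + load_offset
    return 0 <= start and start + pointer_size <= total_bytes


# Reflection on: [27 x i32*] REALBASE, SYMBOL at +12, helper loads at +32.
# 12 + 32 = 44, and bytes 44..47 are inside the 108-byte array.
with_reflection = load_in_bounds(slot_count=27, symbol_offset=12, load_offset=32)

# Reflection free: [8 x i32*], SYMBOL at +0, helper still loads at +32.
# The array is exactly 32 bytes, so the load starts one slot past the end.
reflection_free = load_in_bounds(slot_count=8, symbol_offset=0, load_offset=32)

print(with_reflection, reflection_free)
```

Under these assumptions the reflection-free load falls outside the symbol, which is consistent with the memory access out of bounds error reported at runtime.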

Is the problem that in reflection free mode, the LLVM backend is not constructing the __MethodTable_S_P_CoreLib_System_Collections_Generic_Dictionary_2<String__Object> correctly and is omitting to add a symbol, i.e. __GenericDict_S_P_CoreLib_System_Collections_Generic_Dictionary_2<String__Object> ?

I note that there is no __GenericDict_S_P_CoreLib_System_Collections_Generic_Dictionary_2<String__Object> symbol at all in the reflection free output.

yowl commented 2 years ago

After more investigation, I think the problem is that in reflection free mode, the LLVM backend uses a non-constructed EETypeNode for [S.P.CoreLib]System.Collections.Generic.Dictionary`2<string,object> and hence does not get the generic dictionary in its method table. However, I can't work out at what point the LLVM IL backend should be forcing the constructed EE type. Currently it gets the non-constructed version due to this logic

https://github.com/dotnet/runtimelab/blob/e8180d24c4a45296fbfd8ad0b18ab103a1726edb/src/coreclr/tools/aot/ILCompiler.LLVM/CodeGen/ILToLLVMImporter.cs#L1910-L1916

GetLdTokenHelperForType is returning ReadyToRunHelperId.TypeHandle not ReadyToRunHelperId.NecessaryTypeHandle

Any ideas?

yowl commented 2 years ago

I think I messed up this code when enabling the scanner for some reason, should be fixed after https://github.com/dotnet/runtimelab/pull/2081 is merged

yowl commented 2 years ago

@Kanawanagasaki when you get a moment, if you can try again with IlcDisableReflection true and without IlcArg Include="--noscan" that would be great. Thanks.

yowl commented 2 years ago

@Kanawanagasaki actually, you may want to wait as there seems to be a problem with the nuget packages not being published. I'll update here when that is fixed.

yowl commented 2 years ago

There is a new package published now.

yowl commented 1 year ago

@Kanawanagasaki Can you try this again please with the latest build?

Kanawanagasaki commented 12 months ago

@yowl Yes, I successfully compiled and ran code with <IlcDisableReflection>true</IlcDisableReflection> and <InvariantGlobalization>true</InvariantGlobalization>