dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.23k stars 4.45k forks source link

Sigsegv when calling a virtual method via UnsafeAccessor #101643

Closed CyberAndrii closed 2 weeks ago

CyberAndrii commented 2 weeks ago

Description

Calling a virtual method via UnsafeAccessor many times crashes the runtime with exit code 139. This is likely caused by a JIT optimization.

Reproduction Steps

dotnet run -c Release
using System.Runtime.CompilerServices;

Console.WriteLine("Program started");

var obj = new Foo();

for (var i = 0; i < 1_000_000; i++)
{
    FooAccessor.Func(obj);

    // There must be another call after the call to UnsafeAccessor otherwise no crash.
    // Any of the below will cause the crash but methods like
    // GC.KeepAlive(obj), Console.Write("") or await Task.Delay(0) won't.
    // Don't know why.
    await Task.Delay(1);
    //GC.Collect();
    //Console.WriteLine(0);
}

Console.WriteLine("Success - no crash");

internal class Foo
{
    // Only crashes on virtual methods
    public virtual void Func()
    {
    }
}

internal class FooAccessor
{
    [UnsafeAccessor(UnsafeAccessorKind.Method, Name = nameof(Foo.Func))]
    public static extern void Func(Foo b);
}

Expected behavior

Prints:

Program started
Success - no crash

Exit code: 0

Actual behavior

Prints:

Program started

Exit code: 139

Regression?

No. Crashes on both .NET 8 and 9.

Known Workarounds

Apply [MethodImpl(MethodImplOptions.NoOptimization)] to the UnsafeAccessor method.

Configuration

Happens on both linux-x64 and osx-x64.

.NET SDK:
 Version:           9.0.100-preview.3.24204.13
 Commit:            81f61d8290
 Workload version:  9.0.100-manifests.77bb7ba9
 MSBuild version:   17.11.0-preview-24178-16+7ca3c98fa

Runtime Environment:
 OS Name:     ubuntu
 OS Version:  22.04
 OS Platform: Linux
 RID:         linux-x64
 Base Path:   /home/andrii/dev/dotnet/runtime/.dotnet/sdk/9.0.100-preview.3.24204.13/

.NET workloads installed:
There are no installed workloads to display.

Host:
  Version:      9.0.0-preview.3.24172.9
  Architecture: x64
  Commit:       9e6ba1f68c

.NET SDKs installed:
  9.0.100-preview.3.24204.13 [/home/andrii/dev/dotnet/runtime/.dotnet/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 9.0.0-preview.3.24172.13 [/home/andrii/dev/dotnet/runtime/.dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 9.0.0-preview.3.24172.9 [/home/andrii/dev/dotnet/runtime/.dotnet/shared/Microsoft.NETCore.App]

Other information

No response

jkotas commented 2 weeks ago

cc @AaronRobinsonMSFT

dotnet-policy-service[bot] commented 2 weeks ago

Tagging subscribers to this area: @dotnet/area-system-runtime-compilerservices See info in area-owners.md if you want to be subscribed.

EgorBo commented 2 weeks ago

Seems to be failing during instrumentation for PGO

Assert failure(PID 21772 [0x0000550c], Thread: 33224 [0x81c8]): Consistency check failed: AV in clr at this callstack:
------
CORECLR! tagCOR_ILMETHOD_FAT::GetCodeSize + 0x12 (0x00007ffb`69cad322)
CORECLR! CEEJitInfo::allocPgoInstrumentationBySchema + 0x2E7 (0x00007ffb`69cbbc77)
CLRJIT! Compiler::fgInstrumentMethod + 0x6A8 (0x00007ffb`ca6c6f28)
CLRJIT! CompilerPhaseWithStatus::DoPhase + 0x2B (0x00007ffb`ca6364db)
CLRJIT! Phase::Run + 0x6D (0x00007ffb`ca89de7d)
CLRJIT! DoPhase + 0x35 (0x00007ffb`ca636425)
CLRJIT! Compiler::compCompile + 0x20C (0x00007ffb`ca640bac)
CLRJIT! Compiler::compCompileHelper + 0x1212 (0x00007ffb`ca6450d2)
CLRJIT! `Compiler::compCompile'::`170'::__Body::Run + 0x69 (0x00007ffb`ca63ea79)
CLRJIT! Compiler::compCompile + 0xFC1 (0x00007ffb`ca6434e1)
CLRJIT! ``jitNativeCode'::`8'::__Body::Run'::`6'::__Body::Run + 0x189 (0x00007ffb`ca63e909)
CLRJIT! `jitNativeCode'::`8'::__Body::Run + 0x6D (0x00007ffb`ca63e98d)
CLRJIT! jitNativeCode + 0x19A (0x00007ffb`ca64dc2a)
CLRJIT! CILJit::compileMethod + 0x1A6 (0x00007ffb`ca6589f6)
CORECLR! invokeCompileMethodHelper + 0x1A0 (0x00007ffb`69cdfa70)
CORECLR! invokeCompileMethod + 0x1B9 (0x00007ffb`69cdf839)
CORECLR! UnsafeJitFunction + 0xB88 (0x00007ffb`69cba218)
CORECLR! MethodDesc::JitCompileCodeLocked + 0x239 (0x00007ffb`69da82c9)
CORECLR! MethodDesc::JitCompileCodeLockedEventWrapper + 0x3E3 (0x00007ffb`69da8a53)
-----
AaronRobinsonMSFT commented 2 weeks ago

This is likely caused by a JIT optimization.

@CyberAndrii Agree. There are few cases where the generated IL for the UnsafeAccessor method won't be inlined. Applying the [MethodImpl(MethodImplOptions.NoOptimization)] would mean inlining isn't occurring and a small thunk is present, that is very help data. Can you try removing [MethodImpl(MethodImplOptions.NoOptimization)], but disable Tiered Compilation? That can be accomplished by setting the following environment variable - DOTNET_TieredCompilation=0.

@dotnet/jit-contrib What can we collect here that would help root out the issue? Is a JIT dump sufficient?

AaronRobinsonMSFT commented 2 weeks ago

Thanks @EgorBo! Didn't see your reply. Let me know if this is something I should be able to handle.

EgorBo commented 2 weeks ago

Thanks @EgorBo! Didn't see your reply. Let me know if this is something I should be able to handle.

From a quick look, it seems that GetAndVerifyILHeader returns NULL for Func and it is not expected (we only expect nulled ILHeader for dynamic methods)

AaronRobinsonMSFT commented 2 weeks ago

GetAndVerifyILHeader

Oh... this is another one. Okay. I will handle this. We had a few of these during the first pass of UnsafeAccessor work.

EgorBo commented 2 weeks ago

image

The actual crash is basically a null-reference on GetCodeSize() call here

AaronRobinsonMSFT commented 2 weeks ago

Bah. This is definitely a VM issue. Thanks @EgorBo for the pointer. I have a local repro and see the issue; as I mentioned, this is yet another place where we assume only dynamic MethodDesc instances lack a header. I'm in transit but should have a fix by EOD.

AaronRobinsonMSFT commented 2 weeks ago

Flight is delayed so code gets fixed :-) Looks like this upper code section was actually "dead" anyways. I traced through the PgoManager class and it appears to be doing the right thing with UnsafeAccessor methods, so we are good there - yay!