dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Linux Arm64 Release build cannot unwind the `MethodTable::GetNativeSize` function #103489

Open davidwrighton opened 5 months ago

davidwrighton commented 5 months ago

Description

The Linux Arm64 Release build as built with our current toolchain cannot unwind the MethodTable::GetNativeSize function. This causes the CoreCLR Pri1 Interop test suite to hang.

Output in gdb at the hang:

Thread 1 "corerun" hit Breakpoint 1, RealCOMPlusThrow (throwable=0xffbfa2844560, rethrow=0, rethrow@entry=-11856) at /runtime/src/coreclr/vm/excep.cpp:2839
2839    in /runtime/src/coreclr/vm/excep.cpp
(gdb) bt
#0  RealCOMPlusThrow (throwable=0xffbfa2844560, rethrow=0, rethrow@entry=-11856) at /runtime/src/coreclr/vm/excep.cpp:2839
#1  0x0000fffff75ec8c0 in RealCOMPlusThrow (throwable=0xffbfa2844560) at /runtime/src/coreclr/vm/excep.cpp:2877
#2  0x0000fffff7855220 in CallDescrWorkerInternal () at /runtime/src/coreclr/vm/arm64/calldescrworkerarm64.S:71
#3  0x0000fffff76b9d30 in CallDescrWorkerWithHandler (pCallDescrData=0xffffffffd200, fCriticalCall=<optimized out>) at /runtime/src/coreclr/vm/callhelpers.cpp:67
#4  DispatchCallSimple (pSrc=pSrc@entry=0xffffffffd2c8, numStackSlotsToCopy=0, numStackSlotsToCopy@entry=4294955712, pTargetAddress=<optimized out>,
    pTargetAddress@entry=281473778063096, dwDispatchCallSimpleFlags=<optimized out>, dwDispatchCallSimpleFlags@entry=4294955712) at /runtime/src/coreclr/vm/callhelpers.cpp:218
#5  0x0000fffff76e4ee8 in (anonymous namespace)::CallGetInterfaceImplementation (objPROTECTED=<optimized out>, interfaceTypeHandle=...)
    at /runtime/src/coreclr/vm/dynamicinterfacecastable.cpp:57
#6  DynamicInterfaceCastable::GetInterfaceImplementation (objPROTECTED=objPROTECTED@entry=0xffffffffd7a0, typeHandle=...)
    at /runtime/src/coreclr/vm/dynamicinterfacecastable.cpp:88
#7  0x0000fffff76a3150 in VirtualCallStubManager::Resolver (pMT=pMT@entry=0xffffb8a459a0, token=token@entry=..., protectedObj=protectedObj@entry=0xffffffffd7a0,
    ppTarget=ppTarget@entry=0xffffffffd480, throwOnConflict=throwOnConflict@entry=1) at /runtime/src/coreclr/vm/virtualcallstub.cpp:2029
#8  0x0000fffff76a2774 in VirtualCallStubManager::ResolveWorker (this=0xaaaaaab47b70, pCallSite=<optimized out>, protectedObj=0xffffffffd7a0, token=..., stubKind=<optimized out>)
    at /runtime/src/coreclr/vm/virtualcallstub.cpp:1568
#9  0x0000fffff76a1f88 in VSD_ResolveWorker (pTransitionBlock=<optimized out>, siteAddrForRegisterIndirect=<optimized out>, token=90194313216, flags=<optimized out>)
    at /runtime/src/coreclr/vm/virtualcallstub.cpp:1381
#10 0x0000fffff7854bf0 in ResolveWorkerAsmStub () at /runtime/src/coreclr/vm/arm64/asmhelpers.S:598
#11 0x0000ffffb8ab1828 in ?? ()
#12 0x0000ffbfa2838420 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Reproduction Steps

Build the Linux Arm64 Release build of coreclr, then run the Interop coreclr test work item.

Expected behavior

Test passes

Actual behavior

Interop test leg produces output that ends with the following sequence:

Running BlittableClassByOutAttr...
07:41:22.315 Passed test: global::LayoutClass.LayoutClassTest.BlittableClassByOutAttr()
07:41:22.316 Running test: global::LayoutClass.LayoutClassTest.SealedBlittableClass()
Running SealedBlittableClass...
07:41:22.317 Passed test: global::LayoutClass.LayoutClassTest.SealedBlittableClass()
07:41:22.318 Running test: global::LayoutClass.LayoutClassTest.SealedBlittableClassByInAttr()
Running SealedBlittableClassByOutAttr...
07:41:22.319 Passed test: global::LayoutClass.LayoutClassTest.SealedBlittableClassByInAttr()
07:41:22.320 Running test: global::LayoutClass.LayoutClassTest.SealedBlittableClassByOutAttr()
Running SealedBlittableClassByOutAttr...
07:41:22.320 Passed test: global::LayoutClass.LayoutClassTest.SealedBlittableClassByOutAttr()
07:41:22.322 Running test: global::LayoutClass.LayoutClassTest.SealedBlittablePinned()
Running SealedBlittablePinned...
07:41:22.322 Passed test: global::LayoutClass.LayoutClassTest.SealedBlittablePinned()
07:41:22.323 Running test: global::LayoutClass.LayoutClassTest.BlittablePinned()
Running BlittablePinned...
07:41:22.324 Passed test: global::LayoutClass.LayoutClassTest.BlittablePinned()
07:41:22.325 Running test: global::LayoutClass.LayoutClassTest.NestedLayoutClass()
Running NestedLayoutClass...
07:41:22.326 Passed test: global::LayoutClass.LayoutClassTest.NestedLayoutClass()
07:41:22.327 Running test: global::LayoutClass.LayoutClassTest.RecursiveNativeLayout()
Running RecursiveNativeLayout...
Child process took too long. Timed out... Exiting...
App Exit Code: 110
Expected: 100
Actual: 110
END EXECUTION - FAILED
+ test_exit_code=1
+ dotnet /root/helix/work/correlation/XUnitLogChecker/XUnitLogChecker.dll --results-path Interop/Interop --test-wrapper Interop --dumps-path /home/helixbot/dotnetbuild/dumps
[XUnitLogChecker]: 09:40:45.28: The full run will be done.
[XUnitLogChecker]: 09:40:45.64: Item 'Interop' did not finish running. Checking and fixing the log...
[XUnitLogChecker]: 09:40:45.74: XUnit log file has been fixed!

22/286 tests run.
* 22 tests passed.
* 0 tests failed.
* 0 tests skipped.

Regression?

No response

Known Workarounds

Put the [[clang::optnone]] attribute on the MethodTable::GetNativeSize function.
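
For readers unfamiliar with the attribute, here is a minimal sketch of what such a workaround can look like. ExampleTable, its member, and the EXAMPLE_OPTNONE macro are hypothetical stand-ins for illustration only; this is not the actual MethodTable code or the change that was merged.

```cpp
#include <cassert>
#include <cstdint>

// [[clang::optnone]] is Clang-specific. Expand to nothing on other
// compilers so they do not warn about an unknown attribute.
#if defined(__clang__)
#define EXAMPLE_OPTNONE [[clang::optnone]]
#else
#define EXAMPLE_OPTNONE
#endif

// Hypothetical stand-in class, NOT the runtime's MethodTable.
class ExampleTable {
public:
    explicit ExampleTable(std::uint32_t nativeSize) : m_nativeSize(nativeSize) {}

    // With the attribute, Clang skips optimization passes for this one
    // function (it also will not inline it), while the rest of the
    // translation unit is still compiled at the Release optimization level.
    EXAMPLE_OPTNONE std::uint32_t GetNativeSize() const { return m_nativeSize; }

private:
    std::uint32_t m_nativeSize;
};
```

The attribute changes only how the function is compiled, not what it returns, so callers are unaffected. Whether it avoids the unwind failure depends on the toolchain in use.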

Configuration

No response

Other information

No response

dotnet-policy-service[bot] commented 5 months ago

Tagging subscribers to this area: @mangod9 See info in area-owners.md if you want to be subscribed.

davidwrighton commented 5 months ago

@jkoritzinsky @sbomer when we update clang or the optimization data, we should revisit whether we still need the workaround.

janvorli commented 5 months ago

@davidwrighton have you tried to debug why the unwinder doesn't work? I also don't see MethodTable::GetNativeSize on the call stack shown above, so I'd like to understand where in the process an attempt to unwind it was made.

janvorli commented 4 months ago

@davidwrighton one more question - how did you build the runtime? I've built it on my Ubuntu 22.04 arm64 device with the default clang 14.0 and I was unable to repro the issue. So I'd like to build it the same way you did.

davidwrighton commented 4 months ago

The official cross build docker image is the one that failed. My local build on an arm64 Ubuntu machine also didn't fail.

davidwrighton commented 4 months ago

Oh... And I must have copied the wrong backtrace. Sorry. The one that really failed had a MethodTable::GetNativeSize frame in it.

mangod9 commented 3 months ago

@janvorli, is this issue actionable for 9?

janvorli commented 3 months ago

@mangod9 it can be moved to the next version. I'd just like to understand what's causing the unwind problem here, whether it is a compiler problem or an unwinder issue. David has already merged a workaround for this, so it is not urgent.

mangod9 commented 3 months ago

OK, moved out of 9 now.