Closed myungjoo closed 4 years ago
@myungjoo as far as I can tell, the fact that the fFound is not true means that the stack walk has not found the resume frame for the exception. There are two checks for finding the resume frame - at line 228 and 236. I would recommend adding logging of the values being compared to see if we for some reason aren't off by just say one stack slot in the comparison or something. The long comment before the line 228 attempts to describe what's going on there.
When this assertion happens with csc.exe,
inside the iteration of StackWalkFramexEx()
with while (iter.IsValid())
,
the callback of UpdateObjectRefInResumeContextCallback
tracks pSrcContext
already larger than the uResumeSp
.
Therefore, unwinding pSrcContext
to reach uResumeSP
fails. (The first given iter.m_crawl
, created by iter.Init()
, inside while (iter.IsValid())
, Line 960 of stackwalk.cpp, is already out of the resumeSP.
Anyway, the stack pointed by uResumeSp
at assertion failure is the stack right after ThePreStub
and the pStartFrame
at assertion failure is 0x10 after the uResumeSp
.
uResumeSp
= beffe378
pStartFrame
's SP = beffe388
(gdb) info frame 11
Stack frame at 0xbeffe378:
pc = 0xaf2660ac; saved pc = 0xaf2660ac
Outermost frame: previous frame identical to this frame (corrupt stack?)
caller of frame at 0xbeffe378
Arglist at 0xbeffe378, args:
Locals at 0xbeffe378, Previous frame's sp is 0xbeffe378
(gdb) bt
#0 0xb6b344ce in DBG_DebugBreak () from /sdcard/sf/clean45/Debug/csccore/libcoreclr.so
dotnet/coreclr#1 0xb6a8708e in DebugBreak () at /usr/src/debug/coreclr-0.0.1/src/pal/src/debug/debug.cpp:427
dotnet/coreclr#2 0xb62adca2 in CHECK::Setup (this=0xbeffceec, message=0xb6beb802 "", condition=0xb6b7f14e "state.fFound", file=0xb6b7f056 "/home/abuild/rpmbuild/BUILD/coreclr-0.0.1/src/vm/exceptionhandling.cpp",
line=358) at /usr/src/debug/coreclr-0.0.1/src/utilcode/check.cpp:218
dotnet/coreclr#3 0xb64f1260 in ExceptionTracker::FindNonvolatileRegisterPointers (pThread=0x56a50, uOriginalSP=3204440952, pRegDisplay=0xbeffcf88, uResumeFrameFP=3204441000)
at /usr/src/debug/coreclr-0.0.1/src/vm/exceptionhandling.cpp:358
dotnet/coreclr#4 0xb64f4794 in FixNonvolatileRegisters (uOriginalSP=3204440952, pThread=0x56a50, pContextRecord=0xbeffdbd8, fAborting=false) at /usr/src/debug/coreclr-0.0.1/src/vm/exceptionhandling.cpp:1211
dotnet/coreclr#5 0xb64f313a in ProcessCLRException (pExceptionRecord=0xbeffe104, MemoryStackFp=3204440992, pContextRecord=0xbeffdbd8, pDispatcherContext=0xbeffd8a4)
at /usr/src/debug/coreclr-0.0.1/src/vm/exceptionhandling.cpp:1127
dotnet/coreclr#6 0xb64f8704 in UnwindManagedExceptionPass2 (ex=..., unwindStartContext=0xbeffdbd8) at /usr/src/debug/coreclr-0.0.1/src/vm/exceptionhandling.cpp:4428
dotnet/coreclr#7 0xb64f8a58 in UnwindManagedExceptionPass1 (ex=..., frameContext=0xbeffddf0) at /usr/src/debug/coreclr-0.0.1/src/vm/exceptionhandling.cpp:4580
dotnet/coreclr#8 0xb64f8e30 in DispatchManagedException (ex=...) at /usr/src/debug/coreclr-0.0.1/src/vm/exceptionhandling.cpp:4688
dotnet/coreclr#9 0xb66e83dc in PreStubWorker (pTransitionBlock=0xbeffe354, pMD=0xaf1d6004) at /usr/src/debug/coreclr-0.0.1/src/vm/prestub.cpp:1044
dotnet/coreclr#10 0xb650b852 in ThePreStub () at asmhelpers.S:805
dotnet/coreclr#11 0xaf2660ac in ?? ()
One thing strange of the supposed to be JITted code at frame 11 is:
0xaf26609c: ldr r0, [sp, dotnet/coreclr#8]
0xaf26609e: ldr r1, [sp, dotnet/coreclr#20]
0xaf2660a0: ldr r2, [sp, dotnet/runtime#3868] ; 0x28
0xaf2660a2: movw lr, dotnet/coreclr#10289 ; 0x2831
0xaf2660a6: movt lr, #44838 ; 0xaf26
0xaf2660aa: blx lr
=> 0xaf2660ac: str r0, [sp, dotnet/coreclr#16]
0xaf2660ae: movw r3, dotnet/coreclr#18097 ; 0x46b1
0xaf2660b2: movt r3, #44894 ; 0xaf5e
0xaf2660b6: ldrb r3, [r3, #0]
0xaf2660b8: cmp r3, #0
Frame 11 called its callee Frame 10 by blx lr
. Is this really intended behavior?
@myungjoo Just a blind shot - could that be related to the issue fixed for ARM64 here?: https://github.com/dotnet/coreclr/pull/4099
Also, looking at the code in ExceptionTracker::FindNonvolatileRegisterPointers that prepares the uResumeSP and pStartFrame, it is strange that the SP in pStartFrame is higher than the uResumeSP. The loop below finds the pStartFrame (the pHighestFrameWithRegisters) that's below the uResumeSP (which is the uOriginalSP here):
while ((UINT_PTR)pFrame < uOriginalSP)
{
if (pFrame->NeedsUpdateRegDisplay())
pHighestFrameWithRegisters = pFrame;
pFrame = pFrame->Next();
}
So it seems that the frame has recorded a wrong SP. Can you please check what type of frame is the pHighestFrameWithRegisters here?
Also, looking at the stack trace again, it looks like there was an exception while jitting the managed method. It would be interesting to set a breakpoint at __cxa_throw to see what it was.
The two last breaks with __cxa_throw have the following stack trace. The issue with ARM64 seems orthogonal with this although enforcing LR pointer to have value of corresponding variable didn't make any visible adverse effect.
Breakpoint 1, 0xb6f2eb2a in __cxa_throw () from /lib/libstdc++.so.6
(gdb) bt
]#0 0xb6f2eb2a in __cxa_throw () from /lib/libstdc++.so.6
dotnet/coreclr#1 0xb62ef6fe in RealCOMPlusThrow (reKind=kDllNotFoundException, resID=5899,
wszArg1=0xbeffdaa0 u"api-ms-win-core-file-l1-2-0.dll",
wszArg2=0x142b08 u"The specified module could not be found.\n (Exception from HRESULT: 0x8007007E)", wszArg3=0x0, wszArg4=0x0, wszArg5=0x0, wszArg6=0x0)
at /usr/src/debug/coreclr-0.0.1/src/vm/excep.cpp:13795
dotnet/coreclr#2 0xb65a5f9e in LoadLibErrorTracker::Throw (this=0xbeffda14,
libraryNameOrPath=...)
at /usr/src/debug/coreclr-0.0.1/src/vm/dllimport.cpp:6546
dotnet/coreclr#3 0xb65a2d06 in NDirect::NDirectLink (pMD=0xaf1d9004)
at /usr/src/debug/coreclr-0.0.1/src/vm/dllimport.cpp:7337
dotnet/coreclr#4 0xb65a2b78 in NDirect::GetStubForILStub (pNMD=0xaf1d9004,
ppStubMD=0xbeffde28, dwStubFlags=0)
at /usr/src/debug/coreclr-0.0.1/src/vm/dllimport.cpp:5879
dotnet/coreclr#5 0xb65a2f52 in GetStubForInteropMethod (pMD=0xaf1d9004, dwStubFlags=0,
ppStubMD=0x0) at /usr/src/debug/coreclr-0.0.1/src/vm/dllimport.cpp:6020
dotnet/coreclr#6 0xb66e8c94 in MethodDesc::DoPrestub (this=0xaf1d9004, pDispatchingMT=0x0)
at /usr/src/debug/coreclr-0.0.1/src/vm/prestub.cpp:1494
dotnet/coreclr#7 0xb66e8318 in PreStubWorker (pTransitionBlock=0xbeffe344, pMD=0xaf1d9004)
at /usr/src/debug/coreclr-0.0.1/src/vm/prestub.cpp:1041
dotnet/coreclr#8 0xb650b88a in ThePreStub () at asmhelpers.S:805
dotnet/coreclr#9 0xaf269b9e in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) ]c
Undefined command: "". Try "help".
(gdb) c
Continuing.
Breakpoint 1, 0xb6f2eb2a in __cxa_throw () from /lib/libstdc++.so.6
(gdb) bt
#0 0xb6f2eb2a in __cxa_throw () from /lib/libstdc++.so.6
dotnet/coreclr#1 0xb6a8ae94 in RtlpRaiseException (ExceptionRecord=0xbeffdca4)
at /usr/src/debug/coreclr-0.0.1/src/pal/src/exception/seh-unwind.cpp:603
dotnet/coreclr#2 0xb6a8adce in RaiseException (dwExceptionCode=3762504530,
dwExceptionFlags=1, nNumberOfArguments=5, lpArguments=0xbeffded4)
at /usr/src/debug/coreclr-0.0.1/src/pal/src/exception/seh-unwind.cpp:655
dotnet/coreclr#3 0xb62e3ba6 in RaiseTheExceptionInternalOnly(OBJECTREF, int, int)::$_1::operator()(RaiseTheExceptionInternalOnly(OBJECTREF, int, int)::Param*) const (
this=0xbeffde8c, pParam=0xbeffdec8)
at /usr/src/debug/coreclr-0.0.1/src/vm/excep.cpp:3070
dotnet/coreclr#4 0xb62e37e4 in RaiseTheExceptionInternalOnly (throwable=..., rethrow=0,
fForStackOverflow=0) at /usr/src/debug/coreclr-0.0.1/src/vm/excep.cpp:3114
dotnet/coreclr#5 0xb62e4e9c in UnwindAndContinueRethrowHelperAfterCatch (
pEntryFrame=0xbeffe0d8, pException=0x1405b8)
at /usr/src/debug/coreclr-0.0.1/src/vm/excep.cpp:8747
dotnet/coreclr#6 0xb66e844e in PreStubWorker (pTransitionBlock=0xbeffe344, pMD=0xaf1d9004)
at /usr/src/debug/coreclr-0.0.1/src/vm/prestub.cpp:1043
dotnet/coreclr#7 0xb650b88a in ThePreStub () at asmhelpers.S:805
dotnet/coreclr#8 0xaf269b9e in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
As for the actual exception, the assembly that it fails to load should not be attempted to get loaded on Unix, it is windows only. So it seems that some of the platforms assemblies that you are using are Windows instead of Unix versions.
Can you please dump m_pszDebugMethodName, m_pszDebugClassName and m_pszDebugMethodSignature at frame 6 of the first breakpoint above? That should tell us where it came from.
Regardless, we still need to fix the problem with the stack walking, since it is orthogonal. But using the correct assemblies would get us closer to successfull running of the csc.
So it seems that the frame has recorded a wrong SP. Can you please check what type of frame is the pHighestFrameWithRegisters here?
When UpdateObjectRefInResumeContextCallback goes insane with uResumeSP < GetSP(pSrcContext), Its caller, FindNonvolatileRegisterPointers()'s pHighestFrameWithRegiser appears non-managed:
FindNonvolatileRegisterPointers:322 | pHighestFrameWithRegisters = bedcf128, Attr(0) 00000000:Managed?0
probed by:
ip = (unsigned int) pFrame->GetIP();
printf("%s:%d | pHighestFrameWithRegisters = %8.8x, Attr(%x) %8.8x:Managed?%d\n", __func__, __LINE__, pFrame, pFrame->GetFrameAttribs(), ip, ExecutionManager::IsManagedCode(ip));
That's expected. There would be an explicit frame (an object of type derived from Frame) that I think can be the thing that screws the SP value. Actually, I have just realized you didn't tell me what is the type of that frame (I've asked you few days ago)
Can you please check what type of frame is the pHighestFrameWithRegisters here?
Also, it would be helpful if you dumped the m_pszDebugMethodName, m_pszDebugClassName and m_pszDebugMethodSignature as I've asked. These should help me to get better understanding on why that issue happens.
Although that point no longer calls __cxa_throw with today's commit, at that point, the values are:
(gdb) p m_pszDebugMethodName
$1 = (LPCUTF8) 0xaf1bd484 "ReadFileModern"
(gdb) p m_pszDebugClassName
$2 = (LPCUTF8) 0xaf1d0fd4 "NativeMethods"
(gdb) p m_pszDebugMethodSignature
$3 = (LPCUTF8) 0xaf1bd4c4 "bool *(class [System.Runtime.InteropServices]System.Runtime.InteropServices.SafeHandle,uint8*,int32,int32&,native int)"
(gdb) bt
#0 RealCOMPlusThrow (reKind=3056921089, resID=3204438496,
wszArg1=0xbeffdc4c u"\x204c\xaf26\xdc60\xbeff\xac6f\xb633\x11fc\xaf1d\x11fc\xaf1d\xdc80\xbeff\x11c8\xaf1d\x11c8\xaf1d\x11c8\xaf1d\x11c8\xaf1d\x11c8\xaf1d\x11c8\xaf1d\x11c8\xaf1d\xdca8\xbeff\xfa3b\
xb659\001",
wszArg2=0xbeffdc4c u"\x204c\xaf26\xdc60\xbeff\xac6f\xb633\x11fc\xaf1d\x11fc\xaf1d\xdc80\xbeff\x11c8\xaf1d\x11c8\xaf1d\x11c8\xaf1d\x11c8\xaf1d\x11c8\xaf1d\x11c8\xaf1d\x11c8\xaf1d\xdca8\xbeff\xfa3b\
xb659\001",
wszArg3=0xb659fdb7 <NDirect::NDirectLink(NDirectMethodDesc*)+306> u"\xde7\xc90\x2e91\x6a8\x9df5\x39ff\x12e0\x9c98\x33f5\x6f8\xff90\x68e7\x3246\x4121\x1560\651\x4f60\xe9f6P\x4f21\xfff0\x633\x189a\x
58f1\x5f3\xff90\x5e7\230\x328\xffd1\x3fe7\x1520\xff90\x2ee7\xf9a8\x78f4\x4f8\xff90\x68e7F\xc121\x8160\x4160\540\x1160\x4120\xcf2\x1571\x4ab\x4c9a\xeff5\xfffb\x8e7\xd48\x7949\x4044X\xb968\x8899\x642\xf
fd1\xde7\x3af5\x807d\xdbd\xd298\x94f4\xd2ed\x14f4\354\x5cbf*\x6e00\x78bf\x1500\xe632\x10ff\x78bf\xfd00\xe637\xdcff\x78bd\x8000\x6fb5\x8646\660\x546\x690\442\x1191\x9746\x1cf5\x5fc\x9698\x3ff5\x4fd\220
\x3628\xffd0\x5e7\x9998\x3cf5\x3fc\220\x2e28\xffd0\x5e7\x9698\x75f5\x2fd\xff90\x2e7\230\xb28\xffd1\x19e7\x1948\x7949\x844\x1944\xa4a\x4144\x4ef2\x1f71\x33f5\xfffc\xffe7\x2e7\x398\x2d99\xadf6\374\x1128
\xffd1\xffe7\xfe7\x1148\x7949\x844\x1144\xa4a\x4144\x51f2\x1f71\x1ff5\xfffc\xbe7\xc820\x7f2\x500\xeff5\xfffe\x2e7\340\x520\xff90\xffe7\347 \x328\xffd1\x5e7\x698\x80b0\xfebd\336\x15bf\xe632\x5eff\x78bd
\x5300\xe637\x36ff\x78bd\x6600\xe637\xb0ff\x2b5\xb4af\x13b0\x8c46\x8646\xb246\xb34c\x7d4d\x6444\x2459\x3368\x1d94\x1c90\x1b91\222\x1a20\x3290\x1d90\x1c98\x1799\xcd93\x58f8\xcdc0"...,
wszArg4=0xbeffdca8 u"\xdd50\xbeff\xfc29\xb659\xdcc0\xbeff\xdadd\xb62b\x1038\xaf1d\x1038\xaf1d\xdce0\xbeff\x3f61\xb633\x1038\xaf1d\x1004\xaf1d\xdd1c\xbeff\x1038\xaf1d\x1038\xaf1d",
wszArg5=0xbeffda0c u"~\x8007\n", wszArg6=0xbeffda78 u"@") at /usr/src/debug/coreclr-0.0.1/src/vm/excep.cpp:13786
dotnet/coreclr#1 0xb65a304e in LoadLibErrorTracker::Throw (this=0xbeffda0c, libraryNameOrPath=...) at /usr/src/debug/coreclr-0.0.1/src/vm/dllimport.cpp:6546
dotnet/coreclr#2 0xb659fdb6 in NDirect::NDirectLink (pMD=0xaf1d1004) at /usr/src/debug/coreclr-0.0.1/src/vm/dllimport.cpp:7337
dotnet/coreclr#3 0xb659fc28 in NDirect::GetStubForILStub (pNMD=0xaf1d1004, ppStubMD=0xbeffde20, dwStubFlags=0) at /usr/src/debug/coreclr-0.0.1/src/vm/dllimport.cpp:5879
dotnet/coreclr#4 0xb65a0002 in GetStubForInteropMethod (pMD=0xaf1d1004, dwStubFlags=0, ppStubMD=0x0) at /usr/src/debug/coreclr-0.0.1/src/vm/dllimport.cpp:6020
dotnet/coreclr#5 0xb66e5d7e in MethodDesc::DoPrestub (this=0xaf1d1004, pDispatchingMT=0x0) at /usr/src/debug/coreclr-0.0.1/src/vm/prestub.cpp:1495
dotnet/coreclr#6 0xb66e53d8 in PreStubWorker (pTransitionBlock=0xbeffe344, pMD=0xaf1d1004) at /usr/src/debug/coreclr-0.0.1/src/vm/prestub.cpp:1041
dotnet/coreclr#7 0xb650893a in ThePreStub () at asmhelpers.S:802
dotnet/coreclr#8 0xaf261b9e in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
At that point, RealCOMPlusThrow (#0) has
(gdb) p wszArg1
$1 = (LPCWSTR) 0xbeffda98 u"api-ms-win-core-file-l1-2-0.dll"
(gdb) p wszArg2
$2 = (LPCWSTR) 0x148458 u"The specified module could not be found.\n (Exception from HRESULT: 0x8007007E)"
(gdb) p wszArg3
$3 = (LPCWSTR) 0x0
(gdb) p wszArg4
$4 = (LPCWSTR) 0x0
(gdb) p wszArg5
$5 = (LPCWSTR) 0x0
(gdb) p wszArg6
$6 = (LPCWSTR) 0x0
(gdb) p reKind
$7 = kDllNotFoundException
(gdb) p resID
$8 = 5899
(gdb)
Assumption to test later:
It appears that exception handling code of CoreFX ( https://github.com/dotnet/corefx/blob/master/src/System.Reflection.Metadata/src/System/Reflection/Internal/Utilities/FileStreamReadLightUp.cs ) is not properly executed with CoreCLR-ARM:
try
{
result = NativeMethods.ReadFileModern(handle, buffer, size, out bytesRead, IntPtr.Zero);
}
catch
{
readFileModernNotAvailable = true;
}
Ah, so the throw is expected and the problem is really just in the stack walking. So, if you could get me the real type of the type of the pHighestFrameWithRegisters (as I've mentioned), it will be a type derived from the Frame type), it would help me to figure out more.
typeid()
of typeinfo
says it's 18PrestubMethodFrame
.
DoPrestub:1494| Method:[ReadFileModern], Obj:[NativeMethods], Sig:[bool *(class [System.Runtime.InteropServices]System.Runtime.InteropServices.SafeHandle,uint8*,int32,int32&,native int)]
FindNonvolatileRegisterPointers:322 | pHighestFrameWithRegisters = beb82128, Attr(0) 00000000:Managed?0 typeid:"18PrestubMethodFrame"
UpdateObjectRefInResumeContextCallback;282 ret 2
UpdateObjectRefInResumeContextCallback:228 Res. uRSP beb823b8 / GetSP beb823c8
UpdateObjectRefInResumeContextCallback:236 uICFSFP 00000000 / uRFFP beb823e8
Anyway, other two successful attempts with FindNonvolatileRegisterPointers are 16InlinedCallFrame
:
FindNonvolatileRegisterPointers:322 | pHighestFrameWithRegisters = beb82328, Attr(0) 00000000:Managed?0 typeid:"16InlinedCallFrame"
...
FindNonvolatileRegisterPointers:322 | pHighestFrameWithRegisters = beb82328, Attr(0) 00000000:Managed?0 typeid:"16InlinedCallFrame"
I have added another print-out:
#ifndef DACCESS_COMPILE
PrestubMethodFrame::PrestubMethodFrame(TransitionBlock * pTransitionBlock, MethodDesc * pMD)
: FramedMethodFrame(pTransitionBlock, pMD)
{
LIMITED_METHOD_CONTRACT;
printf("%s:%d DBGINFO: [%s][%s][%s]\n", __func__, __LINE__, pMD->m_pszDebugMethodName, pMD->m_pszDebugClassName, pMD->m_pszDebugMethodSignature);
}
#endif // #ifndef DACCESS_COMPILE
Then, the result is more explaining itself:
PrestubMethodFrame:571 DBGINFO: [ReadFileModern][NativeMethods][bool *(class [System.Runtime.InteropServices]System.Runtime.InteropServices.SafeHandle,uint8*,int32,int32&,native int)]
DoPrestub:1494| Method:[ReadFileModern], Obj:[NativeMethods], Sig:[bool *(class [System.Runtime.InteropServices]System.Runtime.InteropServices.SafeHandle,uint8*,int32,int32&,native int)]
PrestubMethodFrame:571 DBGINFO: [.ctor][System.DllNotFoundException][instance void *()]
FindNonvolatileRegisterPointers:322 | pHighestFrameWithRegisters = beaf7128, Attr(0) 00000000:Managed?0 typeid:"18PrestubMethodFrame"
UpdateObjectRefInResumeContextCallback;282 ret 2
UpdateObjectRefInResumeContextCallback:228 Res. uRSP beaf73b8 / GetSP beaf73c8
UpdateObjectRefInResumeContextCallback:236 uICFSFP 00000000 / uRFFP beaf73e8
UpdateObjectRefInResumeContextCallback:250 ret
The PrestubMethodFrame inherits from FramedMethodFrame, which in turn inherits from TransitionFrame. And looking at TransitionFrame::GetSP, I can see this:
TADDR GetSP()
{
LIMITED_METHOD_DAC_CONTRACT;
return GetTransitionBlock() + sizeof(TransitionBlock);
}
So my guess is that this is what gives us the incorrect SP. Probably the C++ TransitionBlock size doesn't match the amount of data created on the stack by the ASM macro or the m_pTransitionBlock member of the frame is incorrect (which would mean the PreStubWorker got a wrong transition block address).
@janvorli Thanks! That really looks fishy to me as well.
Anyway, if I can't resolve this within today, I'll be able to continue working on CoreCLR almost two weeks later as I will be attending a whole different task (away from my ARM, build machines) for two weeks.
It was because ThePreStub did not say directives that r0-r3 (which takes 0x10 of the stack) are saved while SP has moved for them with push{r0-r3}.
With .save {r0-r3} added to push{r0-r3}, ./corerun csc.exe
(no input file) now successfully runs in Linux/ARM.
With a .cs. file input, we get a whole another error. :)
Environment: ARM/Linux (armv7l / gnueabi / Cortex A9 x 4 / Linux 3.10) Codebase: fe089109148c370dffe238b6abbbd42c5dbafc99 + dotnet/coreclr#3990 (PR in coreclr) + https://github.com/myungjoo/coreclr/tree/implement/4040 (implementing dotnet/runtime#5475) Target App: Roslyn (csc.exe, dotnet/roslyn: 9d6fdb07b9d63ef018b7fbf6f5ae72824c51da32)
Related pending fixes applied (the two not-yet-merged patchsets described above), csc.exe at ARM has progressed much more. However, I'm seeing another strange behavior right now:
With breaks, getting right before the asserting,
After this, we head to that assertion because we have not hit state.fFound = true in the callback,
UpdateObjectRefInResumeContextCallback
.Any hints, the meaning and intention of this assertion, will be much helpful.
Thank you.