dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.25k stars 4.73k forks source link

ARM/Linux Assert Fail: Testing Roslyn 2. #5501

Closed myungjoo closed 4 years ago

myungjoo commented 8 years ago

Environment: ARM/Linux (armv7l / gnueabi / Cortex A9 x 4 / Linux 3.10) Codebase: fe089109148c370dffe238b6abbbd42c5dbafc99 + dotnet/coreclr#3990 (PR in coreclr) + https://github.com/myungjoo/coreclr/tree/implement/4040 (implementing dotnet/runtime#5475) Target App: Roslyn (csc.exe, dotnet/roslyn: 9d6fdb07b9d63ef018b7fbf6f5ae72824c51da32)

Related pending fixes applied (the two not-yet-merged patchsets described above), csc.exe at ARM has progressed much more. However, I'm seeing another strange behavior right now:

sh-3.2# ./corerun csc.exe

Assert failure(PID 6173 [0x0000181d], Thread: 6173 [0x181d]): Consistency check failed: FAILED: state.fFound
    File: /home/abuild/rpmbuild/BUILD/coreclr-0.0.1/src/vm/exceptionhandling.cpp Line: 354
    Image: /sdcard/sf/clean45/Debug/csccore/corerun

**** MessageBox invoked, title 'corerun - Assert Failure (PID 6173, Thread 6173/0x181d)' ****
  Consistency check failed: FAILED: state.fFound

/home/abuild/rpmbuild/BUILD/coreclr-0.0.1/src/vm/exceptionhandling.cpp, Line: 354

Abort - Kill program
Retry - Debug
Ignore - Keep running

Image:
/sdcard/sf/clean45/Debug/csccore/corerun

********

Aborted
sh-3.2# 

With breaks, getting right before the asserting,

(gdb) info break
Num     Type           Disp Enb Address    What
1       breakpoint     keep y   0xb6375a46 in Thread::StackWalkFramesEx(REGDISPLAY*, StackWalkAction (*)(CrawlFrame*, void*), void*, unsigned int, Frame*)
                                           at /usr/src/debug/coreclr-0.0.1/src/vm/stackwalk.cpp:909
        breakpoint already hit 30 times
(gdb) run
Starting program: /sdcard/sf/clean45/Debug/csccore/corerun csc.exe
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
[New Thread 0xb6155450 (LWP 6186)]
[New Thread 0xb58f5450 (LWP 6187)]
[New Thread 0xb4eff450 (LWP 6188)]
[New Thread 0xb24fe450 (LWP 6189)]

Breakpoint 1, Thread::StackWalkFramesEx (this=0x56a50, pRD=0xbeffb120, pCallback=0xb651271d <SystemDomain::CallersMethodCallbackWithStackMark(CrawlFrame*, void*)>, pData=0xbeffb730, flags=33,
    pStartFrame=0x0) at /usr/src/debug/coreclr-0.0.1/src/vm/stackwalk.cpp:917
(gdb) c 29
Will ignore next 28 crossings of breakpoint 1.  Continuing.

Breakpoint 1, Thread::StackWalkFramesEx (this=0x56a50, pRD=0xbeffcf88, pCallback=0xb64f0e1d <UpdateObjectRefInResumeContextCallback(CrawlFrame*, void*)>, pData=0xbeffcf14, flags=0,
    pStartFrame=0xbeffe0e8) at /usr/src/debug/coreclr-0.0.1/src/vm/stackwalk.cpp:917
(gdb) break src/vm/stackwalk.cpp:973
Breakpoint 3 at 0xb6375b64: file /usr/src/debug/coreclr-0.0.1/src/vm/stackwalk.cpp, line 973.
(gdb) disable 2
(gdb) c
Continuing.

Breakpoint 3, Thread::StackWalkFramesEx (this=0x56a50, pRD=0xbeffcf88, pCallback=0xb64f0e1d <UpdateObjectRefInResumeContextCallback(CrawlFrame*, void*)>, pData=0xbeffcf14, flags=0,
    pStartFrame=0xbeffe0e8) at /usr/src/debug/coreclr-0.0.1/src/vm/stackwalk.cpp:974
(gdb) p/s retVal
$24 = SWA_CONTINUE
(gdb) p/x iter.IsValid()
$25 = 0x0
(gdb) p/x iter
$26 = {m_crawl = {codeManState = {dwIsSet = 0x0, stateBuf = {0xcd <repeats 512 times>}}, isFrameless = 0x0, isFirst = 0x0, isInterrupted = 0x0, hasFaulted = 0x0, isIPadjusted = 0x0,
    isNativeMarker = 0x0, isProfilerDoStackSnapshot = 0x0, isNoFrameTransition = 0x0, taNoFrameTransitionMarker = 0x0, pFrame = 0xffffffff, pFunc = 0x0, pAppDomain = 0x43870, pRD = 0xbeffcf88,
    codeInfo = {m_codeAddress = 0xb650b59b, m_methodToken = {m_pRangeSection = 0x8b680, m_pCodeHeader = 0xb1ad883c}, m_pMD = 0x0, m_pJM = 0x0, m_relOffset = 0x0, m_pFunctionEntry = 0x0},
    isFilterFunclet = 0x0, isFilterFuncletCached = 0x0, fShouldParentToFuncletSkipReportingGCReferences = 0x0, fShouldCrawlframeReportGCReferences = 0x1,
    fShouldParentFrameUseUnwindTargetPCforGCReporting = 0x0, ehClauseForCatch = {Flags = 0xbeffe3a0, TryStartPC = 0xbeffcea8, TryEndPC = 0x0, HandlerStartPC = 0x54990, HandlerEndPC = 0xbeffd8a4, {
        TypeHandle = 0xbeffdbd8, ClassToken = 0xbeffdbd8, FilterOffset = 0xbeffdbd8}}, pThread = 0x56a50, pSecurityObject = 0x0, isCachedMethod = 0x0, stackWalkCache = {m_CacheEntry = {IP = 0x0,
        ESPOffset = 0x3bd4, securityObjectOffset = 0x2, fUseEbp = 0x0, fUseEbpAsFrameReg = 0x1, argSize = 0x5b6}, static s_Enabled = 0x0}, pCurGSCookie = 0xbefff990, pFirstGSCookie = 0xbeffe0e4},
  m_uFramesProcessed = 0x19, m_frameState = 0x7, m_pThread = 0x56a50, m_pStartFrame = 0xbeffe0e8, m_pRealStartFrame = 0xbeffe0e8, m_flags = 0x0, m_codeManFlags = 0x8, m_scanFlag = 0x1,
  m_cachedCodeInfo = {m_codeAddress = 0xb1ad88b1, m_methodToken = {m_pRangeSection = 0x8b680, m_pCodeHeader = 0xb1ad883c}, m_pMD = 0xb592aa3c, m_pJM = 0x492d8, m_relOffset = 0x70,
    m_pFunctionEntry = 0xafe33574}, m_pCachedGSCookie = 0x0, m_sfParent = {static maxVal = 0xffffffff, SP = 0x0}, m_sfFuncletParent = {static maxVal = 0xffffffff, SP = 0x0},
  m_fProcessNonFilterFunclet = 0x0, m_sfIntermediaryFuncletParent = {static maxVal = 0xffffffff, SP = 0x0}, m_fProcessIntermediaryNonFilterFunclet = 0x0, m_fDidFuncletReportGCReferences = 0x1,
  m_pvResumableFrameTargetSP = 0x0}
(gdb) 

After this, we head to that assertion because we have not hit state.fFound = true in the callback, UpdateObjectRefInResumeContextCallback.


(gdb) n 
(gdb) n
ExceptionTracker::FindNonvolatileRegisterPointers (pThread=0x56a50, uOriginalSP=3204440952, pRegDisplay=0xbeffcf88, uResumeFrameFP=3204441000)
    at /usr/src/debug/coreclr-0.0.1/src/vm/exceptionhandling.cpp:347
(gdb) n
(gdb) n
(gdb) n
(gdb) n

Program received signal SIGTRAP, Trace/breakpoint trap.
0xb6b3443e in DBG_DebugBreak () from /sdcard/sf/clean45/Debug/csccore/libcoreclr.so
(gdb) 

Any hints, the meaning and intention of this assertion, will be much helpful.

Thank you.

janvorli commented 8 years ago

@myungjoo as far as I can tell, the fact that the fFound is not true means that the stack walk has not found the resume frame for the exception. There are two checks for finding the resume frame - at line 228 and 236. I would recommend adding logging of the values being compared to see if we for some reason aren't off by just say one stack slot in the comparison or something. The long comment before the line 228 attempts to describe what's going on there.

myungjoo commented 8 years ago

When this assertion happens with csc.exe, inside the iteration of StackWalkFramexEx() with while (iter.IsValid()), the callback of UpdateObjectRefInResumeContextCallback tracks pSrcContext already larger than the uResumeSp.

Therefore, unwinding pSrcContext to reach uResumeSP fails. (The first given iter.m_crawl, created by iter.Init(), inside while (iter.IsValid()), Line 960 of stackwalk.cpp, is already out of the resumeSP.

Anyway, the stack pointed by uResumeSp at assertion failure is the stack right after ThePreStub and the pStartFrame at assertion failure is 0x10 after the uResumeSp.

uResumeSp = beffe378 pStartFrame's SP = beffe388

(gdb) info frame 11
Stack frame at 0xbeffe378:
 pc = 0xaf2660ac; saved pc = 0xaf2660ac
 Outermost frame: previous frame identical to this frame (corrupt stack?)
 caller of frame at 0xbeffe378
 Arglist at 0xbeffe378, args: 
 Locals at 0xbeffe378, Previous frame's sp is 0xbeffe378
(gdb) bt
#0  0xb6b344ce in DBG_DebugBreak () from /sdcard/sf/clean45/Debug/csccore/libcoreclr.so
dotnet/coreclr#1  0xb6a8708e in DebugBreak () at /usr/src/debug/coreclr-0.0.1/src/pal/src/debug/debug.cpp:427
dotnet/coreclr#2  0xb62adca2 in CHECK::Setup (this=0xbeffceec, message=0xb6beb802 "", condition=0xb6b7f14e "state.fFound", file=0xb6b7f056 "/home/abuild/rpmbuild/BUILD/coreclr-0.0.1/src/vm/exceptionhandling.cpp", 
    line=358) at /usr/src/debug/coreclr-0.0.1/src/utilcode/check.cpp:218
dotnet/coreclr#3  0xb64f1260 in ExceptionTracker::FindNonvolatileRegisterPointers (pThread=0x56a50, uOriginalSP=3204440952, pRegDisplay=0xbeffcf88, uResumeFrameFP=3204441000)
    at /usr/src/debug/coreclr-0.0.1/src/vm/exceptionhandling.cpp:358
dotnet/coreclr#4  0xb64f4794 in FixNonvolatileRegisters (uOriginalSP=3204440952, pThread=0x56a50, pContextRecord=0xbeffdbd8, fAborting=false) at /usr/src/debug/coreclr-0.0.1/src/vm/exceptionhandling.cpp:1211
dotnet/coreclr#5  0xb64f313a in ProcessCLRException (pExceptionRecord=0xbeffe104, MemoryStackFp=3204440992, pContextRecord=0xbeffdbd8, pDispatcherContext=0xbeffd8a4)
    at /usr/src/debug/coreclr-0.0.1/src/vm/exceptionhandling.cpp:1127
dotnet/coreclr#6  0xb64f8704 in UnwindManagedExceptionPass2 (ex=..., unwindStartContext=0xbeffdbd8) at /usr/src/debug/coreclr-0.0.1/src/vm/exceptionhandling.cpp:4428
dotnet/coreclr#7  0xb64f8a58 in UnwindManagedExceptionPass1 (ex=..., frameContext=0xbeffddf0) at /usr/src/debug/coreclr-0.0.1/src/vm/exceptionhandling.cpp:4580
dotnet/coreclr#8  0xb64f8e30 in DispatchManagedException (ex=...) at /usr/src/debug/coreclr-0.0.1/src/vm/exceptionhandling.cpp:4688
dotnet/coreclr#9  0xb66e83dc in PreStubWorker (pTransitionBlock=0xbeffe354, pMD=0xaf1d6004) at /usr/src/debug/coreclr-0.0.1/src/vm/prestub.cpp:1044
dotnet/coreclr#10 0xb650b852 in ThePreStub () at asmhelpers.S:805
dotnet/coreclr#11 0xaf2660ac in ?? ()

One thing strange of the supposed to be JITted code at frame 11 is:

   0xaf26609c:  ldr r0, [sp, dotnet/coreclr#8]
   0xaf26609e:  ldr r1, [sp, dotnet/coreclr#20]
   0xaf2660a0:  ldr r2, [sp, dotnet/runtime#3868]   ; 0x28
   0xaf2660a2:  movw    lr, dotnet/coreclr#10289  ; 0x2831
   0xaf2660a6:  movt    lr, #44838  ; 0xaf26
   0xaf2660aa:  blx lr
=> 0xaf2660ac:  str r0, [sp, dotnet/coreclr#16]
   0xaf2660ae:  movw    r3, dotnet/coreclr#18097  ; 0x46b1
   0xaf2660b2:  movt    r3, #44894  ; 0xaf5e
   0xaf2660b6:  ldrb    r3, [r3, #0]
   0xaf2660b8:  cmp r3, #0

Frame 11 called its callee Frame 10 by blx lr. Is this really intended behavior?

janvorli commented 8 years ago

@myungjoo Just a blind shot - could that be related to the issue fixed for ARM64 here?: https://github.com/dotnet/coreclr/pull/4099

janvorli commented 8 years ago

Also, looking at the code in ExceptionTracker::FindNonvolatileRegisterPointers that prepares the uResumeSP and pStartFrame, it is strange that the SP in pStartFrame is higher than the uResumeSP. The loop below finds the pStartFrame (the pHighestFrameWithRegisters) that's below the uResumeSP (which is the uOriginalSP here):

    while ((UINT_PTR)pFrame < uOriginalSP)
    {
        if (pFrame->NeedsUpdateRegDisplay())
            pHighestFrameWithRegisters = pFrame;

        pFrame = pFrame->Next();
    }

So it seems that the frame has recorded a wrong SP. Can you please check what type of frame is the pHighestFrameWithRegisters here?

janvorli commented 8 years ago

Also, looking at the stack trace again, it looks like there was an exception while jitting the managed method. It would be interesting to set a breakpoint at __cxa_throw to see what it was.

myungjoo commented 8 years ago

The two last breaks with __cxa_throw have the following stack trace. The issue with ARM64 seems orthogonal with this although enforcing LR pointer to have value of corresponding variable didn't make any visible adverse effect.

Breakpoint 1, 0xb6f2eb2a in __cxa_throw () from /lib/libstdc++.so.6
(gdb) bt
]#0  0xb6f2eb2a in __cxa_throw () from /lib/libstdc++.so.6
dotnet/coreclr#1  0xb62ef6fe in RealCOMPlusThrow (reKind=kDllNotFoundException, resID=5899, 
    wszArg1=0xbeffdaa0 u"api-ms-win-core-file-l1-2-0.dll", 
    wszArg2=0x142b08 u"The specified module could not be found.\n (Exception from HRESULT: 0x8007007E)", wszArg3=0x0, wszArg4=0x0, wszArg5=0x0, wszArg6=0x0)
    at /usr/src/debug/coreclr-0.0.1/src/vm/excep.cpp:13795
dotnet/coreclr#2  0xb65a5f9e in LoadLibErrorTracker::Throw (this=0xbeffda14, 
    libraryNameOrPath=...)
    at /usr/src/debug/coreclr-0.0.1/src/vm/dllimport.cpp:6546
dotnet/coreclr#3  0xb65a2d06 in NDirect::NDirectLink (pMD=0xaf1d9004)
    at /usr/src/debug/coreclr-0.0.1/src/vm/dllimport.cpp:7337
dotnet/coreclr#4  0xb65a2b78 in NDirect::GetStubForILStub (pNMD=0xaf1d9004, 
    ppStubMD=0xbeffde28, dwStubFlags=0)
    at /usr/src/debug/coreclr-0.0.1/src/vm/dllimport.cpp:5879
dotnet/coreclr#5  0xb65a2f52 in GetStubForInteropMethod (pMD=0xaf1d9004, dwStubFlags=0, 
    ppStubMD=0x0) at /usr/src/debug/coreclr-0.0.1/src/vm/dllimport.cpp:6020
dotnet/coreclr#6  0xb66e8c94 in MethodDesc::DoPrestub (this=0xaf1d9004, pDispatchingMT=0x0)
    at /usr/src/debug/coreclr-0.0.1/src/vm/prestub.cpp:1494
dotnet/coreclr#7  0xb66e8318 in PreStubWorker (pTransitionBlock=0xbeffe344, pMD=0xaf1d9004)
    at /usr/src/debug/coreclr-0.0.1/src/vm/prestub.cpp:1041
dotnet/coreclr#8  0xb650b88a in ThePreStub () at asmhelpers.S:805
dotnet/coreclr#9  0xaf269b9e in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) ]c
Undefined command: "".  Try "help".
(gdb) c
Continuing.

Breakpoint 1, 0xb6f2eb2a in __cxa_throw () from /lib/libstdc++.so.6
(gdb) bt
#0  0xb6f2eb2a in __cxa_throw () from /lib/libstdc++.so.6
dotnet/coreclr#1  0xb6a8ae94 in RtlpRaiseException (ExceptionRecord=0xbeffdca4)
    at /usr/src/debug/coreclr-0.0.1/src/pal/src/exception/seh-unwind.cpp:603
dotnet/coreclr#2  0xb6a8adce in RaiseException (dwExceptionCode=3762504530, 
    dwExceptionFlags=1, nNumberOfArguments=5, lpArguments=0xbeffded4)
    at /usr/src/debug/coreclr-0.0.1/src/pal/src/exception/seh-unwind.cpp:655
dotnet/coreclr#3  0xb62e3ba6 in RaiseTheExceptionInternalOnly(OBJECTREF, int, int)::$_1::operator()(RaiseTheExceptionInternalOnly(OBJECTREF, int, int)::Param*) const (
    this=0xbeffde8c, pParam=0xbeffdec8)
    at /usr/src/debug/coreclr-0.0.1/src/vm/excep.cpp:3070
dotnet/coreclr#4  0xb62e37e4 in RaiseTheExceptionInternalOnly (throwable=..., rethrow=0, 
    fForStackOverflow=0) at /usr/src/debug/coreclr-0.0.1/src/vm/excep.cpp:3114
dotnet/coreclr#5  0xb62e4e9c in UnwindAndContinueRethrowHelperAfterCatch (
    pEntryFrame=0xbeffe0d8, pException=0x1405b8)
    at /usr/src/debug/coreclr-0.0.1/src/vm/excep.cpp:8747
dotnet/coreclr#6  0xb66e844e in PreStubWorker (pTransitionBlock=0xbeffe344, pMD=0xaf1d9004)
    at /usr/src/debug/coreclr-0.0.1/src/vm/prestub.cpp:1043
dotnet/coreclr#7  0xb650b88a in ThePreStub () at asmhelpers.S:805
dotnet/coreclr#8  0xaf269b9e in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
janvorli commented 8 years ago

As for the actual exception, the assembly that it fails to load should not be attempted to get loaded on Unix, it is windows only. So it seems that some of the platforms assemblies that you are using are Windows instead of Unix versions.

Can you please dump m_pszDebugMethodName, m_pszDebugClassName and m_pszDebugMethodSignature at frame 6 of the first breakpoint above? That should tell us where it came from.

Regardless, we still need to fix the problem with the stack walking, since it is orthogonal. But using the correct assemblies would get us closer to successfull running of the csc.

myungjoo commented 8 years ago

So it seems that the frame has recorded a wrong SP. Can you please check what type of frame is the pHighestFrameWithRegisters here?

When UpdateObjectRefInResumeContextCallback goes insane with uResumeSP < GetSP(pSrcContext), Its caller, FindNonvolatileRegisterPointers()'s pHighestFrameWithRegiser appears non-managed:

FindNonvolatileRegisterPointers:322 | pHighestFrameWithRegisters = bedcf128, Attr(0) 00000000:Managed?0

probed by:

ip = (unsigned int) pFrame->GetIP(); 
printf("%s:%d | pHighestFrameWithRegisters = %8.8x, Attr(%x) %8.8x:Managed?%d\n", __func__, __LINE__, pFrame, pFrame->GetFrameAttribs(), ip, ExecutionManager::IsManagedCode(ip));
janvorli commented 8 years ago

That's expected. There would be an explicit frame (an object of type derived from Frame) that I think can be the thing that screws the SP value. Actually, I have just realized you didn't tell me what is the type of that frame (I've asked you few days ago)

Can you please check what type of frame is the pHighestFrameWithRegisters here?

Also, it would be helpful if you dumped the m_pszDebugMethodName, m_pszDebugClassName and m_pszDebugMethodSignature as I've asked. These should help me to get better understanding on why that issue happens.

myungjoo commented 8 years ago

Although that point no longer calls __cxa_throw with today's commit, at that point, the values are:

(gdb) p m_pszDebugMethodName
$1 = (LPCUTF8) 0xaf1bd484 "ReadFileModern"
(gdb) p m_pszDebugClassName
$2 = (LPCUTF8) 0xaf1d0fd4 "NativeMethods"
(gdb) p m_pszDebugMethodSignature
$3 = (LPCUTF8) 0xaf1bd4c4 "bool *(class [System.Runtime.InteropServices]System.Runtime.InteropServices.SafeHandle,uint8*,int32,int32&,native int)"
(gdb) bt
#0  RealCOMPlusThrow (reKind=3056921089, resID=3204438496,
    wszArg1=0xbeffdc4c u"\x204c\xaf26\xdc60\xbeff\xac6f\xb633\x11fc\xaf1d\x11fc\xaf1d\xdc80\xbeff\x11c8\xaf1d\x11c8\xaf1d\x11c8\xaf1d\x11c8\xaf1d\x11c8\xaf1d\x11c8\xaf1d\x11c8\xaf1d\xdca8\xbeff\xfa3b\
xb659\001",
    wszArg2=0xbeffdc4c u"\x204c\xaf26\xdc60\xbeff\xac6f\xb633\x11fc\xaf1d\x11fc\xaf1d\xdc80\xbeff\x11c8\xaf1d\x11c8\xaf1d\x11c8\xaf1d\x11c8\xaf1d\x11c8\xaf1d\x11c8\xaf1d\x11c8\xaf1d\xdca8\xbeff\xfa3b\
xb659\001",
    wszArg3=0xb659fdb7 <NDirect::NDirectLink(NDirectMethodDesc*)+306> u"\xde7\xc90\x2e91\x6a8\x9df5\x39ff\x12e0\x9c98\x33f5\x6f8\xff90\x68e7\x3246\x4121\x1560\651\x4f60\xe9f6P\x4f21\xfff0\x633\x189a\x
58f1\x5f3\xff90\x5e7\230\x328\xffd1\x3fe7\x1520\xff90\x2ee7\xf9a8\x78f4\x4f8\xff90\x68e7F\xc121\x8160\x4160\540\x1160\x4120\xcf2\x1571\x4ab\x4c9a\xeff5\xfffb\x8e7\xd48\x7949\x4044X\xb968\x8899\x642\xf
fd1\xde7\x3af5\x807d\xdbd\xd298\x94f4\xd2ed\x14f4\354\x5cbf*\x6e00\x78bf\x1500\xe632\x10ff\x78bf\xfd00\xe637\xdcff\x78bd\x8000\x6fb5\x8646\660\x546\x690\442\x1191\x9746\x1cf5\x5fc\x9698\x3ff5\x4fd\220
\x3628\xffd0\x5e7\x9998\x3cf5\x3fc\220\x2e28\xffd0\x5e7\x9698\x75f5\x2fd\xff90\x2e7\230\xb28\xffd1\x19e7\x1948\x7949\x844\x1944\xa4a\x4144\x4ef2\x1f71\x33f5\xfffc\xffe7\x2e7\x398\x2d99\xadf6\374\x1128
\xffd1\xffe7\xfe7\x1148\x7949\x844\x1144\xa4a\x4144\x51f2\x1f71\x1ff5\xfffc\xbe7\xc820\x7f2\x500\xeff5\xfffe\x2e7\340\x520\xff90\xffe7\347 \x328\xffd1\x5e7\x698\x80b0\xfebd\336\x15bf\xe632\x5eff\x78bd
\x5300\xe637\x36ff\x78bd\x6600\xe637\xb0ff\x2b5\xb4af\x13b0\x8c46\x8646\xb246\xb34c\x7d4d\x6444\x2459\x3368\x1d94\x1c90\x1b91\222\x1a20\x3290\x1d90\x1c98\x1799\xcd93\x58f8\xcdc0"...,
    wszArg4=0xbeffdca8 u"\xdd50\xbeff\xfc29\xb659\xdcc0\xbeff\xdadd\xb62b\x1038\xaf1d\x1038\xaf1d\xdce0\xbeff\x3f61\xb633\x1038\xaf1d\x1004\xaf1d\xdd1c\xbeff\x1038\xaf1d\x1038\xaf1d",
    wszArg5=0xbeffda0c u"~\x8007\n", wszArg6=0xbeffda78 u"@") at /usr/src/debug/coreclr-0.0.1/src/vm/excep.cpp:13786
dotnet/coreclr#1  0xb65a304e in LoadLibErrorTracker::Throw (this=0xbeffda0c, libraryNameOrPath=...) at /usr/src/debug/coreclr-0.0.1/src/vm/dllimport.cpp:6546
dotnet/coreclr#2  0xb659fdb6 in NDirect::NDirectLink (pMD=0xaf1d1004) at /usr/src/debug/coreclr-0.0.1/src/vm/dllimport.cpp:7337
dotnet/coreclr#3  0xb659fc28 in NDirect::GetStubForILStub (pNMD=0xaf1d1004, ppStubMD=0xbeffde20, dwStubFlags=0) at /usr/src/debug/coreclr-0.0.1/src/vm/dllimport.cpp:5879
dotnet/coreclr#4  0xb65a0002 in GetStubForInteropMethod (pMD=0xaf1d1004, dwStubFlags=0, ppStubMD=0x0) at /usr/src/debug/coreclr-0.0.1/src/vm/dllimport.cpp:6020
dotnet/coreclr#5  0xb66e5d7e in MethodDesc::DoPrestub (this=0xaf1d1004, pDispatchingMT=0x0) at /usr/src/debug/coreclr-0.0.1/src/vm/prestub.cpp:1495
dotnet/coreclr#6  0xb66e53d8 in PreStubWorker (pTransitionBlock=0xbeffe344, pMD=0xaf1d1004) at /usr/src/debug/coreclr-0.0.1/src/vm/prestub.cpp:1041
dotnet/coreclr#7  0xb650893a in ThePreStub () at asmhelpers.S:802
dotnet/coreclr#8  0xaf261b9e in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

At that point, RealCOMPlusThrow (#0) has

(gdb) p wszArg1
$1 = (LPCWSTR) 0xbeffda98 u"api-ms-win-core-file-l1-2-0.dll"
(gdb) p wszArg2
$2 = (LPCWSTR) 0x148458 u"The specified module could not be found.\n (Exception from HRESULT: 0x8007007E)"
(gdb) p wszArg3
$3 = (LPCWSTR) 0x0
(gdb) p wszArg4
$4 = (LPCWSTR) 0x0
(gdb) p wszArg5
$5 = (LPCWSTR) 0x0
(gdb) p wszArg6
$6 = (LPCWSTR) 0x0
(gdb) p reKind
$7 = kDllNotFoundException
(gdb) p resID
$8 = 5899
(gdb) 
myungjoo commented 8 years ago

Assumption to test later:

It appears that exception handling code of CoreFX ( https://github.com/dotnet/corefx/blob/master/src/System.Reflection.Metadata/src/System/Reflection/Internal/Utilities/FileStreamReadLightUp.cs ) is not properly executed with CoreCLR-ARM:

                try
                {
                    result = NativeMethods.ReadFileModern(handle, buffer, size, out bytesRead, IntPtr.Zero);
                }
                catch
                {
                    readFileModernNotAvailable = true;
                }
janvorli commented 8 years ago

Ah, so the throw is expected and the problem is really just in the stack walking. So, if you could get me the real type of the type of the pHighestFrameWithRegisters (as I've mentioned), it will be a type derived from the Frame type), it would help me to figure out more.

myungjoo commented 8 years ago

typeid() of typeinfo says it's 18PrestubMethodFrame.

DoPrestub:1494| Method:[ReadFileModern], Obj:[NativeMethods], Sig:[bool *(class [System.Runtime.InteropServices]System.Runtime.InteropServices.SafeHandle,uint8*,int32,int32&,native int)]
FindNonvolatileRegisterPointers:322 | pHighestFrameWithRegisters = beb82128, Attr(0) 00000000:Managed?0 typeid:"18PrestubMethodFrame"
UpdateObjectRefInResumeContextCallback;282 ret 2
UpdateObjectRefInResumeContextCallback:228 Res. uRSP beb823b8 / GetSP beb823c8
UpdateObjectRefInResumeContextCallback:236 uICFSFP 00000000 / uRFFP beb823e8

Anyway, other two successful attempts with FindNonvolatileRegisterPointers are 16InlinedCallFrame :

FindNonvolatileRegisterPointers:322 | pHighestFrameWithRegisters = beb82328, Attr(0) 00000000:Managed?0 typeid:"16InlinedCallFrame"
...
FindNonvolatileRegisterPointers:322 | pHighestFrameWithRegisters = beb82328, Attr(0) 00000000:Managed?0 typeid:"16InlinedCallFrame"
myungjoo commented 8 years ago

I have added another print-out:

#ifndef DACCESS_COMPILE
PrestubMethodFrame::PrestubMethodFrame(TransitionBlock * pTransitionBlock, MethodDesc * pMD)
    : FramedMethodFrame(pTransitionBlock, pMD)
{   
    LIMITED_METHOD_CONTRACT;

    printf("%s:%d DBGINFO: [%s][%s][%s]\n", __func__, __LINE__, pMD->m_pszDebugMethodName, pMD->m_pszDebugClassName, pMD->m_pszDebugMethodSignature);
}                                                                                                                    
#endif // #ifndef DACCESS_COMPILE  

Then, the result is more explaining itself:

PrestubMethodFrame:571 DBGINFO: [ReadFileModern][NativeMethods][bool *(class [System.Runtime.InteropServices]System.Runtime.InteropServices.SafeHandle,uint8*,int32,int32&,native int)]
DoPrestub:1494| Method:[ReadFileModern], Obj:[NativeMethods], Sig:[bool *(class [System.Runtime.InteropServices]System.Runtime.InteropServices.SafeHandle,uint8*,int32,int32&,native int)]
PrestubMethodFrame:571 DBGINFO: [.ctor][System.DllNotFoundException][instance void *()]
FindNonvolatileRegisterPointers:322 | pHighestFrameWithRegisters = beaf7128, Attr(0) 00000000:Managed?0 typeid:"18PrestubMethodFrame"
UpdateObjectRefInResumeContextCallback;282 ret 2
UpdateObjectRefInResumeContextCallback:228 Res. uRSP beaf73b8 / GetSP beaf73c8
UpdateObjectRefInResumeContextCallback:236 uICFSFP 00000000 / uRFFP beaf73e8
UpdateObjectRefInResumeContextCallback:250 ret 
janvorli commented 8 years ago

The PrestubMethodFrame inherits from FramedMethodFrame, which in turn inherits from TransitionFrame. And looking at TransitionFrame::GetSP, I can see this:

    TADDR GetSP()
    {
        LIMITED_METHOD_DAC_CONTRACT;
        return GetTransitionBlock() + sizeof(TransitionBlock);
    }

So my guess is that this is what gives us the incorrect SP. Probably the C++ TransitionBlock size doesn't match the amount of data created on the stack by the ASM macro or the m_pTransitionBlock member of the frame is incorrect (which would mean the PreStubWorker got a wrong transition block address).

myungjoo commented 8 years ago

@janvorli Thanks! That really looks fishy to me as well.

Anyway, if I can't resolve this within today, I'll be able to continue working on CoreCLR almost two weeks later as I will be attending a whole different task (away from my ARM, build machines) for two weeks.

myungjoo commented 8 years ago

It was because ThePreStub did not say directives that r0-r3 (which takes 0x10 of the stack) are saved while SP has moved for them with push{r0-r3}.

With .save {r0-r3} added to push{r0-r3}, ./corerun csc.exe (no input file) now successfully runs in Linux/ARM.

With a .cs. file input, we get a whole another error. :)