dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.27k stars 4.73k forks source link

ARM, SOS: fail with breakpoint to managed code with LLDB-3.8 #5750

Closed seanshpark closed 4 years ago

seanshpark commented 8 years ago

With LLDB-3.8, breakpoint at managed code, SOS doesn't seem to work properly.

Test code: helloworld3.cs

using System;

namespace HelloWorld3
{
    public class Program
    {
        public void show_message()
        {
            string a = "Hello World";
            string b = "CoreCLR!!!";
            Console.WriteLine(a + " " + b);
        }

        public static void Main(string[] args)
        {
            Program myprogram = new Program();
            myprogram.show_message();
        }
    }
}

Build in coreclr/Tools folder

./corerun csc.exe helloworld3.cs /out:helloworld3.exe /r:mscorlib.dll

ARM target device: Raspberry Pi 2 on Ubuntu Mate, 15.10

Debug command:

lldb -o "target create corerun" \
-o "settings set target.run-args ./helloworld3.exe" \
-o "plugin load libsosplugin.so" \
-o "process launch -s" \
-o "process handle -s false SIGUSR1 SIGUSR2" \
-o "breakpoint set -n LoadLibraryExW" \
-o "process continue"

then, in (lldb) prompt run below commands one by one

breakpoint del
bpmd helloworld3.exe Program.show_message
process continue

Problem: helloworld3.exe doesn't stop or print the message, seems to loop somewhere.

If something wrong with above steps, please let me know.

seanshpark commented 8 years ago

CC @lucenticus , @manu-silicon, @mikem8361

mikem8361 commented 8 years ago

You need to include the namespace: bpmd helloworld3.exe HelloWorld3.Program.show_message

Dmitri-Botcharnikov commented 8 years ago

@mikem8361 : it works without namespace name on Ubuntu x64.

seanshpark commented 8 years ago

same result with adding namespace, bpmd helloworld3.exe HelloWorld3.Program.show_message It shows like

(lldb) process continue
Process 8311 resuming
(lldb) JITTED helloworld3!HelloWorld3.Program.show_message()
Setting breakpoint: breakpoint set --address 0x719F353C [HelloWorld3.Program.show_message()]

and nothing else.

mikem8361 commented 8 years ago

The JITTED message means the actual native/lldb breakpoints should have been set. Could you run “breakpoint list” and “bpmd –list” at that point below?

seanshpark commented 8 years ago

This is the full capture what I've done.

$ lldb -o "target create corerun" -o "settings set target.run-args ./helloworld3.exe" \
-o "plugin load libsosplugin.so" -o "process launch -s" \
-o "process handle -s false SIGUSR1 SIGUSR2" \
-o "breakpoint set -n LoadLibraryExW" -o "process continue"
(lldb) target create corerun
Current executable set to 'corerun' (arm).
(lldb) settings set target.run-args ./helloworld3.exe
(lldb) plugin load libsosplugin.so
(lldb) process launch -s
Process 16860 launched: '/home/maxwell/xp_seanrpi2/dotnet.38/corerun' (arm)
(lldb) process handle -s false SIGUSR1 SIGUSR2
NAME         PASS   STOP   NOTIFY
===========  =====  =====  ======
SIGUSR1      true   false  true 
SIGUSR2      true   false  true 
(lldb) breakpoint set -n LoadLibraryExW
Breakpoint 1: no locations (pending).
WARNING:  Unable to resolve breakpoint to any actual locations.
(lldb) process continue
1 location added to breakpoint 1
Process 16860 resuming
Process 16860 stopped
* thread dotnet/coreclr#1: tid = 16860, 0x7699e670 libcoreclr.so`::LoadLibraryExW(lpLibFileName=0x0008e320, hFile=0x00000000, dwFlags=0) + 28 at module.cpp:214, name = 'corerun', stop reason = breakpoint 1.1
    frame #0: 0x7699e670 libcoreclr.so`::LoadLibraryExW(lpLibFileName=0x0008e320, hFile=0x00000000, dwFlags=0) + 28 at module.cpp:214
   211      IN /*Reserved*/ HANDLE hFile,
   212      IN DWORD dwFlags)
   213  {
-> 214      if (dwFlags != 0) 
   215      {
   216          // UNIXTODO: Implement this!
   217          ASSERT("Needs Implementation!!!");

(lldb) breakpoint del
About to delete all breakpoints, do you want to do that?: [Y/n] 
All breakpoints removed. (1 breakpoint)
(lldb) bpmd helloworld3.exe Program.Main
Adding pending breakpoints...
(lldb) bpmd -list
!bpmd pending breakpoint list
 Breakpoint index - Location, ModuleID, Method Token
1 - helloworld3.exe!Program.Main+0, 0xFDFDFDFD, 0x00000000
(lldb) process continue
Process 16860 resuming
(lldb) JITTED helloworld3!HelloWorld3.Program.Main(System.String[])
Setting breakpoint: breakpoint set --address 0x719F2466 [HelloWorld3.Program.Main(System.String[])]
process interrupt
(lldb) Process 16860 stopped
* thread dotnet/coreclr#1: tid = 16860, 0x719f2452, name = 'corerun', stop reason = signal SIGSTOP
    frame #0: 0x719f2452
->  0x719f2452: orrpl  pc, r2, #1879048204
    0x719f2456: blhs   0x71a0c4ca
    0x719f245a: vhadd.s8 d29, d6, d4
    0x719f245e: .long  0xf2c733ad                ; unknown opcode

breakpoint list
Current breakpoints:
2: Exception breakpoint (catch: off throw: on) using: names = {'__cxa_throw', '__cxa_rethrow'}, locations = 2, resolved = 2, hit count = 2

  Names:
    DoNotDeleteOrDisable

  2.1: where = libstdc++.so.6`__cxa_throw, address = 0x76ed3538, resolved, hit count = 2 
  2.2: where = libstdc++.so.6`__cxa_rethrow, address = 0x76ed35b8, resolved, hit count = 0 

3: address = 0x719f2466, locations = 1, resolved = 1, hit count = 0
  3.1: address = 0x719f2466, resolved, hit count = 0 

(lldb) bpmd -list
!bpmd pending breakpoint list
 Breakpoint index - Location, ModuleID, Method Token
1 - helloworld3.exe!Program.Main+0, 0xFDFDFDFD, 0x7582aa70
2 - helloworld3.exe!Program.Main+0, 0xFDFDFDFD, 0x00000000
(lldb) 
mikem8361 commented 8 years ago

Everything looks ok (of course except actually hitting the bp). The “JITTED” message means sos received the notification the method was jitted and the native lldb breakpoints were created. The question is if the breakpoint address 0x719F2466 is correct managed method address. “name2ee helloworld3.exe!HelloWorld3.Program.Main” should print the method desc/name/etc. info. “ip2md 0x719F2466” should print the proper method desc. I don’t have access to an arm linux machine to do any work on this.

seanshpark commented 8 years ago
(lldb) JITTED helloworld3!HelloWorld3.Program.show_message()
Setting breakpoint: breakpoint set --address 0x719FB53C [HelloWorld3.Program.show_message()]

two commands result:

(lldb) name2ee helloworld3.exe!HelloWorld3.Program.show_message
Module:      7582ea58
Assembly:    helloworld3.exe
Token:       06000001
MethodDesc:  7582f178
Name:        HelloWorld3.Program.show_message()
JITTED Code Address: 719fb510
(lldb) ip2md 0x719FB53C
MethodDesc:   7582f178
Method Name:  HelloWorld3.Program.show_message()
Class:        7582c5e4
MethodTable:  7582f1e4
mdToken:      06000001
Module:       7582ea58
IsJitted:     yes
CodeAddr:     719fb510
Transparency: Critical

if breakpoint is of native code level, should it be set to CodeAddr, 719fb510 ?

mikem8361 commented 8 years ago

Yes, it should. Not sure why it is using 0x719FB53C (44 bytes off). Can you debug sos (with lldb or gdb)? Set a bp on strike.cpp line 6102.

Dmitri-Botcharnikov commented 8 years ago

I seems that bpmd command on ARM/Linux damages internals of VM. We've got SIGSEGV, SIGILL on some tests after this command has issued. Also breakpoint on IssueDebugerBPCommand doesn't hit in lldb.

seanshpark commented 8 years ago

Can you debug sos (with lldb or gdb)? Set a bp on strike.cpp line 6102.

Breakpoint 1, IssueDebuggerBPCommand (addr=1906292028) at /home/maxwell/coreclr/src/ToolBox/SOS/Strike/strike.cpp:6102
6102        const int MaxBPsCached = 1024;
(gdb) bt
#0  IssueDebuggerBPCommand (addr=1906292028) at /home/maxwell//coreclr/src/ToolBox/SOS/Strike/strike.cpp:6102
dotnet/coreclr#1  0x634dbaa8 in Breakpoints::ResolveMethodInstances (pMeth=0x71d07f08, ilOffset=0) at /home/maxwell/coreclr/src/ToolBox/SOS/Strike/strike.cpp:6495
dotnet/coreclr#2  0x634db902 in Breakpoints::ResolvePendingBreakpoint (this=0x6351cddc <g_bpoints>, addr=1971509848, pCur=0x71d0c4b0)
    at /home/maxwell/coreclr/src/ToolBox/SOS/Strike/strike.cpp:6654
dotnet/coreclr#3  0x634d7cf4 in Breakpoints::Update (this=0x6351cddc <g_bpoints>, mod=1971509848, isNewModule=0) at /home/maxwell/coreclr/src/ToolBox/SOS/Strike/strike.cpp:6258
dotnet/coreclr#4  0x634daccc in CNotification::OnCodeGenerated (this=0x726c97ec, method=0x71d06578) at /home/maxwell/coreclr/src/ToolBox/SOS/Strike/strike.cpp:6742
dotnet/coreclr#5  0x615289f8 in ClrDataAccess::TranslateExceptionRecordToNotification (this=0x6646c0, record=0x726c97f8, notify=0x726c97ec)
    at /home/maxwell/coreclr/src/debug/daccess/daccess.cpp:4603

and the address addr

(gdb) print/x addr
$2 = 0x719fb53c

is same value. While stepping line by line, it seemed normal, but not sure as I don't understand the logic yet.

mikem8361 commented 8 years ago

I can’t debug it on an ARM device because I don’t have one.

seanshpark commented 8 years ago

@mikem8361 , it's ok. I'm reading related document, codes and trying to understand what is going on. If you can guide me where to look at, it will be a big help, thanks in advance.

Dmitri-Botcharnikov commented 8 years ago

44 bytes offset is normal. If we set breakpoint in method ClrDataAccess::GetMethodNativeMap (file daccess.cpp:6101) we've got offset mapping from native to IL code: {{nativeOffset = 0, ilOffset = 4294967294, source = ICorDebugInfo::STACK_EMPTY}, {nativeOffset = 20, ilOffset = 4294967295, source = ICorDebugInfo::STACK_EMPTY}, {nativeOffset = 44, ilOffset = 0, source = ICorDebugInfo::STACK_EMPTY}, {nativeOffset = 46, ilOffset = 1, source = ICorDebugInfo::STACK_EMPTY}, {nativeOffset = 58, ilOffset = 7, source = ICorDebugInfo::STACK_EMPTY}, {nativeOffset = 70, ilOffset = 13, source = ICorDebugInfo::STACK_EMPTY}, {nativeOffset = 92, ilOffset = 20, source = ICorDebugInfo::CALL_INSTRUCTION}, {nativeOffset = 96, ilOffset = 25, source = ICorDebugInfo::SOURCE_TYPE_INVALID}, {nativeOffset = 106, ilOffset = 25, source = ICorDebugInfo::CALL_INSTRUCTION}, {nativeOffset = 108, ilOffset = 30, source = ICorDebugInfo::STACK_EMPTY}, {nativeOffset = 110, ilOffset = 31, source = ICorDebugInfo::STACK_EMPTY}, {nativeOffset = 112, ilOffset = 4294967293, source = ICorDebugInfo::STACK_EMPTY}} We can see, that native offsets up to 44 corresponds to some JIT housekeeping code. The problem is in jitted code itself. Here is disassembler of first instructions from jitted show_message:

0xb0082510: push.w {r4, r10, r11, lr}
0xb0082514: sub    sp, #0x10
0xb0082516: add.w  r11, sp, #0x18
0xb008251a: movs   r1, #0x0
0xb008251c: str    r1, [sp, #0x8]
0xb008251e: str    r1, [sp, #0x4]
0xb0082520: str    r1, [sp] 
0xb0082522: str    r0, [sp, #0xc]
0xb0082524: movw   r3, #0x5f54
0xb0082528: movt   r3, #0xb589
0xb008252c: ldr    r3, [r3]
0xb008252e: cmp    r3, #0x0
0xb0082530: beq    0xb008253c
0xb0082532: movw   r3, #0x7121
0xb0082536: movt   r3, #0xb63e
0xb008253a: blx    r3
0xb008253c: nop    
0xb008253e: movw   r3, #0x2358
0xb0082542: movt   r3, #0xb350

When I'm stepping through this code, after instruction at 0xb0082530 (beq 0xb008253c) thread state becomes invalid and control never reaches 0xb008253c (where first IL instruction starts in my case). Could it be exception? PS map dump is obtained by gdb command p *mapCopy.m_value@12 after executing DebugInfoManager::GetBoundariesAndVars PPS give lldb's disassemble command -A thumbv7 option to correct output

seanshpark commented 8 years ago

When I'm stepping through this code, after instruction at 0xb0082530 (beq 0xb008253c) thread state becomes invalid and control never reaches 0xb008253c (where first IL instruction starts in my case).

Same here. This is my log

(lldb) s
(lldb) Process 9688 stopped
* thread dotnet/coreclr#1: tid = 9688, 0x719fb530, name = 'corerun', stop reason = instruction step into
    frame #0: 0x719fb530
->  0x719fb530: .long  0xf649d004                ; unknown opcode
    0x719fb534: .long  0xf2c733e5                ; unknown opcode
    0x719fb538: .long  0x47986337                ; unknown opcode
    0x719fb53c: vmax.f32 d27, d2, d0
(lldb) dis -A thumbv7 -b
->  0x719fb530: 0xd004       beq    0x719fb53c
    0x719fb532: 0xf64933e5   movw   r3, #0x9be5
    0x719fb536: 0xf2c76337   movt   r3, #0x7637
    0x719fb53a: 0x4798       blx    r3
    0x719fb53c: 0xbf00       nop    
    0x719fb53e: 0xf2423358   movw   r3, #0x2358
    0x719fb542: 0xf2c73340   movt   r3, #0x7340
    0x719fb546: 0x681b       ldr    r3, [r3]
    0x719fb548: 0x9302       str    r3, [sp, #0x8]
    0x719fb54a: 0xf242335c   movw   r3, #0x235c
    0x719fb54e: 0xf2c70000   movt   r0, #0x7000
(lldb) s
(lldb) dis -A thumbv7 -b
error: Cannot disassemble around the current function without a selected frame.

BTW, is it possible to set default arch to thumbv7 in LLDB ?

mikem8361 commented 8 years ago

Sorry, I don’t know lldb that well especially around any arm or thumb specific support.

seanshpark commented 8 years ago

Without loading sosplugin, omitting plugin load libsosplugin.so line and then add break point to 0x719fb53c and 0x719fb510, adding two places for sure, instead of giving bpmd helloworld3.exe HelloWorld3.Program.show_message did not stop at any address. I think LLDB does not stop at JIT-ed code. Is this normal (a problem to use) or am I doing something wrong?

edited: I think I am doing something wrong here... edited#2: Tested again with adding breakpoint after JIT code was generated, break at DACNotifyCompilationFinished function, still no break in LLDB at JIT-ed code.

mikem8361 commented 8 years ago

1) There is a chance that the code was jitted at a different address. You could still load libsosplugin;.so and use the "name2ee helloworld3.exe!HelloWorld3.Program.show_message" to get the jitted address without using the bpmd command. 2) If the addresses are actually the jitted code, there might be a bug in LLDB. 3) FYI, it may have problems on arm because it hasn't been tested but the "clru " or "clru <code address" should disassemble the whole jitted method.

Dmitri-Botcharnikov commented 8 years ago

The bpmd command sets exception handler through LLDB API SBTarget::BreakpointCreateForException. This creates BP for symbols '__cxa_throw' and '__cxa_rethrow' in libstdc++.so. If I disable this BP the user program finishes successfully. Of course no notification for jitted code address:

(lldb) bpmd hello3.1.exe Program.show_message
Adding pending breakpoints...
(lldb) breakpoint list
Current breakpoints:
1: name = 'EEStartup', locations = 1, resolved = 1, hit count = 1
  1.1: where = libcoreclr.so`EEStartup(tagCOINITEE) at ceemain.cpp:1427, address = 0xb637474c, resolved, hit count = 1 

2: Exception breakpoint (catch: off throw: on) using: names = {'__cxa_throw', '__cxa_rethrow'}, locations = 2, resolved = 2, hit count = 0

  Names:
    DoNotDeleteOrDisable

  2.1: where = libstdc++.so.6`__cxa_throw, address = 0xb6ee7540, resolved, hit count = 0 
  2.2: where = libstdc++.so.6`__cxa_rethrow, address = 0xb6ee75c0, resolved, hit count = 0 

(lldb) breakpoint disable 2
1 breakpoints disabled.
(lldb) process continue
Process 23435 resuming
(lldb) Hello World CoreCLR!!!
Process 23435 exited with status = 0 (0x00000000) 

Setting lldb breakpoint on address from previous session doesn't work, but perhaps it's different. Also clru on jitted address crashes lldb.

mikem8361 commented 8 years ago

Sounds like lldb on ARM has some problems around breakpoints. Not sure what do about it.

seanshpark commented 8 years ago

When I'm stepping through this code, after instruction at 0xb0082530 (beq 0xb008253c) thread state becomes invalid and control never reaches 0xb008253c (where first IL instruction starts in my case). Could it be exception?

I'm still tracing, not knowing much about exception, but I think you are right. Reason of this may be related with wrong ARM register stacking somewhere like the ones fixed in recent PR aboutunwinding. cc @parjong , @myungjoo

chunseoklee commented 8 years ago

Please note that if you rerun, it does not work even on Ubuntu/x64 with llvm3.8 lldb3.8. Here is the log

twoflower@js1-desktop ~/d/c/b/P/Linux.x64.Debug>
~/dev/llvm_install/bin/lldb-3.8.1 -o "plugin load libsosplugin.so" -o "process launch -s" -o "setclrpath ./" -o "breakpoint set -n LoadLibraryExW" -o "c" -o "br del" -- corerun hello.exe
(lldb) target create "corerun"
Current executable set to 'corerun' (x86_64).
(lldb) settings set -- target.run-args  "hello.exe"
(lldb) plugin load libsosplugin.so
(lldb) process launch -s
Process 32476 launched: '/home/twoflower/dev/coreclr_upstream/bin/Product/Linux.x64.Debug/corerun' (x86_64)
(lldb) setclrpath ./
Set load path for sos/dac/dbi to './'
(lldb) breakpoint set -n LoadLibraryExW
Breakpoint 1: no locations (pending).
WARNING:  Unable to resolve breakpoint to any actual locations.
(lldb) c
1 location added to breakpoint 1
Process 32476 resuming
Process 32476 stopped
* thread dotnet/coreclr#1: tid = 32476, 0x00007ffff6650ed3 libcoreclr.so`::LoadLibraryExW(lpLibFileName=0x000000000069a0d8, hFile=0x0000000000000000, dwFlags=0) + 35 at module.cpp:214, name = 'corerun', stop reason = breakpoint 1.1
    frame #0: 0x00007ffff6650ed3 libcoreclr.so`::LoadLibraryExW(lpLibFileName=0x000000000069a0d8, hFile=0x0000000000000000, dwFlags=0) + 35 at module.cpp:214
   211      IN /*Reserved*/ HANDLE hFile,
   212      IN DWORD dwFlags)
   213  {
-> 214      if (dwFlags != 0)
   215      {
   216          // UNIXTODO: Implement this!
   217          ASSERT("Needs Implementation!!!");

(lldb) br del
All breakpoints removed. (1 breakpoint)
(lldb) bpmd hello.exe Hello.Main
Adding pending breakpoints...
(lldb) c
Process 32476 resuming
(lldb) JITTED hello!Hello.Main()
Setting breakpoint: breakpoint set --address 0x00007FFF7CCE8498 [Hello.Main()]
Process 32476 stopped
* thread dotnet/coreclr#1: tid = 32476, 0x00007fff7cce8498, name = 'corerun', stop reason = breakpoint 3.1
    frame #0: 0x00007fff7cce8498
->  0x7fff7cce8498: nop
    0x7fff7cce8499: movabsq $0x7fff6c001068, %rdi     ; imm = 0x7FFF6C001068
    0x7fff7cce84a3: movq   (%rdi), %rdi
    0x7fff7cce84a6: callq  0x7fff7c49d1d0

Process 32476 resuming
(lldb) Hello, World!
Process 32476 exited with status = 0 (0x00000000)
br list
Current breakpoints:
2: Exception breakpoint (catch: off throw: on) using: names = {'__cxa_throw', '__cxa_rethrow'}, locations = 2, resolved = 2, hit count = 2

  Names:
    DoNotDeleteOrDisable

  2.1: where = libstdc++.so.6`__cxa_throw, address = 0x00007ffff77128b0, resolved, hit count = 2
  2.2: where = libstdc++.so.6`__cxa_rethrow, address = 0x00007ffff7712930, resolved, hit count = 0

3: address = 0x00007fff7cce8498, locations = 1, resolved = 1, hit count = 1
  3.1: address = 0x00007fff7cce8498, resolved, hit count = 1

(lldb) r
Process 32629 launched: '/home/twoflower/dev/coreclr_upstream/bin/Product/Linux.x64.Debug/corerun' (x86_64)
Hello, World!
(lldb) Process 32629 exited with status = 0 (0x00000000)
seanshpark commented 8 years ago

While looping around the JIT-ted code, 0x719fb53c for my case,

 0x719fb53c: 0xbf00       nop

if I stop the process with process interrupt and remove this breakpoint with break del (number) and then continue, the program ends normal. Seems like when reaches the break point, breakpoint handler(?) is responding weird. I'm looking inside lldb but not much luck.

Dmitri-Botcharnikov commented 8 years ago

The problem was in breakpoint opcode. If LLDB can't find from executable image which opcode (ARM or Thumb) to use, it uses ARM opcode. As far as all jitted code is thumb, wrong opcode causes such behavior. LLDB however checks bit 0 and if it is set, LLDB uses Thumb breakpoint opcode. The proposed PR dotnet/coreclr#5014 sets Thumb code bit for all breakpoint addresses in jitted code for Linux/ARM.

seanshpark commented 8 years ago

Fixed.