dotnet / runtime


LLDB Hang? #37405

Open xiangzhai opened 4 years ago

xiangzhai commented 4 years ago

Hi,

LLDB-8.0 works for AMD64:

lldb -- ./bin/Product/Linux.x64.Debug/corerun /home/loongson/HelloWorld.dll

(lldb) target create "./bin/Product/Linux.x64.Debug/corerun"
Current executable set to './bin/Product/Linux.x64.Debug/corerun' (x86_64).
(lldb) settings set -- target.run-args  "/home/loongson/HelloWorld.dll"
(lldb) r
Process 52861 launched: '/home/loongson/coreclr-mips64-dev/bin/Product/Linux.x64.Debug/corerun' (x86_64)
2 tracked GC refs are at stack offsets -0040 ... FFFFFFD0
1 tracked GC refs are at stack offsets -0040 ... FFFFFFC8
1 tracked GC refs are at stack offsets -0038 ... FFFFFFD0
Hello World!
Process 52861 exited with status = 0 (0x00000000) 
(lldb) 

But LLDB-3.9 hangs for ARM64:

lldb -- ./bin/Product/Linux.arm64.Debug/corerun /home/loongson/HelloWorld.dll

(lldb) target create "./bin/Product/Linux.arm64.Debug/corerun"
Current executable set to './bin/Product/Linux.arm64.Debug/corerun' (aarch64).
(lldb) settings set -- target.run-args  "/home/loongson/HelloWorld.dll"
(lldb) r

Diagnostics has been initially ported to MIPS64, but LLDB-8.0 hangs for MIPS64 as well. GDB works for both ARM64 and MIPS64: https://github.com/dotnet/runtime/issues/606

Please give me some hints.

Thanks, Leslie Zhai

janvorli commented 4 years ago

LLDB is hanging on amd64 on Alpine too, and it works on ARM64 in newer versions just fine. I believe the hang is caused either by wrong build options being specified when building LLDB or by LLDB simply being broken for MIPS64 (based on my experience with problems I was hitting when trying to build LLDB 8 and 10 for arm32). Where did you get your build of LLDB? Was it from a distro package or have you built it yourself?

xiangzhai commented 4 years ago

Hi @janvorli

Thanks for your kind response!

LLDB-3.9 for ARM64 was simply installed with apt-get install lldb-3.9.

And LLDB-3.9 hangs even with a simple HelloWorld:

lldb -- ./a.out 111

(lldb) target create "./a.out"
Current executable set to './a.out' (aarch64).
(lldb) settings set -- target.run-args  "111"
(lldb) r

LLDB-8.0 for MIPS64 was built by @QiaoVanke with rpmbuild from the upstream source.

But LLDB-8.0 does not hang with HelloWorld:

(lldb) target create "./a.out"
Current executable set to './a.out' (mips64r2el).
(lldb) settings set -- target.run-args  "111"
(lldb) r
Process 20375 launched: '/home/loongson/zhaixiang/a.out' (mips64r2el)
Hello World: 111
Process 20375 exited with status = 0 (0x00000000) 

Thanks, Leslie Zhai

janvorli commented 4 years ago

Ah, so I have misunderstood, I have thought that LLDB 8 is hanging for MIPS64 too. As for the arm64, what Linux distro are you using? E.g. for Ubuntu, there are newer versions available (Even for relatively old Ubuntu 16.04).

xiangzhai commented 4 years ago

Ah, so I have misunderstood, I have thought that LLDB 8 is hanging for MIPS64 too.

Sorry for my poor English! LLDB-8.0 on MIPS64 does not hang with a simple HelloWorld, but it hangs with corerun:

(lldb) target create "./bin/Product/Linux.mips64.Debug/corerun"
Current executable set to './bin/Product/Linux.mips64.Debug/corerun' (mips64r2el).
(lldb) settings set -- target.run-args  "/home/loongson/zhaixiang/HelloWorld.dll"
(lldb) r
Process 20960 launched: '/home/loongson/zhaixiang/coreclr-mips64-dev/bin/Product/Linux.mips64.Debug/corerun' (mips64r2el)

janvorli commented 4 years ago

Hmm, this is strange, I've never seen a case when a C hello world would work, but .NET Core would hang under lldb.

When it looks hung - does ctrl+c have any effect?

xiangzhai commented 4 years ago

When it looks hung - does ctrl+c have any effect?

YES, Ctrl+C works:

Process 20960 exited with status = -1 (0xffffffff) lost connection

janvorli commented 4 years ago

Hmm, but it doesn't break into the running code but just terminates the process.

What if you set a breakpoint at "main" and run corerun, does it still hang or does it at least hit that?

xiangzhai commented 4 years ago

What if you set a breakpoint at "main" and run corerun, does it still hang or does it at least hit that?

It at least hits that:

(lldb) target create "./bin/Product/Linux.mips64.Debug/corerun"
Current executable set to './bin/Product/Linux.mips64.Debug/corerun' (mips64r2el).
(lldb) settings set -- target.run-args  "/home/loongson/zhaixiang/Hello.dll"
(lldb) b main
Breakpoint 1: where = corerun`main + 52 at corerun.cpp:161:20, address = 0x0000000120003e94
(lldb) r
Process 28804 launched: '/home/loongson/zhaixiang/coreclr-mips64-dev/bin/Product/Linux.mips64.Debug/corerun' (mips64r2el)
Process 28804 stopped
* thread #1, name = 'corerun', stop reason = breakpoint 1.1
    frame #0: 0x0000000120003e94 corerun`main(argc=2, argv=0x000000ffffffb398) at corerun.cpp:161:20
   158
   159  int main(const int argc, const char* argv[])
   160  {
-> 161      return corerun(argc, argv);
   162  }
(lldb) c
Process 28804 resuming

But it failed to break at CodeGen::genPushCalleeSavedRegisters:

(lldb) target create "./bin/Product/Linux.mips64.Debug/corerun"
Current executable set to './bin/Product/Linux.mips64.Debug/corerun' (mips64r2el).
(lldb) settings set -- target.run-args  "/home/loongson/zhaixiang/Hello.dll"
(lldb) b CodeGen::genPushCalleeSavedRegisters
Breakpoint 1: no locations (pending).
WARNING:  Unable to resolve breakpoint to any actual locations.
(lldb) r
Process 29287 launched: '/home/loongson/zhaixiang/coreclr-mips64-dev/bin/Product/Linux.mips64.Debug/corerun' (mips64r2el)

janvorli commented 4 years ago

Hmm, so the only idea I have is that you could step through the code until you find the place where it hangs. I would try to set a breakpoint at coreclr_initialize, then step over the calls there as the first step.
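
One way to make the hang reproducible outside an interactive session is to drive lldb non-interactively and treat a timeout as a hang. The Python sketch below is only an illustration under stated assumptions: it assumes the installed lldb accepts --batch and -o (one command per -o), the helper names lldb_batch_argv and run_with_timeout are hypothetical, and the timeout is an arbitrary choice. It is POSIX-only because it kills the whole process group on timeout, and the captured output may be incomplete if lldb buffers its output before being killed.

```python
import os
import signal
import subprocess
import tempfile


def lldb_batch_argv(program, program_args, commands, lldb="lldb"):
    """Build an lldb command line that runs `commands` without a prompt.

    Hypothetical helper; assumes the lldb binary supports --batch and -o.
    """
    argv = [lldb, "--batch"]
    for command in commands:
        argv += ["-o", command]
    return argv + ["--", program, *program_args]


def run_with_timeout(argv, timeout_s):
    """Run argv and return (hung, output); hung means the timeout expired."""
    # Output goes to a temp file rather than a pipe, so a debug server or
    # debuggee that keeps a pipe open cannot block the read after the kill.
    with tempfile.TemporaryFile(mode="w+") as log:
        proc = subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            # New session: the child leads its own process group, so killpg
            # also reaches children that stay in that group.
            start_new_session=True,
        )
        try:
            proc.wait(timeout=timeout_s)
            hung = False
        except subprocess.TimeoutExpired:
            os.killpg(proc.pid, signal.SIGKILL)
            proc.wait()
            hung = True
        log.seek(0)
        return hung, log.read()
```

For instance, lldb_batch_argv("./a.out", ["111"], ["run"]) yields ['lldb', '--batch', '-o', 'run', '--', './a.out', '111']. Passing commands such as "breakpoint set --name coreclr_initialize", "run", and a few "next" entries, then handing the result to run_with_timeout, would show whether that sequence finishes within the chosen time.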

xiangzhai commented 4 years ago

As for the arm64, what Linux distro are you using? E.g. for Ubuntu, there are newer versions available (Even for relatively old Ubuntu 16.04).

LLDB-6 works for ARM64:

(lldb) target create "./bin/Product/Linux.arm64.Release/corerun"
Current executable set to './bin/Product/Linux.arm64.Release/corerun' (aarch64).
(lldb) settings set -- target.run-args  "/home/loongson/zhaixiang/Hello.dll"
(lldb) r
Process 13580 launched: './bin/Product/Linux.arm64.Release/corerun' (aarch64)
Hello World!
Process 13580 exited with status = 0 (0x00000000) 
(lldb) 

GDB works for MIPS64 (gdb.log).

LLDB-8 might hang at MethodDesc::JitCompileCode for MIPS64 (lldb.log), so I set a breakpoint at MethodDesc::JitCompileCode:

...
TID 5b01: In PreStubWorker for System.AppContext::Setup
TID 5b01: Prestubworker: method 000000FF7C9BF4C8M
TID 5b01:     In PrepareILBasedCode, calling JitCompileCode
Process 23297 stopped
* thread #1, name = 'corerun', stop reason = breakpoint 1.1
    frame #0: 0x000000fff67940e8 libcoreclr.so`MethodDesc::JitCompileCode(this=0x000000ff7c9bf4c8, pConfig=0x000000ffffff9c20) at prestub.cpp:686:5
   683  {
   684      STANDARD_VM_CONTRACT;
   685  
-> 686      LOG((LF_JIT, LL_INFO1000000,
   687          "JitCompileCode(" FMT_ADDR ", %s) for %s:%s\n",
   688          DBG_ADDR(this),
   689          IsILStub() ? " TRUE" : "FALSE",
(lldb) x/22i 0x000000fff67940e8
->  0xfff67940e8: 01 00 00 10  b      0x8
    0xfff67940ec: 00 00 00 00  nop    
    0xfff67940f0: d0 00 c1 df  ld     $1, 0xd0($fp)
    0xfff67940f4: 30 83 39 dc  ld     $25, -0x7cd0($1)
    0xfff67940f8: 25 e0 20 00  move   $gp, $1
    0xfff67940fc: 09 f8 20 03  jalr   $25
    0xfff6794100: 00 00 00 00  nop    
    0xfff6794104: 3d 00 40 10  beqz   $2, 0xf8
    0xfff6794108: 00 00 00 00  nop    
    0xfff679410c: 01 00 00 10  b      0x8
    0xfff6794110: 00 00 00 00  nop    
    0xfff6794114: b8 00 c1 df  ld     $1, 0xb8($fp)
    0xfff6794118: 3e 38 01 00  dsrl32 $7, $1, 0x0
    0xfff679411c: 03 f8 28 7c  dext   $8, $1, 0x0, 0x20
    0xfff6794120: d0 00 c2 df  ld     $2, 0xd0($fp)
    0xfff6794124: d0 d3 59 dc  ld     $25, -0x2c30($2)
    0xfff6794128: 25 20 20 00  move   $4, $1
    0xfff679412c: 25 e0 40 00  move   $gp, $2
    0xfff6794130: b0 00 c7 ff  sd     $7, 0xb0($fp)
    0xfff6794134: a8 00 c8 ff  sd     $8, 0xa8($fp)
    0xfff6794138: 09 f8 20 03  jalr   $25
    0xfff679413c: 00 00 00 00  nop    
(lldb) si
Process 23297 stopped
* thread #1, name = 'corerun', stop reason = instruction step into
    frame #0: 0x000000fff67940f0 libcoreclr.so`MethodDesc::JitCompileCode(this=0x000000ff7c9bf4c8, pConfig=0x000000ffffff9c20) at prestub.cpp:686:5
   683  {
   684      STANDARD_VM_CONTRACT;
   685  
-> 686      LOG((LF_JIT, LL_INFO1000000,
   687          "JitCompileCode(" FMT_ADDR ", %s) for %s:%s\n",
   688          DBG_ADDR(this),
   689          IsILStub() ? " TRUE" : "FALSE",
(lldb) x/22i 0x000000fff67940e8
    0xfff67940e8: 01 00 00 10  b      0x8
    0xfff67940ec: 00 00 00 00  nop    
->  0xfff67940f0: d0 00 c1 df  ld     $1, 0xd0($fp)
    0xfff67940f4: 30 83 39 dc  ld     $25, -0x7cd0($1)
    0xfff67940f8: 25 e0 20 00  move   $gp, $1
    0xfff67940fc: 09 f8 20 03  jalr   $25
    0xfff6794100: 00 00 00 00  nop    
    0xfff6794104: 3d 00 40 10  beqz   $2, 0xf8
    0xfff6794108: 00 00 00 00  nop    
    0xfff679410c: 01 00 00 10  b      0x8
    0xfff6794110: 00 00 00 00  nop    
    0xfff6794114: b8 00 c1 df  ld     $1, 0xb8($fp)
    0xfff6794118: 3e 38 01 00  dsrl32 $7, $1, 0x0
    0xfff679411c: 03 f8 28 7c  dext   $8, $1, 0x0, 0x20
    0xfff6794120: d0 00 c2 df  ld     $2, 0xd0($fp)
    0xfff6794124: d0 d3 59 dc  ld     $25, -0x2c30($2)
    0xfff6794128: 25 20 20 00  move   $4, $1
    0xfff679412c: 25 e0 40 00  move   $gp, $2
    0xfff6794130: b0 00 c7 ff  sd     $7, 0xb0($fp)
    0xfff6794134: a8 00 c8 ff  sd     $8, 0xa8($fp)
    0xfff6794138: 09 f8 20 03  jalr   $25
    0xfff679413c: 00 00 00 00  nop    
(lldb) si
(lldb) register read r1
      r1 = 0x000000fff79b1a40  

(lldb) s

si works, but s hangs...

Thanks, Leslie Zhai

janvorli commented 4 years ago

Hmm, I would try to use "ni" lldb command to step through the code here instruction by instruction (stepping over the calls) and see which call resulting from the macro expansion / calls to the functions IsILStub, GetMethodTable, GetDebugClassName is causing the hang. Then do it again, but step into the call that was hanging using "si" and then use the "s" until some call hangs again and then basically repeat these steps for the hung call. This way you should be able to get to the exact failure location.

However, I believe the problem is a bug in lldb, so another thing you could do is to build lldb with debugging symbols enabled (pass the cmake -DCMAKE_BUILD_TYPE=Debug option instead of -DCMAKE_BUILD_TYPE=Release), use it to run the app and when it hangs, use GDB to attach to the LLDB process and see if you can identify why it hung.
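
Attaching GDB to the hung debugger could look roughly like the sketch below. It only builds the command line and does not run it. The helper names gdb_backtrace_argv and as_shell are hypothetical, the PID has to be looked up separately (for example with ps), and it is an assumption that the interesting process is lldb itself rather than its lldb-server child. The GDB options used are -p (attach to a process), -batch, and -ex (run one command).

```python
import shlex


def gdb_backtrace_argv(pid, gdb="gdb"):
    """Build a gdb command line that attaches to `pid` and prints the
    backtrace of every thread in batch mode. Hypothetical helper."""
    if not isinstance(pid, int) or pid <= 0:
        raise ValueError("pid must be a positive integer")
    return [gdb, "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def as_shell(argv):
    """Quote argv so the command can be pasted into a shell."""
    return shlex.join(argv)
```

With a PID of 1234, as_shell(gdb_backtrace_argv(1234)) gives gdb -p 1234 -batch -ex 'thread apply all bt'. The backtraces are most useful when LLDB was built with debugging symbols as described above.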

xiangzhai commented 4 years ago

However, I believe the problem is a bug in lldb

Agree +1

But we might switch to the GNU toolchain after migrating to the 5.x (master) branch: https://github.com/dotnet/coreclr/pull/27625 Let's open-source the CoreCLR 3.x MIPS64 port ASAP and fix the GC issue :)

xiangzhai commented 4 years ago

cc @heiher