lldb step over failed in musl dyn exe (Could not create return address breakpoint)

Quuxplusone commented 3 years ago


Bugzilla Link	PR50935
Status	NEW
Importance	P enhancement
Reported by	Ryan Prichard (rprichard@google.com)
Reported on	2021-06-29 16:24:59 -0700
Last modified on	2021-08-12 17:05:34 -0700
Version	12.0
Hardware	PC Linux
CC	jdevlieghere@apple.com, jingham@apple.com, llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments	`apple-step-into-example.txt` (7970 bytes, text/plain) `lldb-step-off-breakpoint.txt` (5303 bytes, text/plain)
Blocks
Blocked by
See also

With a dynamically-linked musl executable (on x86_64 gLinux aka Debian), LLDB
is unable to step over a "printf" call. The problem happens with either PIE or
non-PIE executables, but not with statically-linked musl executables. LLDB is
able to step over this call if I use glibc instead (with either the Debian gcc
or clang compiler).

I tested with both LLDB 12.0.0 and with a current LLDB build from origin/main.

$ cat >test.c <<EOF
#include <stdio.h>
int main(void) {
  printf("hello!\n");
  return 0;
}
EOF

$ musl-gcc test.c -g
$ file a.out
a.out: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically
linked, interpreter /lib/ld-musl-x86_64.so.1, with debug_info, not stripped

$ /x/clang12/bin/lldb a.out
(lldb) target create "a.out"
Current executable set to '/x/mess/a.out' (x86_64).
(lldb) b main
Breakpoint 1: where = a.out`main + 4 at test.c:3:3, address = 0x0000000000001159
(lldb) run
Process 235162 launched: '/x/mess/a.out' (x86_64)
Process 235162 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x0000555555555159 a.out`main at test.c:3:3
   1    #include <stdio.h>
   2    int main(void) {
-> 3      printf("hello!\n");
   4      return 0;
   5    }
(lldb) n
Process 235162 stopped
* thread #1, name = 'a.out', stop reason = step over failed (Could not create
return address breakpoint.)
    frame #0: 0x0000555555555020 a.out`puts
a.out`puts:
->  0x555555555020 <+0>:  jmpq   *0x2ff2(%rip)             ;
_GLOBAL_OFFSET_TABLE_ + 24
    0x555555555026 <+6>:  pushq  $0x0
    0x55555555502b <+11>: jmp    0x555555555010

a.out`__libc_start_main:
    0x555555555030 <+0>:  jmpq   *0x2fea(%rip)             ; _GLOBAL_OFFSET_TABLE_ + 32

$ /x/clang12/bin/lldb --version
lldb version 12.0.0 (https://github.com/llvm/llvm-project/ revision
b978a93635b584db380274d7c8963c73989944a1)
  clang revision b978a93635b584db380274d7c8963c73989944a1
  llvm revision b978a93635b584db380274d7c8963c73989944a1

$ musl-gcc --version
cc (Debian 10.2.1-6+build2) 10.2.1 20210110
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ dpkg -l musl
...
ii  musl:amd64     1.2.2-1      amd64        standard C library

Quuxplusone commented 3 years ago

I see three stages of the process that seem not to be working on your system.

1) When lldb is "nexting over a function" and sees that it is at a "call" instruction (i.e. one that is guaranteed to return to the next instruction), it won't try to step in and then step back out of the function, it will instead put a breakpoint on the instruction after the call and run to there.

We use the return from InstructionLLVMC::IsCall for the stub dispatch instruction to decide whether to take this shortcut or not. If that were returning true, we wouldn't have stepped into printf and had to step back out again. Of course if you are using a more general instruction to dispatch to the cross-library stub that isn't a call then this is not relevant. We will have to step in and then back out to follow the control flow.

2) When we do a step-in and land in shared library trampoline code, we usually step "through" the trampoline to the target function automatically. That doesn't seem to be working, since we stop in the stub in your a.out, not in the target of the stub. Stepping through cross-library stubs is handled by the GetStepThroughTrampolinePlan call in the DynamicLoader plugin for your system.

It looks like that isn't working in your case. Note, if your dynamic loader behavior is different from the standard Linux loader behavior, you will have to write a DynamicLoader plugin that describes how it works..

3) Finally, when we try to step out of a frame, we ask the unwinder for the return pc from the frame above us. It looks like either we don't know how to unwind from the cross-library stub, and so we can't find the previous frame or we can't find the return PC in that frame. There's something about being stopped at a cross-library stub that's fooling the unwinder.

These are all failures specific to support for your system, it's the plugin-points that are failing so far as I can see, not the basic algorithms. Somebody who has access to this system and can debug these problems will need to have a look.

Quuxplusone commented 3 years ago

When I use musl-gcc, LLDB isn't able to unwind either from the executable's
.plt or from musl's libc.so, whereas that does work if I use the gcc driver
(for using glibc). At least for the executable's .plt, LLDB identifies two ways
to unwind, "assembly insn profiling" and "eh_frame CFI", but of course it
doesn't trust eh_frame to be valid at every PC, so maybe that's affecting
something. I need to debug it some more.

Even on macOS, though, I see the same behavior where LLDB steps over a call by
stepping into it and back out. I'll attach an example.

I'm suspicious of a couple of comparisons in LLDB's
ThreadPlanStepRange::SetNextBranchBreakpoint that *seem* off-by-two and off-by-
one:

https://github.com/llvm/llvm-project/blob/llvmorg-13.0.0-rc1/lldb/source/Target/ThreadPlanStepRange.cpp#L334-L347

  // If we didn't find a branch, run to the end of the range.
  if (branch_index == UINT32_MAX) {
    uint32_t last_index = instructions->GetSize() - 1;
    if (last_index - pc_index > 1) {
      ...
    }
  } else if (branch_index - pc_index > 1) {

The (last_index - pc_index > 1) comparison requires that the list of
instructions have at least 3 remaining instructions to execute, whereas
intuitively I'd expect 1 instruction to be enough? i.e. The if statement can be
removed.

The (branch_index - pc_index > 1) comparison requires that there be at least 2
instructions between the PC and the branch instruction, but intuitively 1 ought
to be enough. The comparison could become (branch_index > pc_index).

e.g. From my attached apple-step-into-example.txt, the assembly is:

(lldb) disas
a.out`main:
    0x100003f7a <+0>:  pushq  %rbp
    0x100003f7b <+1>:  movq   %rsp, %rbp
->  0x100003f7e <+4>:  leaq   0x29(%rip), %rdi          ; "hello!"
    0x100003f85 <+11>: callq  0x100003f8e               ; symbol stub for: puts
    0x100003f8a <+16>: xorl   %eax, %eax
    0x100003f8c <+18>: popq   %rbp
    0x100003f8d <+19>: retq

I *think* the list of instructions would have the leaq and callq instructions,
pc_index would be 0, and there is no branch at the end. last_index is 1
(referring to the callq instruction). We want to set a breakpoint on the xorl
instruction and run to it, but ThreadPlanStepRange::SetNextBranchBreakpoint
doesn't set a breakpoint, and LLDB seems to single-step one instruction at-a-
time. (I'm not sure how that works yet...)

$ cat apple-step-into-example.txt | grep reached
 ThreadPlanStepOverRange reached 0x0000000100003f85.
 ThreadPlanStepOverRange reached 0x0000000100003f8e.
 ThreadPlanStepOverRange reached 0x0000000100003f8a.

This commit is relevant:

https://github.com/llvm/llvm-project/commit/a3f466b9e785ca8f6712904e408bda31c79ca1b0

"Fix commit 252963 to work around a bug on some platforms where they don't
correctly handle stepping over one breakpoint directly onto another breakpoint.
This isn't fixing that bug, but rather just changing 252963 to not use
breakpoints
if it is only stepping one instruction."

Stepping one instruction isn't generally sufficient to reach the branch (or the
end of instruction list), though:

 - If branch_index is UINT32_MAX: pc_index could point to the last instruction, which is a call that we want to step over. In that case, the breakpoint would be one instruction further (but multiple instructions run).

 - If branch_index is a real branch: pc_index could point to a call instruction just before the branch.

I experimented with the above changes to
ThreadPlanStepRange::SetNextBranchBreakpoint, and it fixes this bug, but
reveals another with musl-gcc:
 - Set a breakpoint on a call instruction to printf (or puts).
 - Run to the breakpoint.
 - Try to step over (n) the call instruction.
 - The program doesn't stop until it exits.

I think(?) LLDB is required to single-step off the breakpoint, which puts the
PC in the printf linker stub. From there, I guess LLDB can't unwind so it
doesn't know what to do? But in principle, it could still work, because it
could just run to the address past the call. That doesn't happen, though. I'll
attach it in case it's interesting, lldb-step-off-breakpoint.txt. That file has
a few interesting lines:

(lldb) disas
a.out`main:
    0x555555555155 <+0>:  pushq  %rbp
    0x555555555156 <+1>:  movq   %rsp, %rbp
    0x555555555159 <+4>:  leaq   0xea0(%rip), %rdi
->  0x555555555160 <+11>: callq  0x555555555020            ; symbol stub for:
___lldb_unnamed_symbol61
    0x555555555165 <+16>: movl   $0x0, %eax
    0x55555555516a <+21>: popq   %rbp
    0x55555555516b <+22>: retq
(lldb) log enable lldb step
(lldb) n
lldb             Thread::PushPlan(0x0x1103bc0): "Stepping over line
hello.c:4:3.", tid = 0x187bf.
lldb             ThreadPlanStepRange::SetNextBranchBreakpoint - Setting
breakpoint -2 (site 4) to run to address 0x555555555165
lldb             Process::PrivateResume() m_stop_id = 5, public state: stopped
private state: stopped
lldb             Thread::PushPlan(0x0x1103bc0): "Single stepping past
breakpoint site 3 at 0x555555555160", tid = 0x187bf.
...
intern-state     ThreadPlanStepOverRange reached 0x0000555555555020.
intern-state     Removing next branch breakpoint: -2.
intern-state     Stepping out of frame with no debug info
...

Quuxplusone commented 3 years ago

Attached apple-step-into-example.txt (7970 bytes, text/plain): apple-step-into-example.txt

Quuxplusone commented 3 years ago

Attached lldb-step-off-breakpoint.txt (5303 bytes, text/plain): lldb-step-off-breakpoint.txt

Quuxplusone commented 3 years ago

If you have a fix for the logic to make us step past the call instruction rather than stepping in, please put that up for review. That seems useful, though TTTT the optimization of "not doing step-in & step-out" is a best effort thing. We still expect that as a last resort step-in and step-back-out should work.

In the "step-off-breakpoint" error, stepping here is not going to go well if we can't unwind from the stub. We can't really do anything sensible, so we should just stop. The bug here is that, after the step over line plan was popped off the stack, we end up saying:

intern-state ThreadList::ShouldStop overall should_stop = 0

and then continuing.

Since the ThreadPlanStepRange plan was done & popped off the stack, we would next ask the base thread plan whether to stop in: ThreadPlanBase::ShouldStop. That's a moderately complex function and there's not enough info in your log to guess why it is deciding to resume. Somebody will need to debug that live.

Quuxplusone commented 3 years ago

We've talked at times about having a different "I'm lost in a new stack frame" response than to just stop. We could, for instance, just keep single stepping and hope we get somewhere better. Maybe single step till the function above us shows up again in the backtrace, and then try to set our return breakpoint.

But for right now the simplest thing is "if confused, stop!"

Quuxplusone commented 3 years ago

Yeah, I can upload my patch that allows for running to a breakpoint in more situations.

In the "step-off-breakpoint" error, stepping here is not going to go well if we can't unwind from the stub.

In this case, LLDB sets a breakpoint after the call instruction, then steps into the called function. It seems(?) like it could simply leave the breakpoint in place and resume, rather than clear the post-call breakpoint and try to unwind (which if successful would simply put a breakpoint at the same post-call PC).

I'm still unfamiliar with the "plan" stuff -- I'll probably study it a bit more.

I debugged unwinding a bit. LLDB is able to unwind from the executable .plt's back to main, and then from main into libc_start_main_stage2. However, I think it doesn't find unwind info for libc_start_main_stage2, so it can't establish a certain CFA value, then it discards both the frame for main and libc_start_main_stage2. I need to study it more closely, but I think LLDB could do better. Maybe musl needs an "end of the stack" annotation somehow?

I also noticed that SectionLoadList::ResolveLoadAddress is unable to lookup an address in musl's libc.so (e.g. "disas -n printf" works but "disas -a " doesn't). I don't think it's quite related, so I'll file a separate bug for it. I suspect the "TODO: remove this once we either fix library matching or avoid" in DynamicLoaderPOSIXDYLD.cpp special case is broken.

Quuxplusone commented 3 years ago

(In reply to Ryan Prichard from comment #7)
> Yeah, I can upload my patch that allows for running to a breakpoint in more
> situations.
>
> > In the "step-off-breakpoint" error, stepping here is not going to go well
if we can't unwind from the stub.
>
> In this case, LLDB sets a breakpoint after the call instruction, then steps
> into the called function. It seems(?) like it could simply leave the
> breakpoint in place and resume, rather than clear the post-call breakpoint
> and try to unwind (which if successful would simply put a breakpoint at the
> same post-call PC).

That's quite likely.  The optimization to treat calls specially is relatively
new.  For the longest time, we always ran to the next "branch" instruction no
matter the kind, and then did the "step in/step out" trick. We made no
assumptions about the character of the branch we were at.  The special handling
of "call" instructions was recent.  BTW, by "call" instruction, we just mean an
instruction that is guaranteed (except for exceptions) to return to the next
instruction following it.

So it's not surprising that this opportunity to simplify the algorithm hasn't
been taken yet.  Again, if you feel like digging in, feel encouraged to do so,
and I'll happily review any patches.

My plate is pretty full, so left to me, this may take a while.

>
> I'm still unfamiliar with the "plan" stuff -- I'll probably study it a bit
> more.
>
> I debugged unwinding a bit. LLDB is able to unwind from the executable
> .plt's back to main, and then from main into libc_start_main_stage2.
> However, I think it doesn't find unwind info for libc_start_main_stage2, so
> it can't establish a certain CFA value, then it discards both the frame for
> main and libc_start_main_stage2. I need to study it more closely, but I
> think LLDB could do better. Maybe musl needs an "end of the stack"
> annotation somehow?

Maybe a separate bug about this might be a good way to go.  I'm not an expert
on the unwinder, and this bug is starting to become a portmanteau bug...

>
> I also noticed that SectionLoadList::ResolveLoadAddress is unable to lookup
> an address in musl's libc.so (e.g. "disas -n printf" works but "disas -a
> <printf-addr>" doesn't). I don't think it's quite related, so I'll file a
> separate bug for it. I suspect the "TODO: remove this once we either fix
> library matching or avoid" in DynamicLoaderPOSIXDYLD.cpp special case is
> broken.

Thanks!

Quuxplusone commented 3 years ago

Yeah, I may have some time to dig into this more.

> I also noticed that SectionLoadList::ResolveLoadAddress is unable to lookup
> an address in musl's libc.so [...]

Filed as llvm.org/PR51466.

Quuxplusone / LLVMBugzillaTest

lldb step over failed in musl dyn exe (Could not create return address breakpoint) #49904