Quuxplusone commented 3 years ago


Bugzilla Link	PR49941
Status	NEW
Importance	P normal
Reported by	Jade Fink (llvm@jade.fyi)
Reported on	2021-04-12 16:04:34 -0700
Last modified on	2021-04-22 03:13:11 -0700
Version	unspecified
Hardware	PC Linux
CC	clayborg@gmail.com, jdevlieghere@apple.com, jingham@apple.com, llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments	`riscv32_linux_gnu_target_defintion.py` (6695 bytes, text/x-python-script)
Blocks
Blocked by
See also

I've been trying out lldb as a replacement for gdb for working on my operating system, which I develop on qemu with a gdbstub. Part of this involves debugging code that is run before the initialization of the stack, in early startup code and exception handlers.

Here is an example thing that can go wrong:

(lldb) register write t0 123
error: invalid frame

The register read command is also affected by this issue. It appears that the cause of it is that a bunch of the commands in CommandObjectRegister.cpp are defined as eCommandRequiresFrame when they should handle there not being a frame present with slightly degraded functionality. It's not that they don't require a frame at all though: ExecutionContext::GetRegisterContext needs a frame to work, but it should be skipped if one is unavailable, which it is not currently.

Quuxplusone commented 3 years ago

I'm not sure I understand your problem.

Presumably by the time you stop the target program it is either executing some code or about to execute some code, and you can determine the register state it will adopt when it goes to do that.

What keeps you from presenting that something to lldb as frame 0 of some thread of execution?

Quuxplusone commented 3 years ago

The specific trouble I have is that if I want to read registers when the system first boots (before any code has been run at all), there is no way of doing that, that I know of, without apparently needing to write some Python as I just found out. It would be useful to have a fallback mode that dumps the registers without needing to have a frame recognizer.

Quuxplusone commented 3 years ago

What is the meaning of the registers you want to present?

Quuxplusone commented 3 years ago

(In reply to Jim Ingham from comment #3)

What is the meaning of the registers you want to present?

I want to find the initial state of the user registers (i.e. x1-x31) and more generally, understand the initialization process of the virtual system. So I want to step through the initialization code instruction by instruction [1]. The mostly undocumented initialization code is dynamically generated by qemu at runtime, so I don't have any module name or executable file for it. After messing with it and reading source for too long, I am still unsure if it is even possible to define a frame recognizer to tell lldb I have a frame for this code.

I have tried reading the source code, in particular, https://reviews.llvm.org/D44603, to try to understand what is even meant by the module name argument to frame recognizer add, but even doing:

frame recognizer add -l mu.NullFrameRecognizer -x -s .* -x -n .*

with the following frame recognizer:

class NullFrameRecognizer: def get_recognized_arguments(self, frame): return []

does not get anything from frame info 0.



If you wish to try this task out for yourself, try:

    qemu-system-riscv64 -machine virt -m 128M -smp 1 -nographic -s -S

then, from another window, with `lldb -a riscv64`,

    (lldb) gdb-remote 1234

Try to disassemble this startup code that the CPU has stopped at: `disassemble --pc` doesn't work because there's no frame, register read, likewise. I can disassemble it by address, but that's not helpful, because the only reason I know the program counter address is because I've used gdb extensively on this target before. I don't know how I can get the initial register state to be able to understand the code either, given `register read` doesn't work :)

This is trivial in gdb, it will tell you the program counter as soon as you connect, and `info registers` does not require a frame.

[1]: Stepping through it is also broken for me, `si` appears to be essentially continuing instead of stepping one instruction; unsure if this is related to not having frames and I am probably going to file a bug for this as well soon

Quuxplusone commented 3 years ago

How are you connecting to the qemu stub?

If you connect to a standard gdb-remote stub with something like the gdb-remote command in lldb, then the ProcessGDBRemote plugin will query for the thread info (for qemu that's usually one thread per executing virtual core), and among other things make a 0th frame to hold the register information the stub told it for that thread. That should all just happen automatically.

What is different in your setup that that is not happening?

Anyway, if you want to attach to a system and use lldb to represent threads of execution that are not currently active on any of the system cores, the lldb construct for that is the "OperatingSystem Plugin". That is a way of making "Memory backed" threads either to overlay ones already running on the core, or to make up new ones.

The "frame recognizer" is for re-interpreting a real frame, not for making one up out of whole cloth. It allows us to do things like: when stopped at a known function - but one without debug information - lldb can dig the argument values out of registers, cast those values to their known types, and fill the result of "frame var" with these reinterpreted values.

That's not what you want to do - you don't have a frame to begin with, and you aren't triggering off of the function that occupies a particular frame.

Quuxplusone commented 3 years ago

(In reply to Jim Ingham from comment #5)

How are you connecting to the qemu stub?

If you connect to a standard gdb-remote stub with something like the gdb-remote command in lldb, then the ProcessGDBRemote plugin will query for the thread info (for qemu that's usually one thread per executing virtual core), and among other things make a 0th frame to hold the register information the stub told it for that thread. That should all just happen automatically.

What is different in your setup that that is not happening?

Anyway, if you want to attach to a system and use lldb to represent threads of execution that are not currently active on any of the system cores, the lldb construct for that is the "OperatingSystem Plugin". That is a way of making "Memory backed" threads either to overlay ones already running on the core, or to make up new ones.

The "frame recognizer" is for re-interpreting a real frame, not for making one up out of whole cloth. It allows us to do things like: when stopped at a known function - but one without debug information - lldb can dig the argument values out of registers, cast those values to their known types, and fill the result of "frame var" with these reinterpreted values.

That's not what you want to do - you don't have a frame to begin with, and you aren't triggering off of the function that occupies a particular frame.

Noted, I will probably want to write a debugger plugin for the OS work I'm doing, so thanks for the keywords.

Here is the reproduction, if you want to try it:

In one terminal: $ qemu-system-riscv64 -machine virt -m 128M -smp 1 -nographic -s -S

In another: $ lldb -a riscv64 (lldb) gdb-remote 1234 Process 1 stopped

thread #1, stop reason = signal SIGTRAP frame #0: 0xffffffffffffffff (lldb) register read error: invalid frame

Reproducible on the latest llvmorg-13-init-7035-g27dfcd978edc with qemu 5.2.0 (and a <2 week old git version) on linux.

As for a gdb (11.0.50.20210122-git) session, it does indeed seem to be getting a frame 0, which lldb does not:

(gdb) set architecture riscv The target architecture is set to "riscv". (gdb) set disassemble-next-line on (gdb) target remote :1234 Remote debugging using :1234 warning: No executable has been specified and target does not support determining executable automatically. Try using the "file" command. 0x0000000000001000 in ?? () => 0x0000000000001000: 97 02 00 00 auipc t0,0x0 (gdb) bt

0 0x0000000000001000 in ?? ()

Quuxplusone commented 3 years ago

Alright, an update. What's happening here is that something that, according to
a comment, is not supposed to happen, is in fact happening.

          // There shouldn't be any way not to get the frame info for frame
          // 0. But if the unwinder can't make one, lets make one by hand
          // with the SP as the CFA and see if that gets any further.
          if (!success) {
            cfa = reg_ctx_sp->GetSP();
            pc = reg_ctx_sp->GetPC();
          }

Here, the call to GetSP() returns LLDB_INVALID_ADDRESS, because it fetched a
garbage register number. Why did it do that? It didn't have a register with
eRegisterKindGeneric and kind LLDB_REGNUM_GENERIC_SP. Oops.

Why'd that happen? Well, qemu was giving us xml that doesn't define any generic
attribute for any of the registers (see ProcessGDBRemote.cpp:4400). Is it
qemu's fault? Nope, they have the latest version of these target xml files from
gdb (see gdb/features/riscv/64bit-cpu.xml in the gdb tree). Is it gdb's fault?
Not really, it seems like this "generic" attribute is of lldb's invention to
begin with, and none of their target definitions have these attributes!

I believe that given this information I probably should go write a Python file
that defines the target properly. I can't fix it upstream as I doubt they would
accept a patch adding stuff that only lldb can use (it's kind of a moot point
anyway, it is not physically safe for me to contribute to anything under the
GNU project).

Quuxplusone commented 3 years ago

There should always be a frame zero, even if there is no stack. Maybe the bug here is somehow we are failing to create a frame zero? Frame zero just gets the stock register context with the actual register values. All other frames (1 and up) must actually unwind the register values.

Quuxplusone commented 3 years ago

Are you using a custom GDB server that is built into your OS? It might help to
see a packet log of what is going on and I might be able to tell more why
things are failing.

If you are using a GDB server that isn't lldb-server, then LLDB needs more
information about your registers. LLDB discovers the registers in
ProcessGDBRemote.cpp in the function:

  void ProcessGDBRemote::BuildDynamicRegisterInfo(bool force);

In there it will check in the following order
- check if there is a target definition file specified in the LLDB settings and
use this file which defines all registers
- read the target.xml from the GDB server
- send a series of qRegisterInfo packets (LLDB extension packet)

Does your GDB server support the "target.xml" so it can tell LLDB about the
registers that it has? I would guess the answer is yes. If this is the case,
then LLDB still needs extra information about the registers:
- compiler register number for reach register (register numbers used in
.eh_frame for example)
- DWARF register numbers for each register (usually the same as compiler reg
numbers, but these sometimes differ)
- LLDB generic information (which register is PC, SP, FP, etc)

When we fetch registers via the target.xml packet, we can use the
"lldb_private::ABI" plug-ins to augment and provide the missing register
information via:

  virtual void ABI::AugmentRegisterInfo(RegisterInfo &info) = 0;

We will first fill in the lldb_private::RegisterInfo struct from lldb-private-
types.h with the info retrieved via XML, and the ABI plug-in is used to set all
of the other LLDB specific info (compiler reg #, DWARF reg #, generic reg #,
etc).

I would guess that this is the reason this is all failing.

Quuxplusone commented 3 years ago

a GDB remote packet log can be exported using:

(lldb) log enable -f /tmp/packets.txt gdb-remote packets

then run your debug session, quit, and attach the file.

Quuxplusone commented 3 years ago

There is a patch in progress that adds better support for RISCV (32 and 64), though I don't think it actually was checked in yet. I will try and find it and post the link. We had some people doing RISCV work that were able to get LLDB working by applying this patch. It adds the necessary ABI plug-in and it fixes the register info and then things worked.

Quuxplusone commented 3 years ago

Found it. Try applying this and see if this fixes your issues:

https://reviews.llvm.org/D62732

Quuxplusone commented 3 years ago

If you still have issues, then please attach a packet log from the debug session using the command I supplied.

Quuxplusone commented 3 years ago

(In reply to Greg Clayton from comment #12)
> Found it. Try applying this and see if this fixes your issues:
>
> https://reviews.llvm.org/D62732

Oh my. I should have gone looking. I'll try that one next week when I have more
time.

Below is a description of the (imo) unreasonably hard to debug failure that
does happen when there's no generic registers registered because the plugin
isn't present:

(In reply to Greg Clayton from comment #8)
> There should always be a frame zero, even if there is no stack. Maybe the
> bug here is somehow we are failing to create a frame zero? Frame zero just
> gets the stock register context with the actual register values. All other
> frames (1 and up) must actually unwind the register values.

Yes, that's correct. There is a frame 0, it has cfa_is_valid true, but the
actual cfa and pc addresses are all ff because it fails to get either one and
instead set them to LLDB_INVALID_ADDRESS, which is all ones.

The reason it fails to get the stack frame is because lldb doesn't have generic
register info for riscv64 (as confirmed by calling Dump() with a debugger
attached to lldb).

We thus have the invalid frame error. This also explains why the unwind is
failing.

We really need to improve the error reporting for this, though. Maybe a verbose
log line saying that it couldn't find the generic registers and that an ABI
plugin/other target info needs to be written? I'm not familiar with the logging
in lldb or how we could usefully surface this information.

This will keep happening to embedded folks working on new odd targets and
getting this confusing pattern of errors, so I really think it needs to be
explicitly reported. I may come up with a patch. What behaviour would be
acceptable? Quitting the debugger on connection? Something else?

Quuxplusone commented 3 years ago

(In reply to Jade Fink from comment #14)
> (In reply to Greg Clayton from comment #12)
> > Found it. Try applying this and see if this fixes your issues:
> >
> > https://reviews.llvm.org/D62732
>
> Oh my. I should have gone looking. I'll try that one next week when I have
> more time.
>
> Below is a description of the (imo) unreasonably hard to debug failure that
> does happen when there's no generic registers registered because the plugin
> isn't present:
>
> (In reply to Greg Clayton from comment #8)
> > There should always be a frame zero, even if there is no stack. Maybe the
> > bug here is somehow we are failing to create a frame zero? Frame zero just
> > gets the stock register context with the actual register values. All other
> > frames (1 and up) must actually unwind the register values.
>
> Yes, that's correct. There is a frame 0, it has cfa_is_valid true, but the
> actual cfa and pc addresses are all ff because it fails to get either one
> and instead set them to LLDB_INVALID_ADDRESS, which is all ones.
>
> The reason it fails to get the stack frame is because lldb doesn't have
> generic register info for riscv64 (as confirmed by calling Dump() with a
> debugger attached to lldb).

Indeed!

>
> We thus have the invalid frame error. This also explains why the unwind is
> failing.
>
> We really need to improve the error reporting for this, though. Maybe a
> verbose log line saying that it couldn't find the generic registers and that
> an ABI plugin/other target info needs to be written? I'm not familiar with
> the logging in lldb or how we could usefully surface this information.

We definitely need to improve this.

>
> This will keep happening to embedded folks working on new odd targets and
> getting this confusing pattern of errors, so I really think it needs to be
> explicitly reported. I may come up with a patch. What behaviour would be
> acceptable? Quitting the debugger on connection? Something else?

I think we should give a much better error and explanation of what we can do to
fix the issue. For this to be correct we will probably need a new virtual
function on the lldb_private::Process class that gets called when no compiler,
DWARF or generic register information is set.

The ProcessGDBRemote version of this function can first check if there are any
registers at all.

If there are no registers, then the GDB server might not support the
"target.xml". In that case we can tell users they must create a target
definition file, and point to examples that are checked into LLDB already.
These are python files that can define everything that is needed for a
debugging target that uses a GDB server that doesn't have dynamic register
info. We have many examples in our sources:

lldb/examples/python/*_target_definition.py

Users would make a new target definition file and specify it using the
"settings set" command:

(lldb) settings set plugin.process.gdb-remote.target-definition-file
/path/to/riscv_target_definition.py

If there are registers, we should check if any of the register information has
compiler, dwarf and generic information. If there is none, we can suggest to
use the target definition files as above, or let them know they can add a
lldb_private::ABI plug-in implementation by modifying LLDB code.

The other easier solution is to detect the registers in ProcessGDBRemote and
then immediately following that, check to verify that the compiler, DWARF and
generic registers are set for the important registers: PC, SP, FP (if there is
one for the architecture). If not, dump out some text as defined above and kill
the session?

I am open to any better suggestions.

Quuxplusone commented 3 years ago

We might also need to modify some code in LLDB in case we do the minimum to get registers working for LLDB like PC, SP and FP, and then we have some DWARF that says a variable is in DWARF register 123 and we have no register that matches. This will at least warn users some other data is missing.

If we at least get the important registers PC, SP and FP, we will at least be able to create a frame zero and see registers.

Quuxplusone commented 3 years ago

Attached riscv32_linux_gnu_target_defintion.py (6695 bytes, text/x-python-script): Example target definition file for riscv32

Quuxplusone commented 3 years ago

We had gotten things working with our riscv 32 emulator using the attached target definition file. You will need to ensure all of the register offsets and sizes are correct by dumping the info in GDB using "maint print raw-registers" or one of the maint print commands.

Quuxplusone commented 3 years ago

(In reply to Greg Clayton from comment #12)
> Found it. Try applying this and see if this fixes your issues:
>
> https://reviews.llvm.org/D62732

Confirmed this works. But I will work on a patch to make the error better after
I'm done finals.

Quuxplusone / LLVMBugzillaTest

register commands require a frame unconditionally #48910

0 0x0000000000001000 in ?? ()