Fix hardware single step

icecream95 commented 2 years ago

Some gdb versions require the vContSupported+ feature to be exposed to use hardware single stepping; add it to the "qSupported" response string.

I think that the reason that I noticed this is a combination of the following:

I'm on an AArch64 host machine, so using the standard Debian GDB build for an AArch64 target rather than a special build that might not have this check
The MCU is behind an MMU (in addition to the MPU), which seems to silently prevent overwriting instructions for software breakpoints

Note that I have not tested the updated unit tests—it's possible that I didn't update them properly.

Thank you for this project; I'm having a lot of fun debugging GPU firmware using a couple of tracebuffers in shared memory between the CPU and GPU for communication. It's pretty powerful for an MCU—a Cortex-M7 which can run at 1 GHz and use at least 1 GB of RAM (and through a mechanism I have not yet determined read and write to 16 different 48-bit virtual address spaces). The rest of the GPU is almost unnecessary :-)

adamgreen commented 2 years ago

Thanks for the PR.

I went back through my notes to see if I had anything about why I didn't add this back when I started supporting vCont in the first place. The only note I have is that GDB always seemed to probe for "vCont?" support, no matter what was in my qSupported response. Maybe that has changed.

I will merge in your patch on my machine this week and dogfood it for a bit to make sure that it doesn't cause me any problems with the version of GDB that I am running. You are probably running a newer version of GDB on your machine.

icecream95 commented 2 years ago

So it seems that the difference with my build of GDB is that it is configured with a default OS ABI of "GNU/Linux":

(gdb) show osabi
The current OS ABI is "auto" (currently "GNU/Linux").
The default OS ABI is "GNU/Linux".

If I add -iex 'set osabi none' to the gdb command line, then it will use the generic ARM target dependent code, rather than Linux-specific code, and single stepping works without doing anything special.

However, this only works before loading the binary in GDB (which is why I used -iex rather than -ex), so while it is possible to include the ABI in the XML target description, it is parsed too late to be of any use. So the only fix is to specify it on the command line, or in a .gdbinit file.

I guess some ways to fix this issue are:

Document that only the "none" OSABI is supported, and describe how to tell gdb to use it.
Try to support the Linux OSABI well enough to be usable. As well as merging this PR, that would also involve making udf.n #1 and udf.w #0 work as breakpoint instructions.

adamgreen commented 2 years ago

I have merged in your vContSupported+ change. Thanks again for that!

Try to support the Linux OSABI well enough to be usable. As well as merging this PR, that would also involve making udf.n #1 and udf.w #0 work as breakpoint instructions.

What currently happens when that version of GDB uses the udf, undefined instruction, instead of the bkpt instruction? Does it look more like a crash and less like a breakpoint being encountered? Can you still continue as expected after or does GDB fail to step over that instruction before re-enabling the software breakpoint and continuing execution?

icecream95 commented 2 years ago

udf breakpoints mostly work well enough, apart from the annoying message:

**Usage Fault**
  Status Register: 0x01
    Undefined Instruction

However, I've found that when trying to step onto a 32-bit udf breakpoint, the MCU gets stuck in an infinite loop trying to single-step over the instruction again and again.

adamgreen commented 2 years ago

Does set osabi none fix this and cause bkpt instructions to be used for software breakpoints instead?

icecream95 commented 2 years ago

Yes, bkpt instructions are used as long as set osabi none is done early enough (such as with -iex).

adamgreen commented 2 years ago

However, I've found that when trying to step onto a 32-bit udf breakpoint, the MCU gets stuck in an infinite loop trying to single-step over the instruction again and again.

It would be interesting to see the communications between GDB and MRI when this happens. If you were able to use the set remotelogfile mri.log command to generate such a log and send me a snippet of the log around when this happens, then that would be great.

It should be possible to make the udf instruction look more like a breakpoint but it kind of seems like just documenting the set osabi none is the better thing to do because without it, it seems like GDB barely knows what kind of target it is talking to anyway. What other things might be broken?

icecream95 commented 2 years ago

set remotelogfile didn't produce any output for me, but this is what set debug remote shows:

[remote] Sending packet: $m10005c6,2#5a
[remote] Received Ack
[remote] Packet received: 83f0
[remote] Sending packet: $m10005c6,4#5c
[remote] Received Ack
[remote] Packet received: 83f00103
[remote] Sending packet: $X10005c6,4:��\000�#08 // encoding of UDF.W #0
[remote] Received Ack
[remote] Packet received: OK
[remote] Sending packet: $vCont;r10005c0,10005cc:-1;c#5e
[remote] Received Ack
[remote] wait: enter
[remote] wait: exit

This is the binary around the breakpoint set at 0x10005c6, where it should be pretty obvious that I'm compiling with -O0:

 10005bc:       ea43 0303       orr.w   r3, r3, r3
 10005c0:       4b23            ldr     r3, [pc, #140]  ; (1000650 <irq_handler_reset+0x20c>)
 10005c2:       781b            ldrb    r3, [r3, #0]
 10005c4:       b2db            uxtb    r3, r3
 10005c6:       f083 0301       eor.w   r3, r3, #1
 10005ca:       b2db            uxtb    r3, r3
 10005cc:       2b00            cmp     r3, #0
 10005ce:       d1f4            bne.n   10005ba <irq_handler_reset+0x176>

I can dump execution state from a SysTick handler if that would be useful for debugging the problem. I think that it happens even with UDF inserted permanently into the binary, so fixing the bug could be useful even when GDB isn't using a broken ABI.

Looking through arm-linux-tdep.c in GDB, a lot of it is about handling signals, syscalls, longjmp and the like. I haven't used any of those yet, so I can't say what will happen, but I'll at least note that backtracing past an exception handler seems to work the same with both OS ABIs. (They are both incorrect in the same way, so the problem is probably with the Call Frame Information directives I manually inserted into my exception trampolines.)

There are also the other OS ABIs, such as for the BSDs. I'm worried about the NetBSD code, which seems like it won't handle IT blocks correctly.

adamgreen commented 2 years ago

@icecream95

I think that it happens even with UDF inserted permanently into the binary, so fixing the bug could be useful even when GDB isn't using a broken ABI.

I agree with your observation here. I was able to reproduce your hang in MRI by attempting to single step over the following instruction:

    *(volatile uint32_t*)0xFFFFFFFF;

There is a bug in MRI where handling ranged single stepping is taking precedence over exception handling. I will start working on a fix for it soon.

adamgreen commented 2 years ago

I have opened issue #35 to track this issue.

Thanks for reporting. Much appreciated.

adamgreen commented 2 years ago

Issue #35 is now fixed. I also added a note to the README about your OS ABI findings. Thanks again.

adamgreen / mri

Fix hardware single step #32