ares-emulator / ares

ares is a cross-platform, open source, multi-system emulator, focusing on accuracy and preservation.
https://ares-emu.net

[gdb][N64] Pausing with a python-implemented breakpoint often breaks #1359

Open Dragorn421 opened 8 months ago

Dragorn421 commented 8 months ago

Bug description

Pausing execution (^C in gdb) while the program is running doesn't work properly and often hangs when gdb has a Python-implemented breakpoint set.

Reproducing

You will need the N64 rom and its elf. They can be built with libdragon and https://github.com/Dragorn421/n64homebrew/tree/15866a5993f3fac7bc020d2eb4c1a20cc6977120/test_gdb or download them: test_gdb.zip

  1. get test_gdb.z64 and build/test_gdb.elf (see above)
  2. get gdb_corrupt.py and gdbrun.sh from https://github.com/Dragorn421/n64homebrew/tree/15866a5993f3fac7bc020d2eb4c1a20cc6977120/test_gdb
  3. run ares with test_gdb.z64 and gdbrun.sh (uses build/test_gdb.elf and runs gdb_corrupt.py)
  4. repeatedly pause and resume execution by alternating ^C and c
  5. eventually an Invalid hex digit XX (XX = placeholder) error shows up
  6. typically you can't resume execution from there, or after pausing one more time: even though gdb says "Continuing", the ares window still reports being paused (stop icon near the gdb status). From there, the only option for me is usually to restart gdb and ares entirely.

See screenshots below

Expected behavior

There should be no error, or at least the error shouldn't break later use of gdb.

Screenshots

"variant" with the breakpoint in gdb_corrupt.py doing nothing:

image

"variant" with the breakpoint in gdb_corrupt.py doing print(gdb.parse_and_eval("count"), end="\r")

image

Notice in both screenshots gdb has been instructed to continue execution, but execution is still paused (as indicated by the stop icon in ares' gdb status and nothing being printed to console)

Additional context

OS: Kubuntu Linux
ares version: built from source from v134

Discussion

I am not sure this is a gdb server implementation issue on ares' part. It may well be a gdb client bug, but I think that's less likely given gdb has much more testing/usage than ares does. I'm opening an issue mostly to start documenting this, because I expect it's a rabbit hole.

I'm thinking "Invalid hex digit XX" could mean there's some miscommunication in the gdb protocol between gdb and ares, that is, a mistake in the ares gdb server implementation, but idk

Dragorn421 commented 8 months ago

This triggered a discussion on the ares discord starting here: https://discord.com/channels/976404869386747954/976463759935696977/1192472233235464212

Here are some notes:

recompile ares after changing GDB_LOG_MESSAGES to true

one of the logging calls has a bug: it uses print instead of printf, even though it passes a format string:

    if constexpr(GDB_LOG_MESSAGES) {
      print("GDB <: %s\n", cmdBuffer.data());
    }

from gdb source, error (_("Invalid hex digit %d"), a); so the number is the character's code point in decimal. For example, in "Invalid hex digit 79", 79 is the code point for O (the letter O). This led @HailToDodongo to correctly speculate that an OK was wrongly sent back as a response.
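To read these errors quickly, the number can be mapped back to the offending character. This is a minimal sketch; the helper name `offending_char` is made up for illustration and is not part of gdb or ares.

```python
import re


def offending_char(error_message: str) -> str:
    """Return the character gdb rejected, given an 'Invalid hex digit N' message."""
    match = re.search(r"Invalid hex digit (\d+)", error_message)
    if match is None:
        raise ValueError("not an 'Invalid hex digit' message")
    # gdb formats the value with %d, so N is the decimal code point.
    return chr(int(match.group(1)))


print(offending_char("Invalid hex digit 79"))  # O
print(offending_char("Invalid hex digit 83"))  # S
```

The two values are the ones seen in this thread: 79 is the `O` of an `OK` reply and 83 is the `S` of an `S05` stop reply.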

running gdb with "debug remote" on:

gdb-multiarch build/test_gdb.elf \
-readnow \
-ex 'set debug remote' \
-ex 'set debug remote-packet-max-chars unlimited' \
-ex 'set logging enabled' \
-ex 'set logging file gdb_out.txt' \
...

produces a dump of the sent/received data

this indeed showed an "OK" response to a "memory read" (m) command, which doesn't expect OK. From the gdb client log:

[remote] Sending packet: $m800164e0,4#95
[remote] Received Ack
[remote] Packet received: OK
Invalid hex digit 79
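The failure above can be reproduced outside gdb with a small model of the packet framing and of how a memory-read reply is decoded. This is a sketch under assumptions: the function names are hypothetical, it only models the `$payload#checksum` framing visible in the logs (checksum = sum of the payload bytes modulo 256, as two hex digits) and a hex-only reply to `m`, and it ignores error replies and other cases gdb handles.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def rsp_checksum(payload: str) -> str:
    """Two-digit hex checksum: sum of the payload bytes modulo 256."""
    return f"{sum(payload.encode('ascii')) % 256:02x}"


def frame(payload: str) -> str:
    """Wrap a payload as it appears on the wire: $payload#checksum."""
    return f"${payload}#{rsp_checksum(payload)}"


def decode_memory_reply(payload: str) -> bytes:
    """Decode the reply to an 'm' (memory read) command, which must be hex only."""
    for ch in payload:
        if ch not in HEX_DIGITS:
            # Mirrors the wording of the gdb error: the code point in decimal.
            raise ValueError(f"Invalid hex digit {ord(ch)}")
    return bytes.fromhex(payload)


print(frame("m800164e0,4"))                    # $m800164e0,4#95
print(frame("OK"))                             # $OK#9a
print(decode_memory_reply("8F828020").hex())   # 8f828020

try:
    decode_memory_reply("OK")
except ValueError as error:
    print(error)                               # Invalid hex digit 79
```

The framed strings match the packets in the logs, which suggests the packets themselves are well-formed and that the problem is which reply gets paired with which command.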

more complete logs near the problem area:

ares gdb server log ``` GDB <: Z0,800164e0,4 GDB >: +$OK#9a GDB <: Hc0 GDB >: +$OK#9a GDB <: c <<1049>> GDB >: +$S05#b8 GDB <: g GDB >: +$0000000000000000ffffffff80020000000000000000000000000000045da444000000000000b71b0000000000000009000000000a00b56000000000000000000000000000000009ffffffffb3ff0020ffffffffb3ff0000000000000000000000000000000000040000000000000004fffffffffffffffc0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000003f00000000000000010000000000000000ffffffff8000f6e000000000000000000000000000000000ffffffff8002dcb0ffffffff807fffc80000000000000000ffffffff8001651800000000241014e1000000000000b71b000000000000000000000000000000000000000000000000ffffffff800164e0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000f00#26 GDB <: z0,800164e0,4 GDB >: +$OK#9a GDB <: m800164e0,4 GDB >: +$8F828020#b2 GDB <: CTRL+C [0x03] GDB <: m800164e0,4 GDB >: +$8F828020#b2 GDB >: +$S05#b8 GDB <: m800164e4,4 GDB >: +$8F86815C#cd GDB <: qfThreadInfo GDB >: +$m1#9e GDB <: qfThreadInfo GDB >: +$m1#9e GDB <: qsThreadInfo GDB >: +$l#6c GDB <: qsThreadInfo GDB >: +$l#6c GDB <: D GDB >: +$OK#9a GDB ending session, disconnecting client <<1050>> <<1051>> ``` gdb client log ``` [remote] Sending packet: $Z0,800164e0,4#de [remote] Received Ack [remote] Packet received: OK [remote] Sending packet: $Hc0#db [remote] Received Ack [remote] Packet received: OK [remote] Sending packet: $c#63 [remote] Received Ack [remote] wait: enter [remote] Packet received: S05 
[remote] select_thread_for_ambiguous_stop_reply: enter [remote] select_thread_for_ambiguous_stop_reply: process_wide_stop = 0 [remote] select_thread_for_ambiguous_stop_reply: first resumed thread is Thread 1 [remote] select_thread_for_ambiguous_stop_reply: is this guess ambiguous? = 0 [remote] select_thread_for_ambiguous_stop_reply: exit [remote] wait: exit [remote] Sending packet: $g#67 [remote] Received Ack [remote] Packet received: 0000000000000000ffffffff80020000000000000000000000000000045da444000000000000b71b0000000000000009000000000a00b56000000000000000000000000000000009ffffffffb3ff0020ffffffffb3ff0000000000000000000000000000000000040000000000000004fffffffffffffffc0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000003f00000000000000010000000000000000ffffffff8000f6e000000000000000000000000000000000ffffffff8002dcb0ffffffff807fffc80000000000000000ffffffff8001651800000000241014e1000000000000b71b000000000000000000000000000000000000000000000000ffffffff800164e0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000f00 [remote] Sending packet: $z0,800164e0,4#fe [remote] Received Ack [remote] Packet received: OK [remote] Sending packet: $m800164e0,4#95 [remote] Received Ack [remote] Packet received: 8F828020 [remote] pass_ctrlc: enter [remote] interrupt: enter [remote] interrupt: exit [remote] pass_ctrlc: exit [remote] Sending packet: $m800164e0,4#95 [remote] Received Ack [remote] Packet received: 8F828020 [remote] Sending packet: 
$m800164e4,4#99 [remote] Received Ack [remote] Packet received: S05 Invalid hex digit 83 [remote] wait: enter [remote] wait: exit ```

in particular:

GDB <: CTRL+C [0x03]
GDB <: m800164e0,4
GDB >: +$8F828020#b2
GDB >: +$S05#b8

it appears the ares gdb server can answer out of order
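Spotting this pattern by eye in a long log is tedious, so here is a rough sketch of a checker for it. It assumes the log line format seen in the excerpts in this thread (`GDB <: ...` for received commands, `GDB >: +$...#xx` for replies), and it follows this thread's reading that any ordinary reply sent between a Ctrl+C and its stop reply is suspicious. The function name is hypothetical.

```python
import re

# A reply line as printed by the ares log, e.g. "GDB >: +$S05#b8".
REPLY = re.compile(r"^GDB >: \+?\$(.*)#[0-9a-fA-F]{2}$")
# Stop replies seen in the log look like "S05"; "T.." is the other common form.
STOP_REPLY = re.compile(r"^[ST][0-9a-fA-F]{2}")


def find_replies_before_stop(log_lines):
    """Return (line number, payload) for replies sent while a Ctrl+C is unanswered."""
    suspicious = []
    interrupt_pending = False
    for number, line in enumerate(log_lines, start=1):
        line = line.strip()
        if line.startswith("GDB <: CTRL+C"):
            interrupt_pending = True
            continue
        match = REPLY.match(line)
        if match is None or not interrupt_pending:
            continue
        payload = match.group(1)
        if STOP_REPLY.match(payload):
            interrupt_pending = False  # the interrupt has now been answered
        else:
            suspicious.append((number, payload))
    return suspicious


excerpt = [
    "GDB <: CTRL+C [0x03]",
    "GDB <: m800164e0,4",
    "GDB >: +$8F828020#b2",
    "GDB >: +$S05#b8",
]
print(find_replies_before_stop(excerpt))  # [(3, '8F828020')]
```

On the excerpt it flags the memory-read reply on line 3, which went out before the `S05` that answers the interrupt.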

this is specifically only an issue for Ctrl+C, because of the implementation for reacting to this command:

  auto Server::haltProgram() -> void {
    forceHalt = true;
    haltSignalSent = false;
  }

which doesn't prevent further commands from being parsed and answered before the stop reply for the halt has been sent.

So the fix to work on implementing is basically "don't answer commands out of order" or "wait for ctrl+c to be answered to further process commands"