GDB / wine helper scripts seem to cause malformed stack addresses like 0x0x

scorpion81 commented 10 months ago

Describe the bug: Using the GDB wine scripts provided at https://john-millikin.com/debugging-win32-binaries-in-ghidra-via-wine, mentioned in https://github.com/NationalSecurityAgency/ghidra/issues/4534, results in malformed stack addresses such as 0x0x12345678. This leads to internal Ghidra Python errors and possibly (?) also to unexpected SIGTRAP / ntdll.dll DbgBreakPoint stops, which make it difficult or impossible to use breakpoints properly.

To reproduce:

1. In the Ghidra debugger, open the gdb connector and change /usr/bin/gdb to /usr/bin/i686-w64-mingw32-gdb.
2. Click Connect, and in the interpreter window enter file <path/to/>/hello-win32.exe, then hit Enter.
3. Enter source <path/to/>wine-win32.gdb and hit Enter.
4. In another terminal, enter: wine /usr/share/win32/gdbserver.exe localhost:10000 <path/to/>hello-win32.exe
5. In the Ghidra gdb interpreter window, enter target extended-remote :10000. The program should start and stop at an ntdll DbgBreakPoint.
6. In the Debug "Objects" window, click "Inferior" and then Interrupt (the yellow "Pause" symbol).
7. Select the "1 - process 42000" node in the Debug "Objects" window and click Resume.
8. Notice that the program then terminates and a Python error message like "Evaluation of the expression containing the function (at 0x0xa5ed15) will be abandoned" is displayed. You will also notice the 0x0x strings on the inferior thread nodes and in the Event Thread column of the Time tab.
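A doubled prefix like 0x0xa5ed15 is the kind of string produced when an address that is already rendered with a "0x" prefix is dropped into a second format string that adds its own. The Python sketch below reproduces that pattern and shows a parser that tolerates it. The function names are hypothetical, and this is not a claim about where in Ghidra or GDB the string is actually built.

```python
def format_address_double_prefixed(addr: int) -> str:
    """Reproduce the malformed form: hex() already adds "0x", and so does the format."""
    return "0x%s" % hex(addr)


def format_address(addr: int) -> str:
    """Format an address with exactly one "0x" prefix."""
    return "0x%x" % addr


def parse_address(text: str) -> int:
    """Parse an address, tolerating any number of repeated "0x" prefixes."""
    s = text.strip().lower()
    while s.startswith("0x"):
        s = s[2:]
    return int(s, 16)


if __name__ == "__main__":
    addr = 0xA5ED15
    print(format_address_double_prefixed(addr))  # 0x0xa5ed15
    print(format_address(addr))                  # 0xa5ed15
    print(hex(parse_address("0x0xa5ed15")))      # 0xa5ed15
```

A tolerant parser only hides the symptom; the cleaner fix is to format each address exactly once.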

Expected behavior: No malformed addresses should occur, there should perhaps be no unexpected breakpoint (I'm not quite sure about the latter), and the program should run under the debugger without errors.

Screenshots: The first screenshot shows the malformed address in the inferior thread nodes and the Time tab; the second one also shows the Python error message. (Images: Ghidra_Address_bug, Ghidra_Address_bug_after)

Attachments: The archive bug_reproduction_files.zip contains hello-win32.exe as well as copies of the files provided by the author of the scripts.

Environment:

OS: Ubuntu 22.04 (64-bit)
Java Version: 17.0.9
Ghidra Version: 10.3.3, 11.0
Ghidra Origin: official GitHub distro

Additional context: hello-win32.exe can be compiled as described at https://john-millikin.com/debugging-win32-binaries-in-ghidra-via-wine

scorpion81 commented 10 months ago

Script files were added to Ghidra with this PR also: https://github.com/NationalSecurityAgency/ghidra/pull/4546

nsadeveloper789 commented 10 months ago

Can you repeat the experiment and, once you have 0000000 showing in the Stack panel, try typing bt into the gdb Interpreter panel? I suspect it'll show the strange entry, too. Whatever the case, please include the results here.

Can you also try the same experiment, but without the helper scripts? (The Dynamic listing may be empty, but the other panels including the Stack should still be populated.) The helper scripts referenced are only for obtaining the memory map, and so they should not impact the stack. If 0 is what wine/gdbserver/gdb reports, there's not really anything we should do on the front-end. On the other hand, if we're parsing or displaying something incorrectly, we should fix it.
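One way to make that comparison repeatable is to save the bt output from each run and scan it for frames whose address is zero. The Python sketch below does that with a regular expression over gdb's usual "#N  0xADDR in FUNC (...)" frame lines. The function names are hypothetical, and the sample backtrace is made up for illustration; it is not output from this issue.

```python
import re
from typing import List, Optional, Tuple

# "#2  0x00000000 in ?? ()" -> level 2, pc 0, function "??".
# gdb can omit the "0xADDR in" part (e.g. "#0  main () at hello.c:5"),
# so that part of the pattern is optional.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:(0x[0-9a-fA-F]+)\s+in\s+)?(\S+)")

Frame = Tuple[int, Optional[int], str]


def parse_backtrace(text: str) -> List[Frame]:
    """Return (level, pc or None, function) for each frame line in bt output."""
    frames: List[Frame] = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            pc = int(m.group(2), 16) if m.group(2) else None
            frames.append((int(m.group(1)), pc, m.group(3)))
    return frames


def zero_pc_frames(frames: List[Frame]) -> List[int]:
    """Levels of frames whose reported address is exactly zero."""
    return [level for level, pc, _ in frames if pc == 0]


if __name__ == "__main__":
    sample = """#0  0x7bc4f1b1 in DbgBreakPoint () from ntdll.dll
#1  0x7bc52d8f in ?? () from ntdll.dll
#2  0x00000000 in ?? ()"""
    print(zero_pc_frames(parse_backtrace(sample)))  # [2]
```

If the zero frame shows up in the raw bt text, the front-end is faithfully displaying what gdb reports.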

The error in the second screenshot indicates the target application has terminated. You can also see the message [Inferior 1 (Remote target) exited normally]. Certainly, the script could be more graceful in this circumstance, but it otherwise seems to be working fine.

Regarding the null thread names: please type info threads in the gdb Interpreter panel and include the results here.

scorpion81 commented 10 months ago

First, sorry for the big follow-up post with all those screenshots. I tried to include all the info you requested at different steps of my experiment: first 3 states with the script, then 2 without, and 1 with stepping (and without the script), which shows that something is wrong with the event thread name in this case, too.

State with script, before continuing: Ghidra_stack_bt_info_threads_before

State with script, after continuing once: Ghirda_stack_bt_info_threads_continue

State with script, after continuing twice: Ghidra_stack_bt_info_threads_continue2

Notice how the event thread address/name is broken, and each continue causes a SIGTRAP that breaks again. More snaps are also added (but the latter seems to be intended functionality).

State without script, before and after: Ghidra_stack_bt_info_threads_noscript_before

Notice the "not supported on this target" messages and the absence of SIGTRAPs... the program ends normally, but...

State without script, when single stepping after initial break: Ghidra_stack_bt_info_threads_noscript_step

The event thread name/address is broken here too, but there are no SIGTRAPs. In all cases, bt and info threads seem to output valid values. I suspect the broken event thread name/address may cause the SIGTRAPs when the script is used.

nsadeveloper789 commented 10 months ago

Okay, regarding the 00000000 entry in the stack: that is just what gdb is reporting (see the output of bt with and without the script). While it may not appear desirable in this case, Ghidra's duty is to display what gdb reports.

Regarding the thread naming, yeah, that's something we might look at. We're currently working on a different paradigm for connecting to gdb, so it's possible we won't care enough to fix it in this connector. It doesn't look like it's caused by the script, though, because it happens with and without it.

Regarding Signalled with SIGTRAP vs. Stepping ended, I don't think you're being consistent in your choice of s vs. c. You get Stepping ended when you use s. You are correct about the difference in behavior with c, though: it seems that c will SIGTRAP only when the script is active, and otherwise the program runs to normal termination? Perhaps, if the script is active, try stepping one instruction with si and then continuing with c?

Hagb commented 9 months ago

I cannot even get gdbserver to interrupt the debugged program correctly. As a workaround, I use kill -INT pid to interrupt it. Then I hit the same problem, and found that it worked if stepi was called after an interrupt and before the script called the custom function. (I don't know why, or whether this is suitable for your cases.) The following is the script I tried for that; it turned out to cause a deadlock in Ghidra, gdb, or something else I can't identify:

set $interrupted = 0
define hookpost-interrupt
  set $interrupted = 1
end
define hook-getpid-linux-i386
  if $interrupted == 1
    # keep stepi quiet so its output does not break the expected format
    pipe stepi | cat >/dev/null
  end
  set $interrupted = 0
end
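The two hooks implement a small piece of state: the interrupt sets a flag, and the next call to the PID helper first does a silent stepi and then clears the flag. The plain-Python model below (the class and method names are hypothetical, and no GDB is involved) only records the command order the hooks are meant to produce; it does not reproduce or explain the deadlock.

```python
from typing import List


class HookModel:
    """Records, in order, the gdb commands the two hooks would issue."""

    def __init__(self) -> None:
        self.interrupted = False
        self.issued: List[str] = []

    def interrupt(self) -> None:
        self.issued.append("interrupt")
        self.interrupted = True                 # what hookpost-interrupt does

    def getpid(self) -> None:
        if self.interrupted:                    # what hook-getpid-linux-i386 does
            self.issued.append("stepi")         # output discarded via "pipe ... | cat >/dev/null"
        self.interrupted = False
        self.issued.append("getpid-linux-i386")


if __name__ == "__main__":
    model = HookModel()
    model.getpid()
    model.interrupt()
    model.getpid()
    model.getpid()
    print(model.issued)
    # ['getpid-linux-i386', 'interrupt', 'stepi', 'getpid-linux-i386', 'getpid-linux-i386']
```

Only the first helper call after an interrupt gets the extra stepi; later calls run the helper directly.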

Update: instead of running the injected code every time, we can run it once, then save and reuse the pid (until the program is rerun):

define getpid-linux-i386
  # MOV eax,20 [SYS_getpid]
  # INT 0x80
  # RET
  set $linux_getpid = {int (void)}($esp-7)
  set {unsigned char[8]}($linux_getpid) = {\
    0xB8, 0x14, 0x00, 0x00, 0x00, \
    0xCD, 0x80, \
    0xC3 \
  }
  set $pid = $linux_getpid()
  define getpid-linux-i386
    output $pid
    echo \n
  end
  getpid-linux-i386
end
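The injected stub can be checked byte by byte. The Python sketch below (all names hypothetical) encodes the same three instructions, computes which addresses an 8-byte write at $esp-7 covers (the last byte, the RET, lands on $esp itself), and models the "query once, then reuse $pid" idea as a cached function. The stack address in the demo is made up.

```python
import struct
from typing import Callable, Dict, Tuple

SYS_GETPID_I386 = 20  # getpid syscall number on i386 Linux (0x14)


def build_getpid_stub() -> bytes:
    """Encode MOV EAX, 20 / INT 0x80 / RET as raw i386 machine code."""
    mov_eax = struct.pack("<BI", 0xB8, SYS_GETPID_I386)  # B8 imm32: MOV EAX, imm32
    int_80 = bytes([0xCD, 0x80])                         # CD ib: INT 0x80
    ret = bytes([0xC3])                                  # C3: near RET
    return mov_eax + int_80 + ret


def stub_extent(esp: int, offset: int = 7, size: int = 8) -> Tuple[int, int]:
    """Inclusive (first, last) addresses covered when size bytes are written at esp - offset."""
    first = esp - offset
    return first, first + size - 1


def make_cached_getpid(query_pid: Callable[[], int]) -> Callable[[], int]:
    """Run query_pid() on the first call only, then keep returning the stored value,
    much like the GDB command that redefines itself to print $pid."""
    cache: Dict[str, int] = {}

    def getpid() -> int:
        if "pid" not in cache:
            cache["pid"] = query_pid()
        return cache["pid"]

    return getpid


if __name__ == "__main__":
    print(build_getpid_stub().hex(" "))  # b8 14 00 00 00 cd 80 c3
    first, last = stub_extent(0x0061FF20)
    print(hex(first), hex(last))         # 0x61ff19 0x61ff20
```

Because a restarted program gets a new PID, the cached value, like $pid in the script, is only valid until the program is rerun.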

scorpion81 commented 9 months ago

Thanks to the script update that queries the PID once and reuses the variable, I can now properly step through the code. Before, the program often crashed or re-entered memory regions that were repeatedly overwritten by re-injecting the assembly code. The only remaining issues are the thread naming and the 0x0x output, but those may be purely cosmetic.

Edit: I combined this new script with another approach of mine: use a fixed address for the injected machine code and restore the original bytes after execution. For simplicity, I chose the address of the beginning of the EXE header. In some programs those memory areas may be visited again, and if they are still modified, the program may crash or do unexpected things. It is unlikely that the header is visited again, unless the EXE calls LoadLibrary() on itself. Also, due to threading, I think the debugger could be interrupted during an interrupt, leaving the modified location in an inconsistent state. So the new idea of redefining the command after its first run and reusing the pid is simple and brilliant :+1: This way it should be nearly impossible for the program to enter a modified code section again.

define getpid-linux-i386
  # MOV eax,20 [SYS_getpid]
  # INT 0x80
  # RET

  # this is a convenience var to store the old header
  set $address = 0x00400000
  set $header = *(IMAGE_DOS_HEADER*)$address    

  set $linux_getpid = {int (void)}($address)
  set {unsigned char[8]}($linux_getpid) = {\
    0xB8, 0x14, 0x00, 0x00, 0x00, \
    0xCD, 0x80, \
    0xC3 \
  }
  set $pid = $linux_getpid()

  # attempt to restore header
  set *(IMAGE_DOS_HEADER*)$address=$header

  # redefine now so later calls just print the stored pid
  define getpid-linux-i386
    output $pid
    echo \n
  end
  getpid-linux-i386
end
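The save/patch/restore sequence in this script can be modeled outside GDB. The Python sketch below works on a bytearray standing in for process memory; the function name and the simulated 64-byte image are hypothetical. Unlike the GDB command, its try/finally restores the original bytes even if the injected call fails. As far as I know, GDB stops a user-defined command at the first error, so if $linux_getpid() errors out, the restore line would likely be skipped.

```python
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def patched(memory: bytearray, offset: int, code: bytes) -> Iterator[None]:
    """Write code at offset for the duration of the with-block, then put the original bytes back."""
    end = offset + len(code)
    saved = bytes(memory[offset:end])   # like: set $header = *(IMAGE_DOS_HEADER*)$address
    memory[offset:end] = code           # like: set {unsigned char[8]}(...) = {...}
    try:
        yield
    finally:
        memory[offset:end] = saved      # like: set *(IMAGE_DOS_HEADER*)$address=$header


if __name__ == "__main__":
    # Simulated image: offset 0 stands in for 0x00400000; "MZ" is the DOS header magic.
    image = bytearray(b"MZ" + bytes(62))
    stub = bytes([0xB8, 0x14, 0x00, 0x00, 0x00, 0xCD, 0x80, 0xC3])
    with patched(image, 0, stub):
        assert image[:8] == stub        # the stub would be called here
    print(image[:2])                    # bytearray(b'MZ')
```

Saving only the overwritten bytes, as here, is enough because the 8-byte stub lies within the 64-byte header that the script saves.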

nsadeveloper789 commented 9 months ago

FWIW, we may have discovered an alternative solution in #6073.

Unless you're relying on GDB's ability to read the PE/PDB for symbols and sections, you might just try using gdb on the wine binary itself, then have it run the Windows target. You'll need to inspect wine's scripts a bit to get at the actual command line, but I think there's a lot less cruft in that solution.

(NationalSecurityAgency/ghidra issue #6075)