boricj / ghidra-delinker-extension

Ghidra extension for exporting relocatable object files
Apache License 2.0
361 stars, 14 forks

Upstream delinker data model #1

Open boricj opened 1 year ago

boricj commented 1 year ago

Currently, the delinker data model used by this extension is private and independent of Ghidra's program database. If it were upstreamed, it would be properly integrated with the rest of Ghidra and would enable an ecosystem of plugins/extensions/scripts around it.

This is expected to be an extremely tricky issue that will take a very long time to solve, since:

Note: this issue is about enabling the relocation synthesizer and object file exporters to work directly on top of a vanilla Ghidra program database model, without having to store any data in private places. It is not about upstreaming the relocation synthesizers or the object file exporters themselves.

Requirements:

widberg commented 3 months ago

I love this project, I'm glad you're continuing to work on unlinking after ghidra-unlinker-scripts (although the earlier project is easier to find since "unlinking" seems to be the more popular term). I want to add a potential use case I have seen for unlinking: speeding up the development of fuzzing harnesses. Being able to rip the unit under test out of the target executable and write a small wrapper around it is much faster than writing patch code or hooking. This could fall under the "creating libraries" use case, but is more specific. I'm sorry I don't have any links to point to where someone has done this and written about it. I will also add that GNU ld's --wrap option is pure magic when it comes to wrapping unlinked functions.
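For readers unfamiliar with `--wrap`, here is a minimal Python model of the renaming rule described in the GNU ld manual: an undefined reference to `symbol` resolves to `__wrap_symbol`, and an undefined reference to `__real_symbol` resolves to `symbol`. It only illustrates the rule and is not linker code; the symbol names `read_packet` and `checksum` are hypothetical, and the model assumes every reference passed in is undefined in the object file that makes it.

```python
REAL_PREFIX = "__real_"
WRAP_PREFIX = "__wrap_"


def resolve_reference(name, wrapped):
    """Return the symbol an undefined reference binds to under `--wrap`.

    `wrapped` is the set of names passed as `--wrap=<name>`.
    Assumption: `name` is an undefined reference in its object file;
    references already resolved inside one object are not modeled here.
    """
    # __real_<sym> reaches the original definition of a wrapped symbol.
    if name.startswith(REAL_PREFIX) and name[len(REAL_PREFIX):] in wrapped:
        return name[len(REAL_PREFIX):]
    # A plain reference to a wrapped symbol is redirected to the wrapper.
    if name in wrapped:
        return WRAP_PREFIX + name
    # Everything else is left alone.
    return name


# Hypothetical harness setup: the delinked code calls read_packet, and the
# harness supplies __wrap_read_packet, which may call __real_read_packet.
wrapped = {"read_packet"}
references = ["read_packet", "__real_read_packet", "checksum"]
resolved = {ref: resolve_reference(ref, wrapped) for ref in references}

for ref, target in resolved.items():
    print(f"{ref} -> {target}")
```

In a harness, this means the delinked code's call to `read_packet` lands in the wrapper, which can feed fuzzer input and optionally forward to the original through `__real_read_packet`.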

You are right that unlinking is niche, but we are not alone! The existence of wcc, unlinker, and unlinkerida demonstrates that there is a need for this kind of tool (If there are more please let me know, those are the only ones I know about).

To agree with your second point above, I have settled on a small hand-rolled unlinker tailored to my target in a decompilation project since none of the previously mentioned solutions work perfectly for me. But, I have tried this new tool of yours and got impressive results. I look forward to seeing where this goes and best of luck!

boricj commented 3 months ago

Hi,

> I love this project, I'm glad you're continuing to work on unlinking after ghidra-unlinker-scripts (although the earlier project is easier to find since "unlinking" seems to be the more popular term).

Yeah, I've adopted delinking over unlinking terminology because we already have disassemblers and decompilers. I'm not quite sure what form would be more grammatically correct in English and there's barely any literature online about it anyway.

> I want to add a potential use case I have seen for unlinking: speeding up the development of fuzzing harnesses. Being able to rip the unit under test out of the target executable and write a small wrapper around it is much faster than writing patch code or hooking. This could fall under the "creating libraries" use case, but is more specific. I'm sorry I don't have any links to point to where someone has done this and written about it. I will also add that GNU ld's --wrap option is pure magic when it comes to wrapping unlinked functions.

Neat, I didn't think of this use case before.

I built this as part of a video game decompilation/reverse-engineering project and I don't have a background in cyber-security, so my own perspective on what's possible with this is somewhat biased and limited. There are probably tons of other valid use cases, depending on which hat one's wearing.

> You are right that unlinking is niche, but we are not alone! The existence of wcc, unlinker, and unlinkerida demonstrates that there is a need for this kind of tool (If there are more please let me know, those are the only ones I know about).

I was vaguely aware of wcc, but not of the rest. I built my tooling from first principles without looking at prior art, after failing to find anything that fit my needs.

I'm not part of the reverse-engineering or cyber-security communities, so I don't know where or how to publicize my stuff. I've also found that people tend to have a really hard time wrapping their heads around this concept, which doesn't help spread the word about it. The extension is steadily accumulating stars but there's almost no feedback, and it can't possibly be that user-friendly/featureful/bug-free. I don't know of any actual users besides myself, except for the one who contributed COFF support (and even then, I haven't heard from them since).

I do wonder how you stumbled upon this. Besides a couple of Hacker News comments, some chaff over at Ghidra's GitHub repository, and my personal blog, which has maybe 10 readers at most, I haven't put any effort into advertising it.

> To agree with your second point above, I have settled on a small hand-rolled unlinker tailored to my target in a decompilation project since none of the previously mentioned solutions work perfectly for me. But, I have tried this new tool of yours and got impressive results. I look forward to seeing where this goes and best of luck!

Do you have any specific feedback on my extension as applied to your use case? I want to build a general-purpose tool that others can use, so I'm interested in hearing any issues that would prevent you from using it.

As for where this goes, right now it's adequately serving my needs, which is delinking PlayStation code as part of a decompilation project. Because of this, I don't expect that I'll be adding new features myself anytime soon, but I'll actively try and integrate anything that comes my way. So far, I've received one PR for a COFF exporter (#5), which I merged a couple of weeks ago.

So I guess the next big step is to turn my eldritch ramblings into marketable human language and get some exposure, somehow, because I've carried it as far as I can by myself. There are only so many object file format specifications, ISA manuals, toolchain manpages and platform-specific details that can fit into my head while also keeping a decompilation project going.

The areas of improvement I can foresee, in no specific order, are:

widberg commented 3 months ago

Delinker definitely pairs better with decompiler, can't argue with that. Funny you mention HN, because that's exactly how I stumbled across this project (this post specifically). I recognized your username from the unlinker scripts, which I played with early last summer, although I didn't have anything to do with MIPS so I didn't do much besides mess around. I ended up landing on wcc back then and it did what I needed: x86_64 ELF files. This time around I needed a way to unlink a C++ x86_32 Windows PE image file into COFF object files and saw that COFF support was recently added here.

As far as issues go, I took a brief look at the synthesized relocation list, compared it to the output of my hand-rolled solution for my target binary, and saw that some things that should be relocated were not in the list. I can try to compile a smaller example than a whole game and open an issue. I don't have it up right now, but IIRC it missed an absolute address reference to the values table of a switch statement, which immediately followed the function containing the reference, even though Ghidra itself did have this XRef marked. That seemed odd, since it found the absolute address reference to the jump table for the same switch, which was right before the values table it missed. I'll play with it some more later and report back.
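To make the kind of cross-check described above concrete, here is a rough Python sketch of one way to look for missed absolute references. It is not how the extension synthesizes relocations, nor the hand-rolled tool mentioned above: it is a brute-force heuristic that flags every little-endian 32-bit word pointing inside the image, then diffs those offsets against a relocation list. The image base, the addresses, the instruction bytes, and the `synthesized` set are all made up for illustration, and a scan like this can produce false positives on real binaries.

```python
import struct


def candidate_abs32(data, image_base, image_size):
    """Return offsets in `data` holding a u32 that points inside the image."""
    hits = set()
    for offset in range(len(data) - 3):
        (value,) = struct.unpack_from("<I", data, offset)
        if image_base <= value < image_base + image_size:
            hits.add(offset)
    return hits


# Hypothetical x86_32 switch dispatch with two absolute operands:
#   movzx eax, byte ptr [eax + 0x00401030]   ; values table
#   jmp   dword ptr [eax*4 + 0x00401020]     ; jump table
code = bytes.fromhex("0fb68030104000" "ff248520104000")

# Assumed image layout, for illustration only.
candidates = candidate_abs32(code, image_base=0x00400000, image_size=0x10000)

# Hypothetical relocation list that found the jump table operand (offset 10)
# but not the values table operand (offset 3).
synthesized = {10}
missing = candidates - synthesized

print(sorted(candidates))  # offsets of both absolute operands
print(sorted(missing))     # offsets with no matching relocation
```

Any offset left in `missing` is only a candidate worth inspecting by hand, since a byte pattern can look like an in-image address by coincidence.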