NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0
51.18k stars 5.83k forks source link

Proposal: Make SLEIGH a standalone library #3652

Open tetsuo-cpp opened 2 years ago

tetsuo-cpp commented 2 years ago

Hi there! Firstly, thanks so much for developing Ghidra.

To give a bit of background, I'm an engineer at Trail of Bits and am part of their research practice that develops binary lifting tools among other things. We've been interested in building binary analysis and lifting tools on top of SLEIGH, which led us to putting together this CMake project for SLEIGH.

At the moment, projects that use SLEIGH tend to independently vendor the sources under this directory and maintain their own build scripts, wrappers, bindings, etc. Another example of this is the angr project, which uses its own pypcode library, that maintains a set of build scripts and Python bindings.

I think it'd be great for the ecosystem if we could separate SLEIGH into its own official library so it can be used elsewhere, with Ghidra being the main user.

I don't have a firm idea of how this should work, but I wanted to get a feel for whether you're interested in this idea. If so, I'd be happy to partner with the Ghidra team to help make this happen.

bdemick commented 2 years ago

Consider this a co-signature and I would be happy to contribute as well. I would love to see libsla usable in a Capstone-like fashion, but have had just enough trouble root-causing some of the parsing issues that come up that I've had to back-burner it for the time being.

beigela commented 2 years ago

I would love to see this happen!

Related, this popped up on my radar recently. Just dropping it here for context. https://github.com/jstaursky/coronium

w1redch4d commented 2 years ago

just so u guys know, i recently got THIS in my recommendations , maybe something worth checking out

ryanmkurtz commented 2 years ago

Thanks @UwUeeb! Since this hasn't been on our roadmap and someone provided an effective and well-maintained solution, I am going to go ahead and close this.

tetsuo-cpp commented 2 years ago

@ryanmkurtz That repo was created by my colleagues and I, and is what lead me to file this issue to begin with. It's been useful to us but, in my opinion, it doesn't address all of the issues that come with SLEIGH being maintained within Ghidra. If you agree, could you please re-open this?

Firstly, it's a maintenance burden keeping it up to date and is easy to get wrong. We periodically sync with the HEAD of Ghidra and run some testing in CI to ensure that things continue working. But even then, there are ways for things to silently fall out of sync. For example, we recently noticed that our list of headers that we install under the library's include/ were out of date (https://github.com/lifting-bits/sleigh/pull/107). We're now considering adding some script to check for this, but I expect that these sorts of issues will continue to crop up.

Additionally, our version of the SLEIGH library strips out some dependencies (BFD, ZLib, Iberty) to make it more practical to build on different platforms. At the moment, this works fine because these headers are included in files that contain optional implementations like LoadImageBfd, however there's nothing to guarantee that it will remain this way since our library-ification of SLEIGH happens independently to Ghidra development. I'm concerned that if anything happens to make these dependencies harder to separate (perhaps SLEIGH begins including a BFD header elsewhere), this is going to be hard to deal with.

ryanmkurtz commented 2 years ago

I will reopen it since there could be a future where we have a need to make sleigh into a standalone library, but as I mentioned it is not currently on our roadmap.

ekilmer commented 1 year ago

@ryanmkurtz Can you clarify whether you (and the team) are open to further discussing, reviewing, and accepting changes to the C++ decompiler/sleigh code to make third-party solutions (such as lifting-bits/sleigh, for which I am a maintainer) easier to maintain and package?

Some improvements I have in mind:

  1. Restructuring the project directory into a more traditional organization (example: Makefile, src/, include/, test/unittests/, test/datatests/)
    • While this isn't strictly necessary, it's a bit weird for the Makefile to reference the parent directory.
  2. Separating header files from source files (as mentioned in an earlier comment) into include/ghidra/*.hh (for example)
    • This is useful when the decompiler headers are included in another project to prevent conflicts with header names, i.e. #include <ghidra/type.hh> instead of #include <type.hh>. The actual directory names could be up for discussion.
  3. Add missing header guards to prevent double-inclusion
  4. More clean-up of using declarations in header files.

I understand some of these "improvements" are subjective. Still, I'm happy to discuss why and how each is generally the best practice for C++ development and that making the changes will encourage community engagement.

I also understand that reviewing PRs and discussing these things takes time. However, I hope that if the implementation is completed (and reviewed) by the community and the implementation/idea doesn't drastically affect the Ghidra team's internal development, then we can all benefit.


Also, I assume API changes and code refactoring to support more flexible library-like use-cases for the decompiler would be addressed on a case-by-case basis. Like the comment mentioning a capstone-like usage of the disassembler, I had trouble trying to cleanly implement this feature in the trailofbits/mishegos tool. I had to rewrite some of sleigh.cc to remove the disassembler caching mechanism that couldn't be removed easily using C++ language features or sleigh APIs.

Are these sorts of PRs, to improve the flexibility of sleigh/decompiler as a library, open for inclusion in this project?


If there's a better way to organize or track individual requests, I can open separate issues or PRs for more isolated discussion. Please let me know if/how I should do this.

Thank you for your time!

ryanmkurtz commented 1 year ago

@ekilmer Sorry for the delay, I wanted to bring this up to the team for discussion so I could respond appropriately (I am not actually in charge of anything). Generally speaking, we are very open to making Ghidra as extensible, shareable, accessible, etc. as possible. However, doing so requires work like everything else...sometimes a lot of work since the codebase is very old and established. So, it comes down to fitting it in with the priorities of all the other work we need to do.

We recognize that this ticket has value to the community...it's gaining momentum but not to the point where we are ready to start working on it ourselves. Having said that, we are open to accepting PR's that would improve this. Ideally they would be trickled in in manageable chunks that are easy to review, with priority given to the pieces that have the most impact. These PR's will have to then be prioritized against all of our other work, but we are open to it. It would help if the PR described the impact it would have so we can prioritize more appropriately.

Some other things:

ekilmer commented 1 year ago

@ryanmkurtz Thank you for the response and clarity on this topic.

  • Have you considered using the full blown Ghidra release in your workflow using say, headless and GhidraScripts? Is that just too slow performance-wise?
  • Rather than using the C++ stuff, have you considered just using the SoftwareModeling jar? (someone on the team suggested this).

The tools we develop are already written in C++, so we thought using the C++ decompiler/sleigh, standalone, would be the best course of action. However, we have been integrating with Ghidra through native Java plugins and exploring RPC to our C++ tools. I had not considered using the SoftwareModeling jar by itself... I will keep that in mind! Thank you.

Regarding https://github.com/NationalSecurityAgency/ghidra/pull/4715, we didn't know that it was solved insufficiently. This might be a good place to start with a PR.

I opened PR #5190 to completely fix/remove the using declarations in headers.

XVilka commented 10 months ago

It would also help our case (rizinorg/rz-ghidra). There are more problems with the current state of the decompiler - not everything is exposed via classes or methods, e.g. we had to do patches like these: