Open tetsuo-cpp opened 2 years ago
Consider this a co-signature and I would be happy to contribute as well. I would love to see libsla
usable in a Capstone-like fashion, but have had just enough trouble root-causing some of the parsing issues that come up that I've had to back-burner it for the time being.
I would love to see this happen!
Related, this popped up on my radar recently. Just dropping it here for context. https://github.com/jstaursky/coronium
just so u guys know, i recently got THIS in my recommendations , maybe something worth checking out
Thanks @UwUeeb! Since this hasn't been on our roadmap and someone provided an effective and well-maintained solution, I am going to go ahead and close this.
@ryanmkurtz That repo was created by my colleagues and I, and is what lead me to file this issue to begin with. It's been useful to us but, in my opinion, it doesn't address all of the issues that come with SLEIGH being maintained within Ghidra. If you agree, could you please re-open this?
Firstly, it's a maintenance burden keeping it up to date and is easy to get wrong. We periodically sync with the HEAD of Ghidra and run some testing in CI to ensure that things continue working. But even then, there are ways for things to silently fall out of sync. For example, we recently noticed that our list of headers that we install under the library's include/
were out of date (https://github.com/lifting-bits/sleigh/pull/107). We're now considering adding some script to check for this, but I expect that these sorts of issues will continue to crop up.
Additionally, our version of the SLEIGH library strips out some dependencies (BFD, ZLib, Iberty) to make it more practical to build on different platforms. At the moment, this works fine because these headers are included in files that contain optional implementations like LoadImageBfd
, however there's nothing to guarantee that it will remain this way since our library-ification of SLEIGH happens independently to Ghidra development. I'm concerned that if anything happens to make these dependencies harder to separate (perhaps SLEIGH begins including a BFD header elsewhere), this is going to be hard to deal with.
I will reopen it since there could be a future where we have a need to make sleigh into a standalone library, but as I mentioned it is not currently on our roadmap.
@ryanmkurtz Can you clarify whether you (and the team) are open to further discussing, reviewing, and accepting changes to the C++ decompiler/sleigh code to make third-party solutions (such as lifting-bits/sleigh, for which I am a maintainer) easier to maintain and package?
Some improvements I have in mind:
Makefile
, src/
, include/
, test/unittests/
, test/datatests/
)
include/ghidra/*.hh
(for example)
#include <ghidra/type.hh>
instead of #include <type.hh>
. The actual directory names could be up for discussion.using
declarations in header files.
using std::<symbol>
, still pollutes consumers that include the headers.RemUsing
) to automate almost the entire refactor of replacing the symbols from the standard library with their fully-qualified name, i.e. string
-> std::string
.
RemUsing
to automate this refactoring to generate the patches in https://github.com/lifting-bits/sleigh/pull/167 (happy to discuss this more) I understand some of these "improvements" are subjective. Still, I'm happy to discuss why and how each is generally the best practice for C++ development and that making the changes will encourage community engagement.
I also understand that reviewing PRs and discussing these things takes time. However, I hope that if the implementation is completed (and reviewed) by the community and the implementation/idea doesn't drastically affect the Ghidra team's internal development, then we can all benefit.
Also, I assume API changes and code refactoring to support more flexible library-like use-cases for the decompiler would be addressed on a case-by-case basis. Like the comment mentioning a capstone-like usage of the disassembler, I had trouble trying to cleanly implement this feature in the trailofbits/mishegos tool. I had to rewrite some of sleigh.cc
to remove the disassembler caching mechanism that couldn't be removed easily using C++ language features or sleigh APIs.
Are these sorts of PRs, to improve the flexibility of sleigh/decompiler as a library, open for inclusion in this project?
If there's a better way to organize or track individual requests, I can open separate issues or PRs for more isolated discussion. Please let me know if/how I should do this.
Thank you for your time!
@ekilmer Sorry for the delay, I wanted to bring this up to the team for discussion so I could respond appropriately (I am not actually in charge of anything). Generally speaking, we are very open to making Ghidra as extensible, shareable, accessible, etc. as possible. However, doing so requires work like everything else...sometimes a lot of work since the codebase is very old and established. So, it comes down to fitting it in with the priorities of all the other work we need to do.
We recognize that this ticket has value to the community...it's gaining momentum but not to the point where we are ready to start working on it ourselves. Having said that, we are open to accepting PR's that would improve this. Ideally they would be trickled in in manageable chunks that are easy to review, with priority given to the pieces that have the most impact. These PR's will have to then be prioritized against all of our other work, but we are open to it. It would help if the PR described the impact it would have so we can prioritize more appropriately.
Some other things:
SoftwareModeling
jar? (someone on the team suggested this). @ryanmkurtz Thank you for the response and clarity on this topic.
- Have you considered using the full blown Ghidra release in your workflow using say, headless and GhidraScripts? Is that just too slow performance-wise?
- Rather than using the C++ stuff, have you considered just using the SoftwareModeling jar? (someone on the team suggested this).
The tools we develop are already written in C++, so we thought using the C++ decompiler/sleigh, standalone, would be the best course of action. However, we have been integrating with Ghidra through native Java plugins and exploring RPC to our C++ tools. I had not considered using the SoftwareModeling
jar by itself... I will keep that in mind! Thank you.
Regarding https://github.com/NationalSecurityAgency/ghidra/pull/4715, we didn't know that it was solved insufficiently. This might be a good place to start with a PR.
I opened PR #5190 to completely fix/remove the using
declarations in headers.
It would also help our case (rizinorg/rz-ghidra). There are more problems with the current state of the decompiler - not everything is exposed via classes or methods, e.g. we had to do patches like these:
Hi there! Firstly, thanks so much for developing Ghidra.
To give a bit of background, I'm an engineer at Trail of Bits and am part of their research practice that develops binary lifting tools among other things. We've been interested in building binary analysis and lifting tools on top of SLEIGH, which led us to putting together this CMake project for SLEIGH.
At the moment, projects that use SLEIGH tend to independently vendor the sources under this directory and maintain their own build scripts, wrappers, bindings, etc. Another example of this is the angr project, which uses its own pypcode library, that maintains a set of build scripts and Python bindings.
I think it'd be great for the ecosystem if we could separate SLEIGH into its own official library so it can be used elsewhere, with Ghidra being the main user.
I don't have a firm idea of how this should work, but I wanted to get a feel for whether you're interested in this idea. If so, I'd be happy to partner with the Ghidra team to help make this happen.