NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

WebAssembly Support #2937

Open nneonneo opened 3 years ago

nneonneo commented 3 years ago

Is your feature request related to a problem? Please describe.

It would be very nice to support WebAssembly in Ghidra. There is a public plugin available (https://github.com/andr3colonel/ghidra_wasm) but it only loads wasm files without disassembly or decompilation support. Wasm code is becoming increasingly common on the Web for both good and bad purposes (e.g. malware, exploit code, cryptomining) and decompilation would be a huge boon.

Describe the solution you'd like

A way to load .wasm binary files into Ghidra with disassembly/decompilation support.

Describe alternatives you've considered

I have a pipeline for compiling wasm to x86-64 and then loading it in Ghidra or IDA, but the process is cumbersome and the output is decidedly suboptimal. Constructs like switches are often lost or poorly reconstructed. I have toyed with the idea of creating a module myself, but there are challenges with the Wasm bytecode format which make a straightforward SLEIGH disassembly difficult (see next section).

Additional context

The main target of disassembly would be the instruction bytecode format. Here are the major challenges with the format as it relates to SLEIGH, as I see it:

I am not familiar enough with P-code or SLEIGH to know how to resolve these issues in a clean way. One (ugly) approach would be to transpile WASM bytecode to some intermediate format which would be easier to work with. But, the downside here is that it would break the association between the addresses in the file and the Ghidra disassembly, which would make debugging much more challenging. Ideas welcome - I am happy to pursue potentially implementing this myself if there's a clear way to deal with the challenges.

emteere commented 3 years ago

WASM has been on the to-do queue for a while. We've seen the loader you mention and thought there might be a PR at some point with the rest of the implementation.

The decompiler is a simplification engine, so sometimes just providing equations that it can solve is enough. The constant propagation can also figure out some things given constant state. For example, if a register holds an initial value that stays consistent across the function, setting that register's value at the start of the function lets the equations resolve to a constant reference. However, the value of set registers is only used at the start of a function.
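
To make the constant-propagation point concrete, here is a minimal toy sketch in Python. It is not Ghidra's decompiler or any Ghidra API: the function `propagate_constants`, the register name `gp`, and the tuple-based instruction format are all hypothetical, chosen only to show how a known value at function entry lets later address computations resolve to constants.

```python
def propagate_constants(ops, initial_registers):
    """Toy forward constant propagation over straight-line code.

    ops: list of (dest, opcode, a, b); operands are register names (str)
         or integer literals. Hypothetical format, not P-code.
    initial_registers: values assumed known at function entry.
    Returns a dict of every register whose value is a known constant.
    """
    known = dict(initial_registers)
    for dest, opcode, a, b in ops:
        va = known.get(a) if isinstance(a, str) else a
        vb = known.get(b) if isinstance(b, str) else b
        if va is None or vb is None:
            # An unknown input makes the result unknown as well.
            known.pop(dest, None)
            continue
        if opcode == "add":
            value = va + vb
        elif opcode == "mul":
            value = va * vb
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
        known[dest] = value & 0xFFFFFFFF  # model 32-bit registers
    return known


# t0 = gp + 0x20; t1 = t0 + 4
ops = [("t0", "add", "gp", 0x20), ("t1", "add", "t0", 4)]

with_entry_value = propagate_constants(ops, {"gp": 0x1000})
without_entry_value = propagate_constants(ops, {})
print(hex(with_entry_value["t1"]))   # the reference resolves to a constant
print("t1" in without_entry_value)   # nothing resolves without the entry value
```

With `gp` supplied at entry, `t1` becomes a constant address; without it, nothing downstream can be resolved, which is the situation the comment describes.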

While I can't promise anything, we'll need to take a closer look and see if we can get a reasonable processor spec together, or provide a skeleton to solve some of the issues.

garrettgu10 commented 3 years ago

I'm interested in working on this. I think most of the problems raised can be solved pretty cleanly by bailing out to Java, in a similar manner to how the JVM module handles "Injects."

garrettgu10 commented 3 years ago

Still a few tricky instructions missing, but initial results are promising:

https://github.com/garrettgu10/ghidra-wasm-plugin

garrettgu10 commented 3 years ago

There's still some bugfixing and testing to do, but the basic feature set is now complete. I'm also interested in making a PR after bugfixing and refactoring.

The way branch target resolution and implicit pops are handled at decompilation time is a bit... strange, to say the least. I'd appreciate it if someone experienced in Ghidra dev could review the plugin, and I'd be happy to walk anyone through it and/or brainstorm "nicer" ways to do things.
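
For readers unfamiliar with why branch targets need resolving at all: Wasm branches name an enclosing block by relative depth rather than by address, so the destination has to be recovered from the nesting structure. The sketch below shows one generic way to do that with a control stack. It is an illustration only, not how either plugin works; the function name `resolve_branch_targets` and the simplified `(offset, op, immediate)` instruction tuples are assumptions made for this example.

```python
def resolve_branch_targets(instrs):
    """Map each br/br_if offset to the offset it jumps to.

    instrs: list of (offset, op, imm) for one function body, where op is one
    of "block", "loop", "br", "br_if", "end" (a simplified, hypothetical
    encoding). A branch to a loop goes to the loop's start; a branch to a
    block (or to the function frame) goes to just past the matching `end`.
    """
    # Frame 0 stands for the function body itself.
    control = [{"kind": "func", "start": None, "pending": []}]
    targets = {}
    for offset, op, imm in instrs:
        if op in ("block", "loop"):
            control.append({"kind": op, "start": offset, "pending": []})
        elif op in ("br", "br_if"):
            frame = control[-1 - imm]  # imm is the relative label depth
            if frame["kind"] == "loop":
                targets[offset] = frame["start"]
            else:
                frame["pending"].append(offset)  # resolved at matching end
        elif op == "end":
            frame = control.pop()
            after_end = offset + 1  # `end` is a single-byte opcode (0x0B)
            for branch_offset in frame["pending"]:
                targets[branch_offset] = after_end
    return targets


body = [
    (0, "block", None),
    (2, "loop", None),
    (4, "br_if", 1),   # depth 1: the enclosing block
    (6, "br", 0),      # depth 0: the loop itself
    (8, "end", None),  # closes the loop
    (9, "end", None),  # closes the block
    (10, "end", None), # closes the function
]
print(resolve_branch_targets(body))
```

In this example the `br_if` at offset 4 leaves the block and lands at offset 10, while the `br` at offset 6 jumps back to the loop header at offset 2.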

nneonneo commented 2 years ago

I was quite impressed with @garrettgu10's efforts, and so I started to make some changes. Eventually this turned into a bit of a rewrite: https://github.com/nneonneo/ghidra-wasm-plugin/

I believe this new plugin is substantially more usable, and I've used it practically for reversing large programs like Unity web games (>10MB Wasm files).

emteere commented 2 years ago

Looking forward to checking out both these implementations. I had started to take a look at the sleigh implementation to offer some refactoring, but had to put it on the back burner getting ready for 10.1 and other drains on my time. We've got a little breathing room now that 10.1 is almost released.

We are interested in a PR if you would like to submit one.

nneonneo commented 2 years ago

I’m glad to hear that there’s excitement for this, and I’d be happy to submit a PR.

One issue that would be nice to resolve before a formal PR is DWARF data import. The main problem is that the existing Ghidra DWARF importer has no provisions for Harvard architectures. Our Wasm plug-in uses a “modified” Harvard approach: we load the program code at a specific, arbitrary address (0x80000000) in the same address space as the program data, because that produces a nicer user experience.

What we would like is support in the DWARF importer for supplying a program-specific implementation of the DWARF “toAddr” function, with separate hooks for program (PC) addresses and data (static object) addresses. Separately, it might be nice to be able to hook some other DWARF functions to parse architecture-specific directives. Right now, I think the only solution would be to provide a bunch of modified subclasses with our importer, but the architecture of the DWARF code is such that we would likely have to maintain a sizable amount of copied code for this to work.
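
A rough sketch of what the two requested hooks might look like, written in Python for brevity. This is not Ghidra's DWARF importer API: the class `WasmAddressMapper` and its method names are hypothetical, and it assumes that DWARF PC values can simply be added to the code base and that data addresses map unchanged into the shared space. Only the 0x80000000 code base comes from the comment above.

```python
from dataclasses import dataclass

CODE_BASE = 0x80000000  # code load address mentioned in the thread


@dataclass(frozen=True)
class WasmAddressMapper:
    """Hypothetical pair of hooks for a 'modified Harvard' layout."""

    code_base: int = CODE_BASE
    data_base: int = 0  # assumption: data addresses are used as-is

    def to_code_addr(self, dwarf_pc: int) -> int:
        # Program (PC) addresses are rebased into the code region.
        return self.code_base + dwarf_pc

    def to_data_addr(self, dwarf_addr: int) -> int:
        # Static object addresses stay in the data region.
        return self.data_base + dwarf_addr


mapper = WasmAddressMapper()
print(hex(mapper.to_code_addr(0x13F8)))
print(hex(mapper.to_data_addr(0x13F8)))
```

The point of keeping two hooks is that the same numeric DWARF value can mean two different locations depending on whether it describes code or data, which a single `toAddr` cannot distinguish.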

garrettgu10 commented 2 years ago

I like @nneonneo's implementation better, and I don't think the issue with the DWARF importer should block it from getting merged since the feature is still very useful without DWARF support.

I was just wondering if there's been progress on a PR and if there's anything I can contribute to make the process go more smoothly?

Thanks!

nneonneo commented 2 years ago

@garrettgu10 Thanks for the push :). I went through and cleaned up the code a bit and integrated it into a PR (#4103). Please feel free to take a look.

(CC @emteere who also expressed interest in the PR)

mumbel commented 2 years ago

I was trying to get pcodetest working with this and ran into an issue. Have you tried the emulator for anything here yet, by chance? I can't tell if I'm coming across a SLEIGH or an emulator issue.

                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             void __wasm i1_complexLogic_Main(void)
                               assume contextreg = 0x4400000000000000000
             void              <VOID>         <RETURN>
                             i1_complexLogic_Main                            XREF[1]:     table:00000018(*)  
    ram:800013f8 8f 01           .locals
    ram:800013fa 01 7f           .local     count=0x1 type=0x7f
    ram:800013fc 01 7f           .local     count=0x1 type=0x7f
    ram:800013fe 01 7f           .local     count=0x1 type=0x7f
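
As a reading aid for the listing above: the bytes at the start of the function appear to follow the standard Wasm encoding of local declarations, namely an unsigned LEB128 count of declaration groups followed by (count, value type) pairs. The Python sketch below decodes the eight bytes shown. The helper names are hypothetical, and treating `8f 01` as the group count is an assumption based on the `.locals` label in the listing.

```python
def decode_uleb128(data, pos=0):
    """Decode one unsigned LEB128 integer; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:  # high bit clear marks the last byte
            return result, pos
        shift += 7


# Numeric value types from the Wasm binary format.
VALTYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64"}


def parse_locals(body, max_groups=None):
    """Parse the locals vector at the start of a function body.

    Returns (declared_group_count, parsed_groups). max_groups limits how many
    groups are read, which is useful when only a prefix of the body is at hand.
    """
    group_count, pos = decode_uleb128(body)
    to_read = group_count if max_groups is None else min(group_count, max_groups)
    groups = []
    for _ in range(to_read):
        count, pos = decode_uleb128(body, pos)
        type_byte = body[pos]
        pos += 1
        groups.append((count, VALTYPES.get(type_byte, hex(type_byte))))
    return group_count, groups


# Bytes copied from the listing: 8f 01 | 01 7f | 01 7f | 01 7f
prefix = bytes.fromhex("8f01017f017f017f")
print(parse_locals(prefix, max_groups=3))
```

Under that reading, `8f 01` declares 143 groups, and each `01 7f` entry is a single `i32` local.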

It was a hacky script, so maybe I just didn't have enough state set up (setting PC/SP/LR). Starting at this main and then stepping with emuHelper.step(monitor), I was getting these addresses:

Then I hit a divide by zero in OpBehaviorIntSdiv's long evaluateBinary, along with these warnings:

WARN  Uninitialized memory read at ram:800013fb: register:4ffffff0:8   (EmulatorHelper.java:710)
WARN  Uninitialized memory read at ram:800013fb: register:4ffffff8:8   (EmulatorHelper.java:710)

nneonneo commented 2 years ago

@mumbel probably better to put that on the PR, since it's specific to this implementation.