NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

Avoid recompilation of `-postScript` when executing `analyzeHeadless` #6884

Closed: GscheadaHamme closed this issue 2 months ago

GscheadaHamme commented 2 months ago

Hello there,

I'm planning to analyze a large number of binary libraries with `analyzeHeadless` and to run a `-postScript` on each binary that exports the control-flow graph (CFG) of every function Ghidra finds to JSON.
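For context on what such an export might produce, here is a minimal sketch of the JSON side only, assuming a hypothetical per-function schema (`name`, `entry`, `blocks`, `edges`). The `CfgJson` class, its records, and the addresses in `main` are made up for illustration and are not part of Ghidra's API; a real post script would fill the records from Ghidra's basic-block model instead of hard-coded values.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical serializer for one function's CFG; not part of Ghidra. */
public class CfgJson {

    /** A directed edge between two basic blocks, identified by start address. */
    public record Edge(String from, String to) {}

    /** One function's CFG: its entry point, block start addresses, and edges. */
    public record FunctionCfg(String name, String entry, List<String> blocks, List<Edge> edges) {}

    /** Quotes a string, escaping the characters JSON requires to be escaped. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    // Other control characters must be written as \\u escapes.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    /** Serializes one function's CFG as a compact JSON object. */
    public static String toJson(FunctionCfg f) {
        String blocks = f.blocks().stream().map(CfgJson::quote).collect(Collectors.joining(","));
        String edges = f.edges().stream()
                .map(e -> "{\"from\":" + quote(e.from()) + ",\"to\":" + quote(e.to()) + "}")
                .collect(Collectors.joining(","));
        return "{\"name\":" + quote(f.name()) + ",\"entry\":" + quote(f.entry())
                + ",\"blocks\":[" + blocks + "],\"edges\":[" + edges + "]}";
    }

    public static void main(String[] args) {
        // Made-up addresses, shaped like an if/else: one block branching to two.
        FunctionCfg cfg = new FunctionCfg("main", "00101129",
                List.of("00101129", "0010113f", "00101146"),
                List.of(new Edge("00101129", "0010113f"), new Edge("00101129", "00101146")));
        System.out.println(toJson(cfg));
    }
}
```

Emitting one compact object per function keeps each record independent, so many binaries can be exported in parallel and concatenated later.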

Currently, one of the most resource-intensive parts of this process is the step between the outputs `INFO REPORT: Analysis succeeded for file: ...` and `INFO SCRIPT: /path/to/script.java (HeadlessAnalyzer)`. I assume this gap is the recompilation of the `-postScript` on every run. Is there a way to avoid the recompilation?
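One way to put a number on that gap, rather than reading it off the console, is to diff the timestamps in the file written by `-log`. The sketch below assumes each relevant log line starts with a `yyyy-MM-dd HH:mm:ss` timestamp and finds the two markers quoted above by substring match; both are assumptions about the log format, so adjust `TS` if your log differs. `ScriptGap` is a hypothetical helper, not a Ghidra tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: time from the first "Analysis succeeded" line to the next "SCRIPT:" line. */
public class ScriptGap {
    // ASSUMPTION: log lines begin with a timestamp in this exact format.
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    private static final int TS_LEN = "yyyy-MM-dd HH:mm:ss".length(); // 19 characters

    static LocalDateTime timestamp(String line) {
        return LocalDateTime.parse(line.substring(0, TS_LEN), TS);
    }

    /** Returns the first gap found, or empty if either marker line is missing. */
    public static Optional<Duration> gap(List<String> lines) {
        LocalDateTime analysisDone = null;
        for (String line : lines) {
            if (analysisDone == null && line.contains("REPORT: Analysis succeeded for file:")) {
                analysisDone = timestamp(line);
            } else if (analysisDone != null && line.contains("SCRIPT: ")) {
                return Optional.of(Duration.between(analysisDone, timestamp(line)));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ScriptGap.java Test.log
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        System.out.println(gap(lines).map(d -> d.toMillis() + " ms").orElse("markers not found"));
    }
}
```

With one-second timestamps this is coarse, but it is enough to compare a multi-second gap before and after a change.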

Many thanks for your support.

ryanmkurtz commented 2 months ago

When you say "I assume", might I ask how deeply you dug into profiling it? Were you playing with custom builds of Ghidra, or looking at the source and noting that nothing much happens between the prints? A couple of other questions:

  1. Is this a Java-based GhidraScript?
  2. Are you running multiple instances of analyzeHeadless in parallel?
  3. As a sanity check, can you make your post script do nothing but return, and confirm that the slowness is in our code?

Thanks!

GscheadaHamme commented 2 months ago

Thank you for the fast reply, I will provide more information.

Edit: NVM, `~/.ghidra` is not empty; its contents are just hidden files.

GscheadaHamme commented 2 months ago
  1. Yes, it is a Java-based GhidraScript.
  2. No, I'm not currently running multiple instances of analyzeHeadless in parallel, but I plan to. My preferred approach is to create a fresh project for each binary to avoid race conditions.
  3. My current test setup:
    • Distribution: Arch Linux
    • Kernel: 6.10.7-arch1-1
    • Ghidra version: 11.0.3-1
    • Command: `ghidra-analyzeHeadless TestProject Test -log Test.log -scriptPath scripts -postScript TestHelloWorld.java -process If`
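The per-binary fresh-project plan from answer 2 could be scripted roughly as follows. This is a sketch under assumptions: it uses `-import` together with `-deleteProject` instead of the `-process` from the test command above, names each project after the binary's file name (so file names must be unique), and calls the `ghidra-analyzeHeadless` wrapper from the Arch package; on a stock install the launcher is `support/analyzeHeadless` inside the Ghidra directory. `PerBinaryCommand` and the pool size of 4 are hypothetical choices.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical launcher: one throwaway Ghidra project per binary. */
public class PerBinaryCommand {

    /** Builds the argument list for one binary; assumes binary file names are unique. */
    static List<String> command(Path workRoot, Path binary, Path scriptDir, String postScript) {
        String name = binary.getFileName().toString();
        return List.of(
                "ghidra-analyzeHeadless",
                workRoot.resolve(name).toString(),   // per-binary project location (index 1)
                name + "_project",                   // project name
                "-import", binary.toString(),
                "-scriptPath", scriptDir.toString(), // a folder containing only the scripts
                "-postScript", postScript,
                "-log", workRoot.resolve(name + ".log").toString(),
                "-deleteProject");                   // remove the project after processing
    }

    /** Creates the project location directory, then runs the command and waits for it. */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Files.createDirectories(Path.of(cmd.get(1)));
        return new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 4) {
            System.err.println("usage: PerBinaryCommand <workRoot> <scriptDir> <script> <binary>...");
            return;
        }
        Path workRoot = Path.of(args[0]);
        Path scriptDir = Path.of(args[1]);
        String script = args[2];
        ExecutorService pool = Executors.newFixedThreadPool(4); // parallelism is a guess; tune it
        List<Future<Integer>> results = new ArrayList<>();
        for (int i = 3; i < args.length; i++) {
            List<String> cmd = command(workRoot, Path.of(args[i]), scriptDir, script);
            results.add(pool.submit(() -> {
                System.out.println(String.join(" ", cmd));
                return run(cmd);
            }));
        }
        pool.shutdown();
        for (Future<Integer> f : results) {
            System.out.println("exit code: " + f.get()); // rethrows any launch failure
        }
    }
}
```

Each job gets its own project location and log file; only the script folder is shared between jobs.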

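File contents:

- `TestHelloWorld.java`:
```java
import ghidra.app.script.GhidraScript;

// Minimal post script: prints one line and returns, to isolate Ghidra's own overhead.
public class TestHelloWorld extends GhidraScript {
    @Override
    public void run() throws Exception {
        println("Hello World");
    }
}
```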

- `If.c` from which the binary `If` was compiled:
```c
int main() {
  int x = 0;
  int y = 1;
  if (x < y) {
    return 1;
  } else {
    return 0;
  }
}
```

Do you see any way to reduce the time spent between `(HeadlessAnalyzer) REPORT: Analysis succeeded for file: /If` and `(HeadlessAnalyzer) SCRIPT: <path>/scripts/TestHelloWorld.java`?

GscheadaHamme commented 2 months ago

Caveat: this test was run with a script directory that also contained large `bin` and `build` folders left over from developing the scripts in Neovim. After removing those folders, performance improved dramatically. I will try the same approach with my export script.
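The thread does not pin down why the leftover `bin` and `build` folders slowed the script step, but auditing a directory before handing it to `-scriptPath` makes such leftovers easy to spot. The hypothetical `ScriptDirAudit` helper below (not part of Ghidra) counts the regular files and bytes under each immediate subdirectory and lists the largest first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical helper: find bulky subfolders inside a script directory. */
public class ScriptDirAudit {

    /** File count and total size of one subdirectory, counted recursively. */
    record Entry(Path dir, long files, long bytes) {}

    static Entry measure(Path dir) throws IOException {
        List<Path> regularFiles;
        try (Stream<Path> walk = Files.walk(dir)) { // does not follow symbolic links by default
            regularFiles = walk.filter(Files::isRegularFile).toList();
        }
        long bytes = 0;
        for (Path p : regularFiles) {
            bytes += Files.size(p);
        }
        return new Entry(dir, regularFiles.size(), bytes);
    }

    /** Measures every immediate subdirectory of scriptDir, largest file count first. */
    static List<Entry> audit(Path scriptDir) throws IOException {
        List<Path> subdirs;
        try (Stream<Path> children = Files.list(scriptDir)) {
            subdirs = children.filter(Files::isDirectory).toList();
        }
        List<Entry> result = new ArrayList<>();
        for (Path d : subdirs) {
            result.add(measure(d));
        }
        result.sort(Comparator.comparingLong(Entry::files).reversed());
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path scriptDir = Path.of(args.length > 0 ? args[0] : "scripts");
        for (Entry e : audit(scriptDir)) {
            System.out.printf("%8d files %12d bytes  %s%n", e.files(), e.bytes(), e.dir());
        }
    }
}
```

If anything other than scripts shows up near the top of the list, moving the scripts into their own folder is the simplest fix.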

GscheadaHamme commented 2 months ago

Thank you for your patience. After moving the scripts into a dedicated folder, performance improved drastically (from 7 seconds to 1 second). More detailed profiling is beyond the scope of my work. Thanks again for the quick response.