NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

Batch Import raw binary images #1235

Open aosti opened 4 years ago

aosti commented 4 years ago

Describe the bug: I have a folder full of raw binary files (.bin) that contain MIPS instructions, and I want to batch import them into Ghidra. However, after selecting the folder, I get the screen shown in the Screenshots section.

To Reproduce Steps to reproduce the behavior:

  1. Execute ./ghidraRun
  2. Click on File > Batch Import
  3. Select the folder containing the binary files
  4. See unexpected behavior.

Expected behavior: The "Files to Import" list gets filled with the 276 files in the folder.

Screenshots image

Environment (please complete the following information):

Sample binary files: sample.zip

dragonmacher commented 4 years ago

Has this feature worked for you on this directory in a previous version?

Have you tried changing the 'Depth limit' value?

aosti commented 4 years ago

Changing the depth limit to larger values has no effect. Note that importing a single file as raw binary works. I'll do a test with older versions.

dragonmacher commented 4 years ago

I would expect this to work. We will have to see if we can re-create this locally.

aosti commented 4 years ago

I'll attach some files to this ticket.

dev747368 commented 4 years ago

This is working as designed.

The batch importer currently filters out anything that doesn't have a defined loader, which includes raw binary files. It does this to limit the spam that random binary / text / etc files would cause.

I could see how this would be a useful option though, if you were importing a container whose contents you were sure about. I'll transform this issue into an enhancement request.
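
Until such an option exists, one possible workaround is to bypass the batch import dialog and drive the headless analyzer with an explicit loader. The sketch below only builds the command line. The function name, the install and project paths, and the language ID `MIPS:BE:32:default` (big-endian 32-bit MIPS) are assumptions for illustration; use the language ID that matches your binaries.

```python
from pathlib import Path


def build_headless_import_cmd(ghidra_home, project_dir, project_name, input_dir,
                              language_id="MIPS:BE:32:default",
                              recursive=True, analyze=True):
    """Build an analyzeHeadless command that imports a folder as raw binaries.

    Hypothetical helper: all paths and the default language ID are assumptions.
    """
    launcher = Path(ghidra_home) / "support" / "analyzeHeadless"
    cmd = [
        str(launcher), str(project_dir), project_name,
        "-import", str(input_dir),
        # Force the raw binary loader instead of letting Ghidra probe formats.
        "-loader", "BinaryLoader",
        # Raw binaries carry no header, so the language must be given explicitly.
        "-processor", language_id,
    ]
    if recursive:
        cmd.append("-recursive")
    if not analyze:
        cmd.append("-noanalysis")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. On Windows the launcher is `analyzeHeadless.bat`, so the path would need adjusting.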

soutzis commented 4 years ago

Since this is still open, I'd like to add my 2 cents. I have a folder full of raw (malware) binaries, each with an associated report in .json format. The filenames are the SHA-256 digests of the binaries, and the reports just have the additional .json extension.

Example:
raw file -> 0ddee76c519101aa4ded7546408f51465bf2adaf9b584ad5fb87845583e47276
associated report -> 0ddee76c519101aa4ded7546408f51465bf2adaf9b584ad5fb87845583e47276.json

Currently, when I try to do a batch import with a lot of samples (directories with 10,000 - 200,000 samples), Ghidra throws an IndexOutOfBoundsException:

image

A very useful enhancement would be the option to choose which file types to exclude while importing everything else (assuming that the user is sure of what they are batch-importing). In my case, I could exclude just the '.json' files.
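
As a rough illustration of the requested behavior, the filtering could also be done outside Ghidra before importing. This is a minimal sketch, not anything Ghidra provides: the function name and the default exclusion list are assumptions.

```python
from pathlib import Path


def select_import_candidates(folder, excluded_suffixes=(".json",)):
    """Return the regular files in `folder` whose extension is not excluded.

    Hypothetical helper; only the top level of `folder` is scanned.
    """
    excluded = {suffix.lower() for suffix in excluded_suffixes}
    return sorted(
        entry for entry in Path(folder).iterdir()
        if entry.is_file() and entry.suffix.lower() not in excluded
    )
```

The selected files could then be copied or linked into a staging directory that contains only the samples to import.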

I'm not sure if what I'm experiencing is a bug. I will investigate further and update this comment.

dev747368 commented 4 years ago

@soutzis yes, that sounds like a previously reported bug (#1572). One of the file-format loaders, when probing one of the binaries, is having a problem (that it should be catching on its own).

In this case, your enhancement request isn't 100% clear. Do you suspect that the error is caused by the .json file? My guess is that it probably is not causing the error. It's probably a malformed PE or ELF file.

soutzis commented 4 years ago

@dev747368 Most likely the error is not caused by the .json file. Nevertheless, being able to exclude file types during the batch import process could still be beneficial. For example, it would make it clearer that the malformed binary is causing the error.

dev747368 commented 4 years ago

Well, you can exclude file types: after the initial scan, once you've got your list and before you hit the start button, you can check or uncheck individual rows in the table to include or exclude them (rows are grouped by file type, loader, and language).