DanielBenHayoun / automatic_functionality_detection

Educational project for automating functionality detection in binary files (in progress).

Automatic Functionality Detection Of Stripped Binaries

This project extends code2seq to work with stripped binaries instead of source code. It currently supports C/C++.

Available for Ubuntu 18.04.3 LTS.

Prerequisites:

Clone this repository, including its submodules:

git clone https://github.com/DanielBenHayoun/automatic_functionality_detection.git --recursive
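If the repository was cloned without --recursive, its submodules will be empty. The sketch below is a hypothetical helper (check_submodules). It assumes, based on the script paths used later in this README, that code2seq/ and cppminer/ are the submodule directories. The git submodule update --init --recursive command it suggests is standard Git.

```shell
# Hypothetical helper: warn if the expected submodule directories are missing or empty.
# Assumption: code2seq/ and cppminer/ are the submodules (inferred from script paths in this README).
check_submodules() {
  local repo_root="${1:-.}" status=0 dir
  for dir in code2seq cppminer; do
    # An absent or empty directory usually means the clone was not --recursive
    if [ ! -d "$repo_root/$dir" ] || [ -z "$(ls -A "$repo_root/$dir")" ]; then
      echo "missing or empty: $dir"
      status=1
    fi
  done
  if [ "$status" -ne 0 ]; then
    echo "run: git submodule update --init --recursive"
    return 1
  fi
  echo "submodules present"
}
```

Usage: source the snippet and run check_submodules from the repository root; it returns a non-zero status if either directory is missing or empty.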

The following instructions assume the current directory is ~/automatic_functionality_detection/.

Environment setup

Install Anaconda:

Run the following commands:
$ wget https://repo.continuum.io/archive/Anaconda3-2020.02-Linux-x86_64.sh
$ bash /path_to_downloaded_script/Anaconda3-2020.02-Linux-x86_64.sh

where path_to_downloaded_script is the directory where Anaconda3-2020.02-Linux-x86_64.sh was saved on your machine. Alternatively, follow the Anaconda installation guide.

Create the environment:
$ conda env update -f environment.yml
$ conda activate RE_project_1

To list your conda environments and see which one is active (it is marked with an asterisk), run conda env list.
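To fail fast when a script is run outside the project environment, a hypothetical require_conda_env helper can compare the CONDA_DEFAULT_ENV variable, which conda activate sets to the name of the active environment, against the environment name used above (RE_project_1).

```shell
# Hypothetical helper: return non-zero unless the expected conda environment is active.
# conda activate sets CONDA_DEFAULT_ENV to the active environment's name.
require_conda_env() {
  local expected="${1:-RE_project_1}"
  if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
    echo "expected conda env '$expected' but active is '${CONDA_DEFAULT_ENV:-none}'" >&2
    echo "run: conda activate $expected" >&2
    return 1
  fi
  echo "conda env '$expected' is active"
}
```

Calling require_conda_env || exit 1 at the top of a script stops it before any step runs with the wrong Python packages.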

Create dataset

To create the dataset, run the script decompile_all.sh (first edit the paths defined inside the script):

$ ./scripts/decompile_all.sh

This script decompiles all binaries in SOURCE_PATH and writes the results to OUTPUT_PATH (both are defined in the script).
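This README does not state which decompiler decompile_all.sh invokes, so the sketch below only illustrates the loop described above: every regular file in SOURCE_PATH is decompiled into a file in OUTPUT_PATH. decompile_one is a hypothetical placeholder, and the .c output extension and the ./binaries and ./decompiled defaults are assumptions.

```shell
# Sketch of a "decompile every binary in SOURCE_PATH into OUTPUT_PATH" loop.
# decompile_one is a HYPOTHETICAL placeholder; the real decompiler call lives in scripts/decompile_all.sh.
SOURCE_PATH="${SOURCE_PATH:-./binaries}"
OUTPUT_PATH="${OUTPUT_PATH:-./decompiled}"

decompile_one() {
  local bin="$1" out="$2"
  # Placeholder body: substitute the actual decompiler command here.
  printf '/* decompiled from %s */\n' "$(basename "$bin")" > "$out"
}

decompile_all() {
  mkdir -p "$OUTPUT_PATH"
  local bin count=0
  for bin in "$SOURCE_PATH"/*; do
    [ -f "$bin" ] || continue   # skip subdirectories and an unexpanded glob
    decompile_one "$bin" "$OUTPUT_PATH/$(basename "$bin").c"
    count=$((count + 1))
  done
  echo "decompiled $count file(s) into $OUTPUT_PATH"
}
```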

Train

Create Paths

After the binaries have been decompiled successfully, extract AST paths from the decompiled source code by running the script create_paths.sh (first edit the paths defined inside the script).

NOTE: first follow the requirements instructions in cppminer.
$ ./scripts/create_paths.sh
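Assuming cppminer writes code2seq's text format (one example per line: the target label, then space-separated path contexts, each context being three comma-separated fields), a quick awk pass can count examples and flag malformed contexts. summarize_contexts is a hypothetical helper, and where the output files land depends on the paths set in create_paths.sh.

```shell
# Hypothetical sanity check for code2seq-style path-context files.
# Assumed line format: <label> <ctx> <ctx> ... where each ctx is "left,path,right".
summarize_contexts() {
  awk '
    {
      examples++
      for (i = 2; i <= NF; i++) {
        contexts++
        # A well-formed context splits into exactly three comma-separated fields
        if (split($i, parts, ",") != 3) malformed++
      }
    }
    END { printf "examples=%d contexts=%d malformed=%d\n", examples, contexts, malformed }
  ' "$@"
}
```

A non-zero malformed count is a hint that the extraction step and the expected format disagree before any time is spent on preprocessing.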

Pre-Preprocess

Next, split the data into train, test, and validation datasets by running:

$ ./scripts/pre_preprocess.sh <INPUT_FOLDER> <OUTPUT_FOLDER>
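The internals of pre_preprocess.sh are not described here. As a hypothetical illustration of a train/validation/test split, the sketch below shuffles a single file with one example per line and writes train.txt, val.txt and test.txt; the function name, the output file names and the 80/10/10 default ratio are all assumptions.

```shell
# Hypothetical train/validation/test split; the real logic is in scripts/pre_preprocess.sh.
# Assumes one example per line in a single input file; the 80/10/10 default is an assumption.
split_dataset() {
  local input="$1" outdir="$2" train_pct="${3:-80}" val_pct="${4:-10}"
  local total n_train n_val shuffled
  mkdir -p "$outdir"
  total=$(wc -l < "$input")
  n_train=$(( total * train_pct / 100 ))
  n_val=$(( total * val_pct / 100 ))
  shuffled=$(mktemp)
  shuf "$input" > "$shuffled"   # random order so each split is representative
  awk -v hi="$n_train" 'NR <= hi' "$shuffled" > "$outdir/train.txt"
  awk -v lo="$n_train" -v hi="$(( n_train + n_val ))" 'NR > lo && NR <= hi' "$shuffled" > "$outdir/val.txt"
  awk -v lo="$(( n_train + n_val ))" 'NR > lo' "$shuffled" > "$outdir/test.txt"   # remainder goes to test
  rm -f "$shuffled"
}
```

Usage: split_dataset all_examples.txt out_dir 80 10. Every input line ends up in exactly one of the three files.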

Preprocess

To create the dictionaries and arrange the dataset, run the script inside cppminer/code2seq, setting its parameters accordingly:

NOTE: follow the instructions inside the script.

Optional:

$ ./cppminer/code2seq/preprocess.sh <INPUT_FOLDER> <OUTPUT_FOLDER>
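In the upstream code2seq repository, preprocess.sh derives its vocabularies from frequency histograms of the training data. The sketch below is a simplified, hypothetical illustration of that idea rather than the script's exact commands: it counts target-label subtokens, assuming each line starts with a '|'-separated label followed by space-separated contexts. target_histogram is a hypothetical name.

```shell
# Hypothetical sketch: frequency of target-label subtokens in a code2seq-style file.
# Assumes field 1 of each line is the label, with subtokens separated by '|'.
target_histogram() {
  cut -d' ' -f1 "$1" \
    | tr '|' '\n' \
    | awk '{ count[$0]++ } END { for (tok in count) print tok, count[tok] }' \
    | LC_ALL=C sort
}
```

Sorting with sort -k2,2nr instead lists the most frequent subtokens first, which is how a size-limited vocabulary would typically be chosen.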

Train

Run training with the script inside code2seq/, setting its parameters according to the instructions inside train.sh.

NOTE: it is recommended to adjust code2seq/config.py before training to suit the target task. In this project, the batch size was set to 11 and the test batch size to 2.
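One way to apply the batch-size change from the command line, assuming config.py assigns these values as config.BATCH_SIZE = <n> and config.TEST_BATCH_SIZE = <n> (check your copy first). set_batch_sizes is a hypothetical helper; since sed rewrites every matching line, it also changes any other numeric assignment of these attributes in the same file.

```shell
# Hypothetical helper: set BATCH_SIZE and TEST_BATCH_SIZE in a code2seq-style config.py.
# Assumes lines of the form "config.BATCH_SIZE = <number>"; uses GNU sed's in-place -i.
set_batch_sizes() {
  local cfg="$1" train_bs="$2" test_bs="$3"
  sed -i -E \
    -e "s/^([[:space:]]*config\.BATCH_SIZE[[:space:]]*=[[:space:]]*)[0-9]+/\1${train_bs}/" \
    -e "s/^([[:space:]]*config\.TEST_BATCH_SIZE[[:space:]]*=[[:space:]]*)[0-9]+/\1${test_bs}/" \
    "$cfg"
}
```

Usage matching the values above: set_batch_sizes code2seq/config.py 11 2.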

$ ./code2seq/train.sh
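Before launching train.sh, it can help to confirm that preprocessing produced the files training expects. The check below assumes the upstream code2seq naming (<name>.train.c2s, <name>.val.c2s, <name>.test.c2s and <name>.dict.c2s in the data directory); verify these names against your preprocess.sh output. check_c2s_data is a hypothetical helper.

```shell
# Hypothetical pre-flight check before ./code2seq/train.sh.
# Assumed file names follow upstream code2seq: <name>.{train,val,test,dict}.c2s
check_c2s_data() {
  local data_dir="$1" name="$2" suffix status=0
  for suffix in train val test dict; do
    # -s is true only for files that exist and are non-empty
    if [ ! -s "$data_dir/$name.$suffix.c2s" ]; then
      echo "missing or empty: $data_dir/$name.$suffix.c2s" >&2
      status=1
    fi
  done
  return "$status"
}
```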