This project extends code2seq to work with stripped binaries instead of source code. It currently supports C/C++.
Available for Ubuntu 18.04.3 LTS.
Clone this repository, including its submodules:
$ git clone https://github.com/DanielBenHayoun/automatic_functionality_detection.git --recursive
The following instructions assume that the current directory is ~/automatic_functionality_detection/
Install Anaconda:
$ wget https://repo.continuum.io/archive/Anaconda3-2020.02-Linux-x86_64.sh
$ bash /path_to_downloaded_script/Anaconda3-2020.02-Linux-x86_64.sh
where path_to_downloaded_script is the path to Anaconda3-2020.02-Linux-x86_64.sh on your machine, or follow the installation guide.
Create (or update) the conda environment defined in environment.yml, then activate it:
$ conda env update -f environment.yml
$ conda activate RE_project_1
To check which conda environments exist and which one is active, run:
$ conda env list
To create the dataset, edit the paths inside scripts/decompile_all.sh and then run it:
$ ./scripts/decompile_all.sh
This script decompiles all binaries in SOURCE_PATH and writes the results to OUTPUT_PATH (both are defined in the script).
After the binaries have been decompiled successfully, extract paths from the decompiled source code: edit the paths inside scripts/create_paths.sh and then run it:
$ ./scripts/create_paths.sh
Next, create the train, test and validation datasets by running:
$ ./scripts/pre_preprocess.sh <INPUT_FOLDER> <OUTPUT_FOLDER>
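The exact split performed by pre_preprocess.sh is not documented here. As a rough illustration only, the hypothetical split_dataset function below shuffles the files under an input folder and writes train/test/validation file lists. The function name, the output file names and the 80/10/10 ratio are all assumptions; this is not the project's script.

```shell
# Hypothetical sketch, NOT scripts/pre_preprocess.sh: shuffle every file under
# an input folder and write train/test/validation file lists (assumed 80/10/10).
split_dataset() {
  local input_dir="$1" output_dir="$2"
  mkdir -p "$output_dir"

  # Shuffle the list of files (shuf is part of GNU coreutils).
  local list
  list=$(mktemp)
  find "$input_dir" -type f | shuf > "$list"

  # Compute split sizes; validation receives whatever remains.
  local total n_train n_test
  total=$(wc -l < "$list")
  n_train=$(( total * 80 / 100 ))
  n_test=$(( total * 10 / 100 ))

  head -n "$n_train" "$list" > "$output_dir/train.txt"
  tail -n +"$(( n_train + 1 ))" "$list" | head -n "$n_test" > "$output_dir/test.txt"
  tail -n +"$(( n_train + n_test + 1 ))" "$list" > "$output_dir/validation.txt"
  rm -f "$list"
}
```

After sourcing the function, it would be called as split_dataset <INPUT_FOLDER> <OUTPUT_FOLDER>. Splitting at the file level keeps all functions from one source file in the same set; file names containing newlines are not handled.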
Create the dictionaries and arrange the dataset by running the script inside cppminer/code2seq, adjusting its parameters accordingly.
Optional:
$ ./cppminer/code2seq/preprocess.sh <INPUT_FOLDER> <OUTPUT_FOLDER>
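The dictionaries are vocabularies built from token frequencies. As a hedged illustration, assuming the input follows the code2seq line format (the first space-separated field is the target name with its subtokens joined by '|', followed by the path contexts), the hypothetical target_histogram helper below counts how often each target subtoken occurs. The helper is an assumption for explanation; cppminer/code2seq/preprocess.sh may build its dictionaries differently.

```shell
# Hypothetical helper: assumes code2seq-format lines, where field 1 is the
# target name with subtokens joined by '|', followed by path contexts.
# Prints "<subtoken> <count>" for every target subtoken in the given file.
target_histogram() {
  cut -d' ' -f1 "$1" \
    | tr '|' '\n' \
    | awk '{ counts[$0]++ } END { for (t in counts) print t, counts[t] }'
}
```

The output order is unspecified, so pipe it through sort if a stable listing is needed.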
Run training using the script inside code2seq/, after changing its parameters as described in the instructions inside train.sh.
NOTE: it is recommended to adjust code2seq/config.py before training according to the target task. In this project we set the batch size to 11 and the test batch size to 2.
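A minimal sketch of that config change, using GNU sed: the hypothetical set_batch_sizes function below assumes that config.py assigns the batch sizes on lines of the form config.BATCH_SIZE = <number> and config.TEST_BATCH_SIZE = <number>. Check the file before running it, because every matching line (for example in a separate debug configuration) is rewritten.

```shell
# Hypothetical helper: rewrite the batch-size assignments in a code2seq config.py.
# Assumes lines like "config.BATCH_SIZE = <n>" and "config.TEST_BATCH_SIZE = <n>".
set_batch_sizes() {
  local config_file="$1" train_bs="$2" test_bs="$3"
  # \1 keeps the indentation and "name = " prefix; only the number changes.
  sed -i \
    -e "s/^\([[:space:]]*config\.BATCH_SIZE[[:space:]]*=[[:space:]]*\)[0-9][0-9]*/\1$train_bs/" \
    -e "s/^\([[:space:]]*config\.TEST_BATCH_SIZE[[:space:]]*=[[:space:]]*\)[0-9][0-9]*/\1$test_bs/" \
    "$config_file"
}
```

For the values used in this project, the call would be set_batch_sizes code2seq/config.py 11 2.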
$ ./code2seq/train.sh