TikNib is a binary code similarity analysis (BCSA) tool. TikNib enables evaluating the effectiveness of features used in BCSA. One can extend it to evaluate other interesting features as well as similarity metrics.
Currently, TikNib supports the features listed below. TikNib also employs an interpretable feature engineering model, which essentially measures the relative difference between each feature. In other words, it captures how much each feature differs across different compile options. Note that this model and its internal similarity scoring metric are not the best approach for addressing BCSA problems, but they can help analyze how compilation settings affect each feature.
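As a rough illustration of the relative-difference idea, the sketch below compares a few numeric features of one function under two compile options. The formula (absolute difference divided by the larger magnitude), the helper name `relative_difference`, and the `-O0` values are assumptions made for illustration; TikNib's actual metric may differ.

```python
def relative_difference(a, b):
    """Return |a - b| / max(|a|, |b|), or 0.0 when both values are zero.

    Assumed formula for illustration; not necessarily TikNib's metric.
    """
    denom = max(abs(a), abs(b))
    if denom == 0:
        return 0.0
    return abs(a - b) / denom


# Hypothetical feature values for the same function built with -O0 and -O3.
features_o0 = {"inst_num_total": 40, "cfg_size": 5, "data_num_strings": 0}
features_o3 = {"inst_num_total": 22, "cfg_size": 3, "data_num_strings": 0}

# One relative difference per feature name.
diffs = {
    name: relative_difference(features_o0[name], features_o3[name])
    for name in features_o0
}

for name, diff in sorted(diffs.items()):
    print(f"{name}: {diff:.4f}")
```

A value close to 0 suggests the feature is stable across the two builds, while a value close to 1 suggests it changes heavily.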
TikNib currently focuses on function-level similarity analysis, as functions are a fundamental unit of binary analysis.
For more details, please check our paper.
For building the cross-compiling environment and dataset, please check BinKit.
TikNib has two parts: ground truth building and feature extraction.
To see the scripts used in our evaluation, please check the shell scripts under /helper. For example, run_gnu.sh builds ground truth and extracts features for GNU packages. Then, run_gnu_roc.sh computes the ROC AUC for the results. You have to run these scripts sequentially, as the second one uses the cached results obtained from the first one. We also added top-k results for the OpenSSL package, which are described in Sec 5.3 in our paper. Please check run_openssl_roc.sh and run_openssl_roc_topk.sh in the same directory, which should also be executed sequentially.
TikNib includes scripts for building ground truth for evaluation, as described in Sec 3.2 in our paper. After compiling the datasets using BinKit, we build ground truth as below.
Given two functions of the same name, we check if they originated from the same source files and if their line numbers are the same. We also check if both functions are from the same package and from the binaries of the same name to confirm their equivalence. Based on these criteria we conducted several steps to build ground truth and clean the datasets. For more details, please check our paper.
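The equivalence criteria above can be sketched as a grouping key. This is only an illustration under stated assumptions: the field names `src_file` and `src_line` and all record values are hypothetical, and the helper `group_equivalent_functions` is not part of TikNib.

```python
from collections import defaultdict


def group_equivalent_functions(funcs):
    """Group function records that agree on every equivalence criterion."""
    groups = defaultdict(list)
    for func in funcs:
        key = (
            func["package"],   # same package
            func["bin_name"],  # binaries of the same name
            func["name"],      # same function name
            func["src_file"],  # same source file (hypothetical field name)
            func["src_line"],  # same line number (hypothetical field name)
        )
        groups[key].append(func)
    return groups


# Hypothetical records: the same function compiled with two optimization levels,
# plus a same-named function that originates from a different source location.
funcs = [
    {"package": "findutils-4.6.0", "bin_name": "find.elf", "opti": "O0",
     "name": "mode_create_from_ref", "src_file": "modechange.c", "src_line": 100},
    {"package": "findutils-4.6.0", "bin_name": "find.elf", "opti": "O3",
     "name": "mode_create_from_ref", "src_file": "modechange.c", "src_line": 100},
    {"package": "findutils-4.6.0", "bin_name": "find.elf", "opti": "O3",
     "name": "mode_create_from_ref", "src_file": "other.c", "src_line": 7},
]

groups = group_equivalent_functions(funcs)
group_sizes = sorted(len(members) for members in groups.values())
print(group_sizes)
```

Records that land in the same group would be treated as the same function compiled differently; records in different groups would not.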
Configure path variables for your environment at config/path_variables.py.
This step takes the most time.
This step fetches preliminary data for the functions in each binary and stores the data in a pickle format. For a given binary, it generates a pickle file on the same path with a suffix of .pickle. Please configure the chunk_size for parallel processing.
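The sketch below illustrates the two ideas in this step: a pickle file stored next to each binary with a `.pickle` suffix, and splitting the input list into chunks for parallel workers. The helpers `chunked` and `pickle_path`, the record layout, and the interpretation of chunk_size as a batch size are assumptions for illustration, not TikNib's actual code.

```python
import os
import pickle
import tempfile


def chunked(items, chunk_size):
    """Yield consecutive slices of `items`, each with at most `chunk_size` elements."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def pickle_path(bin_path):
    """Return the path of the pickle file stored next to a binary."""
    return bin_path + ".pickle"


# Hypothetical function records; the real pickle layout may differ.
records = [{"name": "mode_create_from_ref", "bin_name": "find.elf"}]

# Write and read back a pickle file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    bin_path = os.path.join(tmp_dir, "find.elf")
    with open(pickle_path(bin_path), "wb") as f:
        pickle.dump(records, f)
    with open(pickle_path(bin_path), "rb") as f:
        loaded = pickle.load(f)

# Split a hypothetical list of binaries into batches of two.
binaries = ["a.elf", "b.elf", "c.elf", "d.elf", "e.elf"]
batches = list(chunked(binaries, 2))
print(loaded[0]["name"], batches)
```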
For IDA Pro v6.95 (original version in the paper), use tiknib/ida/fetch_funcdata.py.
$ python helper/do_idascript.py \
--idapath "/home/dongkwan/.tools/ida-6.95" \
--idc "tiknib/ida/fetch_funcdata.py" \
--input_list "example/input_list_find.txt" \
--log
For IDA Pro v7.5, use tiknib/ida/fetch_funcdata_v7.5.py.
$ python helper/do_idascript.py \
--idapath "/home/dongkwan/.tools/ida-7.5" \
--idc "tiknib/ida/fetch_funcdata_v7.5.py" \
--input_list "example/input_list_find.txt" \
--log
Additionally, you can use this script to run any idascript for numerous binaries in parallel.
This extracts the source file name and line number of each function by parsing the debugging information in a given binary. The binary must have been compiled with the -g option.
$ python helper/extract_lineno.py \
--input_list "example/input_list_find.txt" \
--threshold 1
This filters functions by checking the source file name and line number. This removes compiler intrinsic functions and duplicate functions spread over multiple binaries within the same package.
$ python helper/filter_functions.py \
--input_list "example/input_list_find.txt" \
--threshold 1
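A minimal sketch of this kind of filtering is shown below. It assumes that functions without source information are the ones to drop as compiler intrinsics, and that duplicates are detected per compile option by source location; the field names `src_file` and `src_line`, the record values, and the helper `filter_functions` are hypothetical and do not reflect the actual logic of helper/filter_functions.py.

```python
def filter_functions(funcs):
    """Drop records without source info and keep one copy per source location."""
    seen = set()
    kept = []
    for func in funcs:
        # Assumption: records lacking source info are treated as intrinsics.
        if not func.get("src_file") or func.get("src_line") is None:
            continue
        # The same source location under the same compile options is a duplicate,
        # even when it appears in another binary of the same package.
        key = (
            func["package"], func["arch"], func["compiler"], func["opti"],
            func["src_file"], func["src_line"], func["name"],
        )
        if key in seen:
            continue
        seen.add(key)
        kept.append(func)
    return kept


# Hypothetical records sharing the same package and toolchain.
base = {"package": "findutils-4.6.0", "arch": "x86_64", "compiler": "gcc-8.2.0"}
funcs = [
    dict(base, bin_name="find.elf", opti="O3", name="xmalloc",
         src_file="xmalloc.c", src_line=41),
    dict(base, bin_name="xargs.elf", opti="O3", name="xmalloc",
         src_file="xmalloc.c", src_line=41),   # duplicate in another binary
    dict(base, bin_name="find.elf", opti="O3", name="memcpy",
         src_file=None, src_line=None),        # no source info
    dict(base, bin_name="find.elf", opti="O0", name="xmalloc",
         src_file="xmalloc.c", src_line=41),   # different compile option, kept
]

kept = filter_functions(funcs)
print([(f["bin_name"], f["opti"], f["name"]) for f in kept])
```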
This counts the number of functions and generates a graph of the counts in the same directory as input_list. It also prints the numbers separated by ','. In the example below, a PDF file containing the graph will be created at example/input_list_find.pdf.
$ python helper/count_functions.py \
--input_list "example/input_list_find.txt" \
--threshold 1
This is the exact same step as the one described above.
By utilizing ctags, this will extract type information. This will add abstract_args_type and abstract_ret_type into the previously created .pickle file.
$ python helper/extract_functype.py \
--source_list "example/source_list.txt" \
--input_list "example/input_list_find.txt" \
--ctags_dir "data/ctags" \
--threshold 1
For example, for a function type of mode_change *__usercall@<rax>(const char *ref_file@<rsi>) extracted from IDA Pro, it follows the ctags and recognizes that mode_change represents a custom struct. Consequently, it adds new data as below.
'abstract_args_type': ['char *'],
'abstract_ret_type': 'struct *',
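The abstraction shown above can be sketched as follows. This is an illustrative approximation, not TikNib's implementation: the helper `abstract_type` is hypothetical, it expects a bare type string (without the parameter name or IDA register annotation), and `struct_names` stands in for the set of struct names that would be resolved through ctags.

```python
QUALIFIERS = {"const", "volatile"}


def abstract_type(type_str, struct_names):
    """Strip qualifiers and replace known custom struct names with 'struct'."""
    tokens = type_str.replace("*", " * ").split()
    pointer_depth = tokens.count("*")
    base_tokens = [t for t in tokens if t != "*" and t not in QUALIFIERS]
    base = " ".join(base_tokens)
    if base in struct_names:
        base = "struct"
    if pointer_depth:
        return base + " " + "*" * pointer_depth
    return base


# Hypothetical result of a ctags lookup: names known to be custom structs.
struct_names = {"mode_change"}

abstract_args_type = [abstract_type("const char *", struct_names)]
abstract_ret_type = abstract_type("mode_change *", struct_names)
print(abstract_args_type, abstract_ret_type)
```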
This extracts numeric presemantic features as stated above.
$ python helper/extract_features.py \
--input_list "example/input_list_find.txt" \
--threshold 1
The extracted features will be stored in each .pickle file. Below is an example showing a part of the extracted features for the mode_create_from_ref function in the find binary in findutils.
{
'package': 'findutils-4.6.0',
'bin_name': 'find.elf',
'name': 'mode_create_from_ref',
'arch': 'x86_64',
'opti': 'O3',
'compiler': 'gcc-8.2.0',
'others': 'normal',
'func_type': 'mode_change *__usercall@<rax>(const char *ref_file@<rsi>)',
'abstract_args_type': ['char *'],
'ret_type': 'mode_change *',
'abstract_ret_type': 'struct *',
'cfg': [(0, 1), (0, 2), (1, 2)],
'cfg_size': 3,
'feature': {
'cfg_avg_degree': 2,
'cfg_avg_indegree': 1,
'cfg_avg_loopintersize': 0,
'cfg_avg_loopsize': 0,
'cfg_avg_outdegree': 1,
'cfg_avg_sccsize': 1,
'cfg_max_depth': 2,
'cfg_max_width': 2,
'cfg_num_backedges': 0,
'cfg_num_bfs_edges': 2,
'cfg_num_degree': 6,
'cfg_num_indegree': 3,
'cfg_num_loops': 0,
'cfg_num_loops_inter': 0,
'cfg_num_outdegree': 3,
'cfg_num_scc': 3,
'cfg_size': 3,
'cfg_sum_loopintersize': 0,
'cfg_sum_loopsize': 0,
'cfg_sum_sccsize': 3,
'cg_num_callees': 2,
'cg_num_callers': 0,
'cg_num_imported_callees': 1,
'cg_num_imported_calls': 1,
'cg_num_incalls': 0,
'cg_num_outcalls': 2,
'data_avg_abs_strings': 0,
'data_avg_arg_type': 2,
'data_avg_consts': 144,
'data_avg_strlen': 0,
'data_mul_arg_type': 2,
'data_num_args': 1,
'data_num_consts': 1,
'data_num_strings': 0,
'data_ret_type': 2,
'data_sum_abs_strings': 0,
'data_sum_abs_strings_seq': 0,
'data_sum_arg_type': 2,
'data_sum_arg_type_seq': 2,
'data_sum_consts_seq': 144,
'data_sum_strlen': 0,
'data_sum_strlen_seq': 0,
'inst_avg_abs_arith': 0.6666666666666666,
'inst_avg_abs_ctransfer': 1.3333333333333333,
'inst_avg_abs_dtransfer': 4.666666666666667,
'inst_avg_arith': 0.6666666666666666,
'inst_avg_bitflag': 0.3333333333333333,
'inst_avg_cmp': 0.3333333333333333,
'inst_avg_cndctransfer': 0.3333333333333333,
'inst_avg_ctransfer': 1.0,
'inst_avg_dtransfer': 4.666666666666667,
'inst_avg_grp_call': 0.6666666666666666,
'inst_avg_grp_jump': 0.3333333333333333,
'inst_avg_grp_ret': 0.3333333333333333,
'inst_avg_logic': 0.3333333333333333,
'inst_avg_total': 7.333333333333333,
'inst_num_abs_arith': 2.0,
'inst_num_abs_ctransfer': 4.0,
'inst_num_abs_dtransfer': 14.0,
'inst_num_arith': 2.0,
'inst_num_bitflag': 1.0,
'inst_num_cmp': 1.0,
'inst_num_cndctransfer': 1.0,
'inst_num_ctransfer': 3.0,
'inst_num_dtransfer': 14.0,
'inst_num_grp_call': 2.0,
'inst_num_grp_jump': 1.0,
'inst_num_grp_ret': 1.0,
'inst_num_logic': 1.0,
'inst_num_total': 22
},
...
}
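Several of the CFG and instruction features in this example can be recomputed from the `cfg` edge list and the totals. The sketch below does so with the standard library; how the features relate to each other is inferred from the example numbers above, and the code is not taken from TikNib.

```python
from collections import Counter

# Values copied from the example above.
cfg_edges = [(0, 1), (0, 2), (1, 2)]
cfg_size = 3
inst_num_total = 22

# Count outgoing and incoming edges per basic block.
out_degree = Counter(src for src, _ in cfg_edges)
in_degree = Counter(dst for _, dst in cfg_edges)

cfg_num_outdegree = sum(out_degree.values())
cfg_num_indegree = sum(in_degree.values())
cfg_num_degree = cfg_num_outdegree + cfg_num_indegree

# Averages are taken over the number of basic blocks.
cfg_avg_degree = cfg_num_degree / cfg_size
cfg_avg_indegree = cfg_num_indegree / cfg_size
cfg_avg_outdegree = cfg_num_outdegree / cfg_size
inst_avg_total = inst_num_total / cfg_size

print(cfg_num_degree, cfg_avg_degree, inst_avg_total)
```

With these inputs the computed values agree with `cfg_num_degree`, `cfg_avg_degree`, `cfg_avg_indegree`, `cfg_avg_outdegree`, and `inst_avg_total` in the example.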
$ python helper/test_roc.py \
--input_list "example/input_list_find.txt" \
--train_funcs_limit 200000 \
--config "config/gnu/config_gnu_normal_all.yml"
For more details, please check example/. All configuration files for our experiments are in config/. The time spent running example/example.sh is as below.
You can obtain the information below after running test_roc.py.
Features:
inst_num_abs_ctransfer (inter): 0.4749
inst_num_cmp (inter): 0.5500
inst_num_cndctransfer (inter): 0.5906
...
...
...
Avg \# of selected features: 13.0000
Avg. TP-TN Gap: 0.3866
Avg. TP-TN Gap of Grey: 0.4699
Avg. ROC: 0.9424
Std. of ROC: 0.0056
Avg. AP: 0.9453
Std. of AP: 0.0058
Avg. Train time: 30.4179
AVg. Test time: 1.4817
Avg. # of Train Pairs: 155437
Avg. # of Test Pairs: 17270
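For readers unfamiliar with the ROC AUC reported above, the sketch below computes it with the standard pairwise definition: the probability that a randomly chosen similar pair scores higher than a randomly chosen dissimilar pair. The labels and scores are made up, and test_roc.py may compute the value differently (for example, through a library).

```python
def roc_auc(labels, scores):
    """Compute ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical similarity scores for function pairs:
# label 1 = same source function, label 0 = different functions.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]

auc = roc_auc(labels, scores)
print(f"ROC AUC: {auc:.4f}")
```

This quadratic-time version is meant only to make the definition concrete; it is not suited to the number of pairs listed in the output above.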
One may use BCSA for several tasks such as malware detection, plagiarism detection, authorship identification, or vulnerability discovery.
You can take a look at this repo for an example of IoT vulnerability discovery.
We ran all our experiments on a server equipped with four Intel Xeon E7-8867v4 2.40 GHz CPUs (total 144 cores), 896 GB DDR4 RAM, and 4 TB SSD. We set up Ubuntu 16.04 with IDA Pro v6.95 on the server.
Currently, TikNib works with IDA Pro v6.95 and v7.5, using Python 3.8.0 on the system.
This project has been conducted by the below authors at KAIST.
We would appreciate it if you consider citing our paper when using TikNib.
@ARTICLE{kim:tse:2022,
author={Kim, Dongkwan and Kim, Eunsoo and Cha, Sang Kil and Son, Sooel and Kim, Yongdae},
journal={IEEE Transactions on Software Engineering},
title={Revisiting Binary Code Similarity Analysis using Interpretable Feature Engineering and Lessons Learned},
year={2022},
volume={},
number={},
pages={1-23},
doi={10.1109/TSE.2022.3187689}
}