Closed urialon closed 5 years ago
Hi,
I managed to get it to work by running `objcopy --only-keep-debug a.out a.dbg` to get the debug info file, and `strip -s a.out -o a.stripped` to get the stripped file.
Is it the correct usage?
First, the two binaries in a pair should have the same file name but be placed in different folders. So you will end up with `examples/stripped/a` and `examples/debug/a`. In `examples/bin_list.txt`, there should be a line `a` to refer to the pair.
Second, you should keep the symbol table in the "stripped" version. Symbol tables contain the scope and name of each function. When training or evaluating prediction accuracy with `py/evaluate.py`, we assume the function scope is known for every binary. However, the golden function names in the symbol table are not used as extra information for prediction. With your command `strip -s a.out -o a.stripped`, the symbol table is removed, so function scope has to be inferred by BAP, which may be imprecise. As a result, training sample labelling and accuracy measurement may be wrong.
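To check whether a "stripped" binary still carries a symbol table, something like the following could be used. This is only a minimal sketch: `has_symtab` is a hypothetical helper (not part of this repository), and it assumes a little-endian ELF64 file. It walks the section headers and looks for a section of type `SHT_SYMTAB`.

```python
import struct
import sys

SHT_SYMTAB = 2  # ELF section type of the static symbol table (.symtab)


def has_symtab(data: bytes) -> bool:
    """Return True if a little-endian ELF64 image has a SHT_SYMTAB section.

    Hypothetical helper for illustration; other ELF classes are rejected.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    # ELF64 header fields: e_shoff at 0x28, e_shentsize at 0x3A, e_shnum at 0x3C.
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum = struct.unpack_from("<HH", data, 0x3A)
    for i in range(e_shnum):
        # sh_type is the second 32-bit field of each section header.
        (sh_type,) = struct.unpack_from("<I", data, e_shoff + i * e_shentsize + 4)
        if sh_type == SHT_SYMTAB:
            return True
    # Note: .dynsym (SHT_DYNSYM) does not count; only .symtab is checked.
    return False


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(has_symtab(f.read()))
```

Run against `examples/stripped/a`, a result of `False` would suggest the file was stripped too aggressively for the function-scope assumption described above.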
Thanks for your quick reply.
1. Regarding file names - yes, sure, I did it as in the example, as you said.
2. Regarding the symbol table - so what are the correct command lines?
3. Can you elaborate on function names? Why are they not used in evaluation, and how is that relevant to the scope?
Thanks!
You should run `strip -g a.out -o a.stripped`.
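Putting the pieces from this thread together, preparing one pair might look roughly like the sketch below. The helper names `pair_commands` and `prepare_pair` are hypothetical, and the layout (`examples/debug/`, `examples/stripped/`, `examples/bin_list.txt`) is taken from the comments above. It assumes `objcopy` and `strip` from binutils are on the `PATH`, and that the `objcopy --only-keep-debug` output is what belongs in the debug folder.

```python
import subprocess
from pathlib import Path


def pair_commands(src: str, name: str, root: str = "examples"):
    """Build the two binutils commands for one binary pair (hypothetical helper)."""
    return [
        # Debug-info file, stored under the same name in the debug folder.
        ["objcopy", "--only-keep-debug", src, f"{root}/debug/{name}"],
        # Remove debug sections only (-g), so the symbol table is kept.
        ["strip", "-g", src, "-o", f"{root}/stripped/{name}"],
    ]


def prepare_pair(src: str, name: str, root: str = "examples") -> None:
    """Run the commands and register the pair in bin_list.txt."""
    for sub in ("debug", "stripped"):
        Path(root, sub).mkdir(parents=True, exist_ok=True)
    for cmd in pair_commands(src, name, root):
        subprocess.run(cmd, check=True)
    # One line per pair; this sketch does not guard against duplicate entries.
    with open(Path(root, "bin_list.txt"), "a") as f:
        f.write(name + "\n")
```

For example, `prepare_pair("a.out", "a")` would aim to produce `examples/debug/a`, `examples/stripped/a`, and a line `a` in `examples/bin_list.txt`.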
Sorry, what I wrote about this part was misleading. What I meant is that, when training or evaluating accuracy with `py/evaluate.py`, we only assume that function scope is known; the golden function names in the symbol table are not used as input for prediction. We do of course need to compare predicted function names with the golden function names to calculate accuracy. (I also edited my earlier comment so that it is no longer misleading.)
Thanks!
Maybe you guys would want to add the `objcopy` and `strip -g` instructions to the README, for future reference. Thanks again
Yes, we will add those. Thanks for the suggestion.
Hi,
Following the content of Linux Symbol Packages, I obtained non-stripped binaries in the correct format. But the corresponding stripped binaries from `/usr/bin` or `/bin` have no `.symtab`. So how can I extract the `.symtab` and add it to the stripped binaries?
Hi,
You can use the ELFIO library to read those sections from the debug information and add them to the stripped binaries.
Best, Jingxuan
Hi again, I have another question which I couldn't answer from the README: what should we do to prepare a binary for training? Assume that I have a binary that was compiled with debug symbols. In order to train on this binary, I need to have two versions of this file, in `--bin_dir examples/stripped/` and in `--debug_dir examples/debug/`. What should I run on the binary to create each of the two versions? For example, I noticed that the version in `examples/stripped` is not completely stripped.
Thanks!