avast / retdec

RetDec is a retargetable machine-code decompiler based on LLVM.
https://retdec.com/
MIT License
7.89k stars 938 forks source link

Implement "Meaningful Variable Names for Decompiled Code: A Machine Translation Approach" in RetDec #34

Open · ghost opened this issue 6 years ago

ghost commented 6 years ago

See http://www.contrib.andrew.cmu.edu/~apjaffe/smt/decompilation-renaming-2.pdf

Quote from @apjaffe's abstract:

Decompiled code lacks meaningful variable names. We used statistical machine translation to suggest variable names that are natural given the context. This technique has previously been successfully applied to obfuscated JavaScript code, but decompiled C code poses unique challenges in constructing an aligned corpus and selecting the best translation from among several candidates.

In @apjaffe's paper, they use the Hex-Rays Decompiler to generate both the training dataset and the raw decompiled C code. With some effort, the solution could be ported to RetDec as well.
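The hardest part of a RetDec port would likely be the aligned corpus the abstract mentions. The following is a hypothetical sketch, not the paper's method and not anything RetDec ships. It assumes you can decompile the same function twice: once from a stripped binary, which yields placeholder names such as `a1` and `v2`, and once with debug info, which yields the original names. It then pairs the names token by token. `tokenize` and `align_names` are illustrative names, and the toy regex tokenizer is an assumption. A real pipeline would use RetDec's own output and a proper C lexer.

```python
from __future__ import annotations

import re

# Toy tokenizer (assumption): identifiers, integer literals, or any single
# non-space character. Good enough for the illustration, not a C lexer.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code: str) -> list[str]:
    return TOKEN_RE.findall(code)


def align_names(stripped: str, with_debug: str) -> dict[str, str] | None:
    """Map placeholder names in `stripped` to names in `with_debug`.

    Returns None when the two versions cannot be aligned 1:1: the token
    counts differ, a non-identifier token differs, or the same placeholder
    would map to two different names.
    """
    a, b = tokenize(stripped), tokenize(with_debug)
    if len(a) != len(b):
        return None
    mapping: dict[str, str] = {}
    for x, y in zip(a, b):
        if x == y:
            continue  # identical tokens (keywords, operators) need no pairing
        # Only identifiers are allowed to differ between the two outputs.
        if not (x.isidentifier() and y.isidentifier()):
            return None
        if mapping.setdefault(x, y) != y:
            return None  # inconsistent renaming, so discard this pair
    return mapping


if __name__ == "__main__":
    print(align_names(
        "int f(int a1) { int v2 = a1 * 2; return v2; }",
        "int f(int count) { int total = count * 2; return total; }",
    ))
```

Discarding any function that does not align exactly keeps the corpus small but clean. That trade-off matters because, as the abstract notes, constructing the aligned corpus is one of the challenges specific to decompiled C.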

s3rvac commented 6 years ago

Actually, RetDec already implements a lot of nifty tricks to produce variables with meaningful names. Some of them are: