Open shinmao opened 2 years ago
Summary of and feedback on related literature.
Some papers target algorithm recovery; others target type information recovery.
Arguing that existing IRs do not make it easy for users to work out the algorithm, the authors designed a hierarchical high-level representation to help discover undocumented features of a program.
Recovery of variables and data structures with probabilistic analysis on stripped binaries: the authors synthesize a large collection of hints to infer the information. (I would like to create another comment to collect some useful hints, thanks to OSPREY.)
Augments decompiler output with learned variable names and types: it recovers not only types but also developer-friendly names.
Input: decompiled function tokens (from IDA)
Output: recommended types and names for all variables in the function
// Transformer-based NN model
Code encoder: each code piece including operands and operators
Data layout encoder: location (register or stack), offset, and size; used to filter out impossible predictions.
Performs better on `struct` when focusing only on it, but worse when working on all types.
Online testing platform
Currently there are more than 10 models in AlgoProphet, but all of them were created manually. Some algorithms, such as the Fourier transform, might be difficult to identify with isomorphism alone; therefore, we might need to change the matching algorithm.
`sincos` cases: it would be required to identify the DFT algorithm.
- [x] Right-click command to generate models based on consecutive instructions
  In this screenshot, we can highlight consecutive instructions (only the data flow used in the highlighted instructions is considered) and build a model based on them.
- [x] Right-click to match existing models in a single function (the single-function version of the match-algos command in the command palette)
  In this screenshot, we can click anywhere in the function to match it with existing models.
- [x] Right-click on SSA variables or constants to adjust models
  - We can use `UIActionContext` to capture the selected variables.
  - The right-click menu needs `PluginCommand`, but `PluginCommand` cannot access `UIActionContext`.
    Solution: we can also directly import `UIContext` to get `UIActionContext`.
  - Add attributes of the related operation in the DFG?
    No! Due to normalization, the operation node might change before the final graph is generated.
    Solution: do a graph traversal to find the closest operation node.
  - Challenge: the selected token sometimes doesn't appear in the graph view, so we might need to track the data flow.
  In the screenshot, we right-click on `x0#3` and can remove its related operation.
- [x] Rename variables after matching models
  - will need to add an `output` attribute to the node
  - label the nodes with zero out-degree
  - the `idx` of the nodes should be the instruction with the left-hand value of the formula
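The notes on labeling output nodes can be sketched roughly as follows. This is a minimal sketch in plain Python, assuming the data-flow graph is stored as a dict from node name to successor names; the `Node` class, `label_outputs`, and the sample graph are hypothetical and are not AlgoProphet's actual data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical DFG node; `idx` is the index of the defining instruction."""
    name: str
    idx: int
    attrs: dict = field(default_factory=dict)


def label_outputs(nodes, edges):
    """Mark every node with zero out-degree as an output of the model.

    nodes: dict name -> Node
    edges: dict name -> list of successor names
    Returns the names of the labeled nodes, ordered by `idx`.
    """
    outputs = [n for n in nodes.values() if not edges.get(n.name)]
    for n in outputs:
        n.attrs["output"] = True
    return [n.name for n in sorted(outputs, key=lambda n: n.idx)]


# Tiny example: x0#1 and x1#1 feed an add whose result x2#1 is never consumed.
nodes = {
    "x0#1": Node("x0#1", idx=0),
    "x1#1": Node("x1#1", idx=1),
    "x2#1": Node("x2#1", idx=2),
}
edges = {"x0#1": ["x2#1"], "x1#1": ["x2#1"]}
labeled = label_outputs(nodes, edges)
```

After a successful match, the nodes carrying the `output` attribute would be the candidates for renaming, with `idx` pointing back at the instruction that defines the variable.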
This month, we have been exploring and developing a friendlier interface for users to generate and adjust their models. To generate a model, users select with the mouse the consecutive instructions in the BinaryView that they consider important for the algorithm. After generating a model, users can adjust it by interacting with the BinaryView. To make the graph matching algorithm, which finds existing algorithms in binaries, more efficient, users can right-click in the BinaryView to prune operation nodes, SSAVariables, or constants from previously generated models. Compared with existing approaches that match algorithms by function signature, our method offers more flexibility. It also makes it easier for users to work out why their models fail in some cases and to adjust them interactively.
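The pruning workflow, combined with the earlier idea of traversing the graph to find the closest operation node, might look roughly like this. It is a sketch only: it assumes the model is an adjacency dict treated as undirected, measures "closest" in hops, and uses hypothetical names (`closest_operation`, `prune`, the `kinds` table) rather than anything from the plugin itself.

```python
from collections import deque


def closest_operation(graph, kinds, start):
    """Breadth-first search from `start` to the nearest operation node.

    graph: dict node -> list of neighbouring nodes (treated as undirected)
    kinds: dict node -> "op" or "const"; nodes not listed are variables
    Returns the nearest node whose kind is "op", or None if none is reachable.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if kinds.get(node) == "op":
            return node
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None


def prune(graph, node):
    """Return a copy of the graph without `node` and its incident edges."""
    return {n: [m for m in nbrs if m != node]
            for n, nbrs in graph.items() if n != node}


# Hypothetical model: x0#3 = x0#2 + 1, then x1#1 = x0#3 * x0#3
graph = {
    "x0#2": ["add"], "1": ["add"],
    "add": ["x0#2", "1", "x0#3"],
    "x0#3": ["add", "mul"],
    "mul": ["x0#3", "x1#1"],
    "x1#1": ["mul"],
}
kinds = {"add": "op", "mul": "op", "1": "const"}
target = closest_operation(graph, kinds, "x0#3")
pruned = prune(graph, target)
```

When two operations are equally close, this sketch simply returns whichever one appears first in the adjacency list; a real implementation would probably prefer the defining operation of the selected variable.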
8/30 - 9/13 working progress
- [x] Rename variables after matching models
  - will need to add an `output` attribute to the node
  - label the nodes with zero out-degree
  - the `idx` of the nodes should be the instruction with the left-hand value of the formula
I think renaming variables should not be automatically applied -- instead wrap it in commands:
Introduction
Work in progress
Samples
DFT (Discrete Fourier Transform): a method that converts a sequence of complex numbers into a new sequence of complex numbers of the same length. Two clues can be used to identify this algorithm:
The first is the use of Euler's formula.
The second is the identification of complex numbers (which can be implemented with `struct`). reference
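To make the two clues concrete, here is a minimal sketch of a naive DFT in Python. It is an illustration only, not one of the actual samples: the `Complex` dataclass stands in for a C `struct` with two fields, and the function and variable names are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Complex:
    """A complex number stored as a two-field record, like a C struct."""
    re: float
    im: float


def dft(xs):
    """Naive O(N^2) DFT: X[k] = sum over n of x[n] * e^{-2*pi*i*k*n/N}."""
    n_total = len(xs)
    out = []
    for k in range(n_total):
        acc = Complex(0.0, 0.0)
        for n, x in enumerate(xs):
            angle = -2.0 * math.pi * k * n / n_total
            # Euler's formula: e^{i*angle} = cos(angle) + i*sin(angle)
            c, s = math.cos(angle), math.sin(angle)
            # Complex multiply x * (c + i*s), accumulated field by field
            acc.re += x.re * c - x.im * s
            acc.im += x.re * s + x.im * c
        out.append(acc)
    return out


# A unit impulse transforms to a flat spectrum of ones.
signal = [Complex(1.0, 0.0), Complex(0.0, 0.0), Complex(0.0, 0.0), Complex(0.0, 0.0)]
spectrum = dft(signal)
```

In a compiled binary, the paired `cos`/`sin` evaluation on the same angle (which may show up as a single `sincos` call) and the four multiplies feeding two accumulators are the kind of data-flow pattern a model would try to capture.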