Vector35 / AlgoProphet

MIT License
8 stars 1 forks source link

AlgoProphet dev issues #1

Open shinmao opened 2 years ago

shinmao commented 2 years ago

Introduction

Work in progress

Samples

DFT (Discrete Fourier Transform): A method which can convert a sequence of complex numbers to new sequence of complex numbers with same length. Two clues can be used to identify this algorithm,
First step is Euler's formula.
Second step is identification of complex numbers (which can be implemented with struct) reference

shinmao commented 2 years ago

Literature review

Summarization and Feedback for related literatures.
Some are for Algorithm Recovery, and some others are for Type information Recovery.

Recovery of High-Level IR of Algorithms from Binary code

Thinking about existing IR is not friendly for users to figure out the algorithm, they designed a hierarchical high level representation to help discover undocumented features of program.

OSPREY (S&P21)

Recovery of variables/data structures with probabilistic analysis on stripped binary, which means they synthesize a large collections of hints to guess the information. (I would like to create another comment to collect some useful hints, thanks to OSPREY)

DIRTY (USENIX22)

Augment Decompiler Output with Learned Variable Names and Types, not only recover type but also recover developer-friendly names.
Input: decompiled function tokens (from IDA) / Output: Recommend types and names for all variables included in the function

// Transformer-based NN model
Code encoder: each code piece including operands and operators
Data layout encoder: location (registers or stack), offset, size, and used to filter out impossible prediction results.
shinmao commented 2 years ago

8/16 working progress

Currently there are more than 10 models in AlgoProphet. But models are all generated with manual effort. Some other algorithm such as fourier transform might be difficult to identify just with isomorphism; therefore, might need to change matching algorithm.

Next plans

shinmao commented 2 years ago

8/30 - 9/13 working progress

shinmao commented 2 years ago

Hypotheses

In this month, we are exploring and developing a more friendly interface for users to generate and adjust their models. To generate the models, users can use mouse to select consecutive instructions from the BinaryView which they think are important for the algorithm. After generating the models, users can also adjust the models by interacting with the BinaryView. To make the graph matching algorithm which is used to find out the existing algorithm from the binaries more efficient, users can right-click on the BinaryView to prune the operation nodes, SSAVariables, or constants from the models generated previously. Compared to the existing methodologies which use function signatures to match the algorithms, our method provides more flexibility and possibilities. Additionally, it is also more reasonable for users to figure out why their models don’t work in some cases, and adjust their models interactively.

Plans for Next Month

galenbwill commented 2 years ago

8/30 - 9/13 working progress

  • [x] Rename variables after matching models will need to add attributes output to the node label the nodes with zero out-degree the idx of the nodes should be the instruction with left values of formula

I think renaming variables should not be automatically applied -- instead wrap it in commands:

  1. "AlgoProphet -- Rename all matched variables". Global command that does not appear in the right-click menu.
  2. "AlgoProphet -- Rename matched variables". Per-function command that only applies to current function, and does appear in the right-click menu.