(2) Extract features from ASTs

redouane-dziri commented 4 years ago

Will leverage existing tools to parse code into ASTs and extract properties of the trees to use as additional features. Will add details here as I look into 1) existing tools, 2) useful properties of ASTs, 3) how to use the tools with reasonable complexity.

By the end of this, we should be able to generate additional features and explore them with regard to our classification problem.

Hoping this would make our models more robust to spurious word choices, e.g. the same function names recurring across all files in a library or in a crypto competition. Many competitions enforce interfaces so that solutions can be tested easily, and function names such as encrypt and decrypt are giveaways. This is especially problematic when 10% of our positive examples have them.
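To make the idea concrete, here is a minimal sketch of name-independent AST features. It assumes Python source and the standard-library `ast` module purely for illustration; the language and tooling we will actually use are still to be decided, and `ast_features` and the two snippets are hypothetical names made up for this example.

```python
import ast
from collections import Counter


def ast_features(source: str) -> dict:
    """Return simple structural features of a Python source string.

    Hypothetical helper for illustration only: it looks at node types
    and tree shape, never at identifier names.
    """
    tree = ast.parse(source)

    def depth(node: ast.AST) -> int:
        # Depth of the subtree rooted at `node` (a leaf has depth 1).
        children = list(ast.iter_child_nodes(node))
        return 1 + max((depth(child) for child in children), default=0)

    # Histogram of node types over the whole tree.
    node_types = Counter(type(node).__name__ for node in ast.walk(tree))
    return {
        "n_nodes": sum(node_types.values()),
        "max_depth": depth(tree),
        "node_type_counts": dict(node_types),
    }


# Same structure, different names: one uses a giveaway name, one does not.
snippet_a = "def encrypt(key, msg):\n    return [m ^ key for m in msg]\n"
snippet_b = "def f(a, b):\n    return [x ^ a for x in b]\n"

features_a = ast_features(snippet_a)
features_b = ast_features(snippet_b)
```

Because these features only count node types and measure tree shape, the two snippets produce identical feature dictionaries even though only one of them contains the word encrypt. Whether such features are actually discriminative for our classification problem remains to be explored.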

redouane-dziri commented 4 years ago

suggested reading by @Hadrien-Cornier for future reference: https://github.com/phanein/deepwalk

redouane-dziri commented 4 years ago

Relating to 1), following #36 very closely

arthurherbout / crypto_code_detection

(2) Extract features from ASTs #27