SourceCode-AI / aura

Python source code auditing and static analysis on a large scale
GNU General Public License v3.0
486 stars, 31 forks

Add an AST ngram extractor to the Aura framework #1

Open RootLUG opened 3 years ago

RootLUG commented 3 years ago

There is already an experimental ngram.py in the repository root that extracts n-gram features from source code and outputs them in JSON format. This extractor needs to be finished and refactored to bring it in line with the changes in the new Aura v2.

This extractor should be disabled by default, as it would produce a huge amount of data that is not needed during a standard scan. It can be enabled when collecting a dataset for ML.
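
For illustration only, here is a minimal sketch of what an opt-in AST n-gram extractor could look like. It is not the code from ngram.py and does not use any Aura API; the function names `ast_ngrams` and `extract`, the `enabled` flag, and the `" > "` key format are all hypothetical. It assumes an n-gram is a chain of n node type names along a parent-to-child path, which is only one of several possible definitions.

```python
import ast
import json
from collections import Counter


def ast_ngrams(source, n=2):
    """Count chains of n AST node type names along parent-to-child paths.

    Hypothetical helper, not part of Aura.
    """
    tree = ast.parse(source)
    counts = Counter()

    def visit(node, path):
        # Keep only the last n type names of the path ending at this node.
        path = (path + (type(node).__name__,))[-n:]
        if len(path) == n:
            counts[path] += 1
        for child in ast.iter_child_nodes(node):
            visit(child, path)

    visit(tree, ())
    return counts


def extract(source, n=2, enabled=False):
    """Return the n-gram counts as a JSON string, or None when disabled.

    The `enabled` flag is an assumption that mirrors the "disabled by
    default" requirement; a real integration would read it from config.
    """
    if not enabled:
        return None
    counts = ast_ngrams(source, n)
    # Tuples are not valid JSON keys, so join the type names into a string.
    return json.dumps({" > ".join(key): value for key, value in sorted(counts.items())})


if __name__ == "__main__":
    print(extract("x = 1\ny = 2", n=2, enabled=True))
```

Calling `extract(source)` without `enabled=True` returns `None`, so a standard scan would skip the n-gram output entirely and only a dataset-collection run would pay for it.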