There is already an experimental ngram.py in the repository root that can extract n-gram features from source code and output them in JSON format. This extractor needs to be finished and refactored to account for the changes in the new Aura v2.
The extractor should be disabled by default, because it produces a large amount of data that a standard scan does not need. It can be enabled when collecting a dataset for ML.
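
To make the intended behavior concrete, here is a minimal sketch of an opt-in n-gram extractor. It is not the existing ngram.py and does not use Aura's real analyzer API. The function names (`token_ngrams`, `extract_features`), the `enabled` flag, and the JSON layout are all hypothetical, and it assumes token-level n-grams over Python source, which may differ from what the experimental script does.

```python
import io
import json
import tokenize
from collections import Counter

# Token types skipped in this sketch because they carry layout, not content.
SKIPPED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def token_ngrams(source: str, n: int = 2) -> Counter:
    """Count n-grams over the token strings of a Python source text."""
    tokens = [
        tok.string
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type not in SKIPPED
    ]
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def extract_features(source: str, n: int = 2, enabled: bool = False):
    """Return n-gram features as a JSON string, or None when disabled.

    `enabled` defaults to False (hypothetical flag) so that a standard
    scan produces no n-gram output.
    """
    if not enabled:
        return None
    counts = token_ngrams(source, n)
    payload = {
        "n": n,
        "ngrams": [
            {"ngram": list(gram), "count": count}
            for gram, count in sorted(counts.items())
        ],
    }
    return json.dumps(payload)
```

Calling `extract_features(code)` with the default arguments returns `None`, so nothing is emitted during a normal scan. Passing `enabled=True` would be the dataset-collection path. Storing each n-gram as a JSON list rather than a joined string avoids ambiguity when a token (for example a string literal) contains spaces.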