Add an AST ngram extractor to the Aura framework

There is already an experimental ngram.py in the repository root that can extract n-gram features from the source code in JSON format. This extractor needs to be finished and refactored to bring it in line with the changes in the new Aura v2.
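
For illustration only, here is a minimal sketch of AST n-gram extraction using Python's standard `ast` module. It is not the code in ngram.py: the function name `ast_ngrams`, the choice of parent-to-child paths of node type names as the n-gram unit, and the JSON layout are all assumptions made for this example.

```python
import ast
import json
from collections import Counter


def ast_ngrams(source: str, n: int = 2) -> Counter:
    """Count n-grams of AST node type names along parent-to-child paths.

    Hypothetical helper for illustration; not part of Aura's API.
    """
    counts: Counter = Counter()

    def visit(node: ast.AST, path: tuple) -> None:
        # Keep only the last n node type names on the current path.
        path = (path + (type(node).__name__,))[-n:]
        if len(path) == n:
            counts["/".join(path)] += 1
        for child in ast.iter_child_nodes(node):
            visit(child, path)

    visit(ast.parse(source), ())
    return counts


if __name__ == "__main__":
    features = ast_ngrams("import os\nos.system('ls')\n", n=2)
    # Serialize the feature counts as JSON (assumed output layout).
    print(json.dumps(features, indent=2, sort_keys=True))
```

Using node type names rather than raw tokens keeps the features independent of identifiers and literals, which is one plausible choice for an ML dataset; the real extractor may define its n-grams differently.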

This extractor should be disabled by default, as it would produce huge amounts of data that are not needed during a standard scan. It can be enabled when collecting the dataset for ML.
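
One way the opt-in behavior could look is sketched below. `ScanOptions`, its `extract_ngrams` field, and the `scan` function are hypothetical names invented for this example and do not reflect Aura's actual configuration mechanism.

```python
import ast
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanOptions:
    # Hypothetical option: off by default so standard scans stay lightweight.
    extract_ngrams: bool = False


def scan(source: str, options: ScanOptions = ScanOptions()) -> dict:
    """Toy scan that attaches n-gram features only when explicitly enabled."""
    tree = ast.parse(source)
    result = {"node_count": sum(1 for _ in ast.walk(tree))}
    if options.extract_ngrams:
        # Only pay the cost of feature extraction when it was requested.
        bigrams = Counter(
            f"{type(parent).__name__}/{type(child).__name__}"
            for parent in ast.walk(tree)
            for child in ast.iter_child_nodes(parent)
        )
        result["ngrams"] = dict(bigrams)
    return result
```

A standard call such as `scan(code)` returns no n-gram data, while `scan(code, ScanOptions(extract_ngrams=True))` includes it for dataset collection.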

SourceCode-AI / aura

Add an AST ngram extractor to the Aura framework #1