Hi @Bergvca and @ParticularMiner,
Hope you are doing well.
I'm working on the same project again and have a question / suggestion: would it be possible to use multiple n-gram sizes to get more features? Currently we have a single parameter, ngram_size: "The amount of characters in each n-gram. Default is 3."
What if ngram_size could also be given as a list like [2, 3, 4], so that the vectors get more components: the n-grams of size 2 plus those of size 3 and size 4?
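To make the idea concrete, here is a rough sketch of what I mean. It is plain Python and does not touch string_grouper itself; the names `char_ngrams`, `multi_ngrams` and `sizes` are made up for illustration and are not part of the current API.

```python
def char_ngrams(text, n):
    """Return all overlapping character n-grams of length n."""
    # For strings shorter than n the range is empty, so the result is [].
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def multi_ngrams(text, sizes=(2, 3, 4)):
    """Concatenate the n-grams for every size in `sizes` (hypothetical helper)."""
    features = []
    for n in sizes:
        features.extend(char_ngrams(text, n))
    return features


# Size 2 gives ab, bc, cd; size 3 gives abc, bcd; size 4 gives abcd.
print(multi_ngrams("abcd"))
```

If the vectorizer accepts a callable analyzer (scikit-learn's TfidfVectorizer does, through its `analyzer` parameter), a function like this could probably be dropped in where the single-size n-gram function is used now. I have not checked how this affects matching quality or memory use.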
What do you think?
By the way, the string_grouper approach is really good in terms of speed and efficiency. Great work!
Thank you, iibarant