Bergvca / string_grouper

Super Fast String Matching in Python
MIT License
364 stars 76 forks

Question / suggestion to use multiple n-grams to get more features #76

Open iibarant opened 2 years ago

iibarant commented 2 years ago

Hi @Bergvca and @ParticularMiner,

Hope you are doing well.

I am working on the same project again and have a question / suggestion: would it be possible to use multiple n-gram sizes to get more features? Currently we have the following: ngram_size: The amount of characters in each n-gram. Default is 3.

What if we pass the n-gram sizes as a list like [2, 3, 4] and get more vector components, i.e. the n-grams of size 2 plus those of size 3 and size 4?
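
To make the idea concrete, here is a rough sketch in plain Python of what I mean by combining several n-gram sizes. The function name multi_ngrams and its sizes parameter are hypothetical, made up for illustration; they are not part of string_grouper.

```python
def multi_ngrams(text, sizes=(2, 3, 4)):
    """Return the character n-grams of `text` for every size in `sizes`.

    Hypothetical helper for illustration only; not a string_grouper function.
    """
    grams = []
    for n in sizes:
        # A string shorter than n contributes no n-grams of that size.
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams


# Each distinct n-gram would become one vector component (feature).
print(multi_ngrams("abcd"))
```

A callable like this could in principle be passed as the analyzer argument of scikit-learn's TfidfVectorizer, which accepts a callable there. Whether string_grouper could expose that is exactly my question.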

What do you think?

By the way, the string_grouper approach is really good in terms of speed and efficiency. Great work!

Thank you, iibarant