Bergvca / string_grouper

Super Fast String Matching in Python
MIT License

how to handle 'ValueError: empty vocabulary; perhaps the documents only contain stop words' in group_similar_strings #66

Open gw00207 opened 3 years ago

gw00207 commented 3 years ago

Currently I have to wrap group_similar_strings in a try/except clause in case all of the strings contain only stop words. Is it possible to handle this case differently, e.g. just return all strings ungrouped? Or perhaps raise a more descriptive error, so that I can catch OnlyStopwordsError or similar instead of any ValueError? Great package, many thanks.
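For illustration, the workaround described above could look roughly like the sketch below. The names `safe_group`, `OnlyStopwordsError`, and `fake_grouper` are hypothetical and not part of string_grouper. `fake_grouper` only stands in for a call that raises the error from the issue title, so that the example runs without the library installed. Matching on the message text is an assumption about how to tell this case apart from other ValueErrors.

```python
class OnlyStopwordsError(ValueError):
    """Hypothetical, more specific error. It subclasses ValueError so that
    existing `except ValueError` handlers keep working."""


def safe_group(group_fn, strings, raise_specific=False, **kwargs):
    """Call group_fn(strings, **kwargs) and handle the empty-vocabulary case.

    By default the input is returned unchanged (i.e. ungrouped). With
    raise_specific=True the hypothetical OnlyStopwordsError is raised instead.
    """
    try:
        return group_fn(strings, **kwargs)
    except ValueError as exc:
        # Assumption: the case is identified by its message text.
        if "empty vocabulary" not in str(exc):
            raise  # unrelated ValueError: do not swallow it
        if raise_specific:
            raise OnlyStopwordsError(str(exc)) from exc
        return strings


def fake_grouper(strings):
    """Stand-in for the real grouping call; always fails like the reported case."""
    raise ValueError(
        "empty vocabulary; perhaps the documents only contain stop words"
    )


# Falls back to the ungrouped input instead of raising.
ungrouped = safe_group(fake_grouper, ["a", "the"])
```

With the real library, one would presumably pass `group_similar_strings` and a pandas Series in place of `fake_grouper` and the list; that call is not exercised here.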

ParticularMiner commented 3 years ago

That makes sense, @gw00207, and it is a simple enough addition to make. Can you create a pull request for this?

gw00207 commented 3 years ago

Please see https://github.com/Bergvca/string_grouper/pull/67