davidmogar / cucco

Text normalization library for Python
MIT License

Stop Words suggestion #5

Open DruidSmith opened 8 years ago

DruidSmith commented 8 years ago

There isn't much documentation on how to use the stop-words list. Would it make sense to add the ability to use a custom stop-word list, rather than having to modify an existing one? Or does that capability already exist?

davidmogar commented 8 years ago

You are absolutely right about the custom lists. At the moment there is nothing like that, but it could be added easily. I'll find some time to do it. Thanks for your suggestion ;)

davidmogar commented 8 years ago

I should make this easier, but for now you can find the path to the stop-words files and create a file named stop-custom there. After that, you only need to set the language to custom when initialising Normalizr:

from normalizr import Normalizr

normalizr = Normalizr(language='custom')

I'm leaving this issue open till I decide what to do ;)
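
The thread doesn't say where the stop-words files live or what format they use. The sketch below assumes a plain-text file with one word per line and does not call the Normalizr/cucco API at all. It only illustrates what a custom list such as stop-custom amounts to: a set of words loaded from a file and filtered out of the text. `load_stop_words` and `remove_stop_words` are hypothetical helpers written for this example.

```python
import tempfile
from pathlib import Path


def load_stop_words(path):
    """Read a stop-word file, ASSUMED to hold one word per line."""
    with open(path, encoding="utf-8") as handle:
        # Skip blank lines and lower-case entries so matching is case-insensitive.
        return {line.strip().lower() for line in handle if line.strip()}


def remove_stop_words(text, stop_words):
    """Drop every whitespace-separated token that appears in stop_words."""
    return " ".join(word for word in text.split() if word.lower() not in stop_words)


if __name__ == "__main__":
    # Write a throwaway 'stop-custom' file in a temporary directory to show the round trip.
    with tempfile.TemporaryDirectory() as tmp:
        custom = Path(tmp) / "stop-custom"
        custom.write_text("foo\nbar\n", encoding="utf-8")
        words = load_stop_words(custom)
        print(remove_stop_words("Foo keeps bar out of the text", words))
        # prints "keeps out of the text"
```

Tokens are split on whitespace only, so punctuation attached to a word (for example "foo,") would not match. A real normalizer would typically strip punctuation first.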

DruidSmith commented 8 years ago

Thanks, will give it a try.

JasonCrowe commented 6 years ago

Adding my 2 cents...

I don't have a use for my own custom stop-word list, but it would be nice to be able to add words to the stop list through the normalization settings, e.g.:

normalizations = [
    'remove_extra_whitespaces',
    'remove_stop_words',
    ('add_stopwords', ['stopword_1', 'stopword_2', 'stopword_3', 'stopword_4']),
]
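
Nothing in this thread indicates that cucco accepts an 'add_stopwords' entry. The sketch below is a hypothetical, self-contained illustration of how a pipeline could accept both plain step names and (name, argument) tuples like the ones proposed. `BASE_STOP_WORDS`, `STEPS` and `normalize` are invented for this example and are not part of cucco's API.

```python
# Hypothetical sketch, not cucco's API: pipeline entries may be plain step
# names or (name, argument) tuples such as ('add_stopwords', [...]).

BASE_STOP_WORDS = {"a", "an", "the", "of"}  # assumed default list


def remove_extra_whitespaces(text, stop_words):
    # Collapse runs of whitespace into single spaces (stop_words is unused here;
    # it is accepted only so every step shares one signature).
    return " ".join(text.split())


def remove_stop_words(text, stop_words):
    # Keep only tokens that are not (case-insensitively) stop words.
    return " ".join(w for w in text.split() if w.lower() not in stop_words)


STEPS = {
    "remove_extra_whitespaces": remove_extra_whitespaces,
    "remove_stop_words": remove_stop_words,
}


def normalize(text, normalizations):
    stop_words = set(BASE_STOP_WORDS)
    # First pass: collect extra stop words so they apply no matter
    # where 'add_stopwords' appears in the list.
    for step in normalizations:
        if isinstance(step, tuple) and step[0] == "add_stopwords":
            stop_words.update(word.lower() for word in step[1])
    # Second pass: run the text-transforming steps in order.
    for step in normalizations:
        name = step[0] if isinstance(step, tuple) else step
        if name != "add_stopwords":
            text = STEPS[name](text, stop_words)
    return text


if __name__ == "__main__":
    normalizations = [
        "remove_extra_whitespaces",
        "remove_stop_words",
        ("add_stopwords", ["stopword_1", "stopword_2"]),
    ]
    print(normalize("The   stopword_1 cat  of stopword_2 doom", normalizations))
    # prints "cat doom"
```

Collecting the extra words in a separate first pass is one possible design choice. It means the order of 'add_stopwords' in the list doesn't matter, which matches the proposal above, where it is listed after 'remove_stop_words'.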

davidmogar commented 6 years ago

Makes sense. I'll think about it and come up with something. I still need to find some time to implement some much-needed changes in cucco.

EDIT: Thank you btw!