Open DruidSmith opened 8 years ago
You are more than right about the custom lists. At the moment there is nothing like that, but it could be added easily. I'll find some time to do it. Thanks for your suggestion ;)
I should make this easier, but for now you could find the path to the stop-words files and create a file named stop-custom there. After that, you only need to set the language to custom when initialising Normalizr:
from normalizr import Normalizr
normalizr = Normalizr(language='custom')
I'm leaving this issue open till I decide what to do ;)
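For anyone trying the workaround above, here is a rough sketch of the "create a file named stop-custom" step. It makes assumptions that are not confirmed in this thread: that the stop-words files are plain text with one word per line, and that you have already located the directory holding them (for instance by looking near the installed package's `__file__`). The helper `write_custom_stop_words` is hypothetical and not part of the library; the demo writes to a temporary directory so that nothing in the installed package is touched.

```python
import tempfile
from pathlib import Path


def write_custom_stop_words(data_dir, words, name="stop-custom"):
    """Write a stop-word file to <data_dir>/<name> and return its path.

    Hypothetical helper, not part of the library. Assumes the file format
    is plain UTF-8 text with one stop word per line.
    """
    target = Path(data_dir) / name
    # Normalise case, drop blanks and duplicates, and keep a stable order.
    cleaned = sorted({w.strip().lower() for w in words if w.strip()})
    target.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return target


# Demo against a temporary directory. In practice data_dir would be the
# folder that holds the library's own stop-words files.
demo_dir = tempfile.mkdtemp()
custom_path = write_custom_stop_words(demo_dir, ["foo", "Bar", "foo", "  "])
print(custom_path.read_text(encoding="utf-8"))
```

Once the file sits next to the bundled stop-words files, the `language='custom'` call shown above should pick it up, provided the library builds the file name from the language string as described.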
Thanks, will give it a try.
Adding my 2 cents...
I don't have a use for my own custom stopword list, but it would be nice to be able to add words to the stop list through the normalization settings. For example:
normalizations = [
    'remove_extra_whitespaces',
    'remove_stop_words',
    ('add_stopwords', ['stopword_1', 'stopword_2', 'stopword_3', 'stopword_4']),
]
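To make the request concrete, this is a minimal sketch of the behaviour being asked for, written in plain Python rather than against the library. The function `remove_stop_words` and its `extra_stop_words` parameter are hypothetical names for illustration only. They are not the library's API, and `add_stopwords` in the list above is a proposed option, not an existing one.

```python
def remove_stop_words(text, stop_words, extra_stop_words=()):
    """Drop tokens that appear in the base list or in the extra list.

    Illustrative only: tokens are split on whitespace and compared
    case-insensitively, which may differ from the library's behaviour.
    """
    blocked = {w.lower() for w in stop_words}
    blocked |= {w.lower() for w in extra_stop_words}
    return " ".join(tok for tok in text.split() if tok.lower() not in blocked)


base_stop_words = ["the", "a"]
result = remove_stop_words(
    "The quick brown fox",
    base_stop_words,
    extra_stop_words=["quick"],
)
print(result)  # brown fox
```

The point of the design is that the bundled list stays untouched and the additional words are merged in at call time.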
Makes sense. I'll think about it and come up with something. I still have to find some time to implement changes in cucco (really needed ones).
EDIT: Thank you btw!
There isn't much documentation on how to use the stop-words list. Would it make sense to add the capability to use a custom stop-word list, rather than having to modify an existing one? Or does that capability already exist?