Replace the nltk-based Bayes model with sklearn.
This also changes the model's parameters so that the vocabulary is capped at 5000 tokens.
The unbounded vocabulary is most likely the reason the nltk Bayes model blew up in size.
With the new model and vectorizer, it's about 25 MB. Is that ok?
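For reference, the sklearn setup is roughly the following sketch (the corpus and labels here are purely illustrative, not from the repo):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Cap the vocabulary at 5000 tokens so the serialized model stays small;
# the unbounded nltk vocabulary was the likely cause of the size blow-up.
model = make_pipeline(
    CountVectorizer(max_features=5000),
    MultinomialNB(),
)

# Tiny illustrative corpus (hypothetical data, just to show the API).
texts = ["great movie", "terrible movie", "great acting", "terrible plot"]
labels = ["pos", "neg", "pos", "neg"]
model.fit(texts, labels)
print(model.predict(["great acting"])[0])
```

Persisting the fitted pipeline (e.g. with `joblib.dump`) serializes the capped vectorizer together with the classifier, which is where the ~25 MB figure comes from.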
I also think the scripts should be rewritten in a different PR. I can see clearly that I've learned a lot about readable and idiomatic Python code
since I wrote this lol
Closes #35