Status: Closed (closed by chekoduadarsh 2 years ago)
Hi @chekoduadarsh 👍 Thanks for your comments and inputs. They are very interesting.
> Isn't it better to use the NLTK stop words list?
Sorry, no. I found that it is mostly useless. That's why I created my own list, taking some words from it and adding others from my own experience. However, I will add a couple of the words you suggested.
> I think it is better if we lemmatize the data
Yes. Let me work on it. Look for it in the next version. Just do:
pip install autoviz --upgrade
in the next day or so.
Thanks again,
AutoViz
@AutoViML, thank you!
Let me know if you need support from me on the second point. I will be happy to open a PR.
@chekoduadarsh 👍 Thank you for your offer. Maybe next time you can make a PR. I have already committed the change. Can you please test it? Just upgrade... Thanks, AutoViML
OK, thank you. I will upgrade AutoViz.
1. Updating Stopwords List
Currently, stop words are defined as a list, and that list is missing a few stop words such as "themselves".
Isn't it better to use the NLTK stop words list?
Copied from: https://gist.github.com/sebleier/554280
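A minimal sketch of the idea, assuming the custom list is kept and simply extended with the missing words. The names `CUSTOM_STOPWORDS`, `EXTRA_STOPWORDS` and `word_counts` are hypothetical and are not AutoViz's actual list or API. To start from NLTK's English list instead, `nltk.corpus.stopwords.words("english")` returns it after a one-time `nltk.download("stopwords")`; the sketch below sticks to the standard library so it runs without that download.

```python
from collections import Counter
import re

# Hypothetical, abbreviated stand-in for a hand-curated stop list
# (NOT AutoViz's actual list).
CUSTOM_STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "it"}

# Words reported as missing, e.g. "themselves" (illustrative only).
EXTRA_STOPWORDS = {"themselves", "ourselves", "yourselves"}

# Union of both lists; a set gives fast membership tests.
STOPWORDS = CUSTOM_STOPWORDS | EXTRA_STOPWORDS


def word_counts(text: str, stopwords=STOPWORDS) -> Counter:
    """Lower-case, tokenize on letters/apostrophes, drop stop words, count the rest."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)


if __name__ == "__main__":
    text = "The readers read it themselves and the readers liked it"
    print(word_counts(text))
```

Storing the merged list as a set (rather than a list) keeps the per-token lookup cheap when filtering large text columns.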
2. Lemmatization before plotting
I think it is better to lemmatize the data before plotting. Then words like "reads" and "reading" will be counted as the same word, which will give us a better word cloud.
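A rough sketch of lemmatizing before counting, assuming a toy lookup-table lemmatizer so it runs with the standard library only. `LEMMAS`, `lemmatize` and `lemma_counts` are hypothetical names, not AutoViz code. A real pipeline would use a proper lemmatizer, e.g. NLTK's `WordNetLemmatizer().lemmatize(word, pos="v")` (which needs the WordNet corpus downloaded, and `pos="v"` so verb forms like "reading" map to "read") or spaCy.

```python
from collections import Counter
import re

# Hypothetical lemma lookup table; stands in for a real lemmatizer.
LEMMAS = {
    "reads": "read",
    "reading": "read",
    "readers": "reader",
    "liked": "like",
    "likes": "like",
}


def lemmatize(token: str) -> str:
    """Map a token to its lemma; unknown tokens are returned unchanged."""
    return LEMMAS.get(token, token)


def lemma_counts(text: str, stopwords=frozenset({"the", "and", "it", "she"})) -> Counter:
    """Tokenize, drop stop words, lemmatize, then count, so inflections share one entry."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(lemmatize(t) for t in tokens if t not in stopwords)


if __name__ == "__main__":
    raw = "She reads it and keeps reading"
    # "reads" and "reading" are merged into a single "read" entry.
    print(lemma_counts(raw))
```

Because the counts feed the word cloud, merging inflected forms before counting means "read" gets one larger entry instead of several smaller ones.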