D-Lab's 12-hour introduction to text analysis with Python. Learn how to perform bag-of-words analysis, sentiment analysis, topic modeling, word embeddings, and more using scikit-learn, NLTK, Gensim, and spaCy.
The 20 Newsgroups dataset should be reduced to a smaller sample before analysis, as the sklearn tutorial does. At the moment the entire dataset is used, and it contains some nonsense entries that skew the topic models.
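One way to shrink the corpus is sketched below. It is a minimal illustration, not the repository's code: the helper names `count_alpha_tokens` and `shrink_corpus` are hypothetical, and the rule that a post with fewer than 10 purely alphabetic tokens counts as "nonsense" is an assumption to tune. The sample size of 2,000 matches the `n_samples = 2000` used in scikit-learn's NMF/LDA topic-extraction example; the random seed keeps the subset reproducible across runs.

```python
import random
import re
from typing import List, Sequence

# Hypothetical defaults; adjust for your own corpus.
N_SAMPLES = 2000        # same sample size as the sklearn topic-extraction example
MIN_ALPHA_TOKENS = 10   # assumption: posts shorter than this are treated as noise


def count_alpha_tokens(doc: str) -> int:
    """Count tokens made only of letters, a crude proxy for real prose."""
    return len(re.findall(r"\b[A-Za-z]+\b", doc))


def shrink_corpus(
    docs: Sequence[str],
    n_samples: int = N_SAMPLES,
    min_alpha_tokens: int = MIN_ALPHA_TOKENS,
    seed: int = 1,
) -> List[str]:
    """Drop near-empty documents, then keep a reproducible random subset."""
    # Filter out posts that are empty or mostly numbers/punctuation.
    kept = [d for d in docs if count_alpha_tokens(d) >= min_alpha_tokens]
    # Sample without replacement only if there are more docs than requested.
    if len(kept) > n_samples:
        rng = random.Random(seed)
        kept = rng.sample(kept, n_samples)
    return kept
```

With scikit-learn installed, the input could come from `fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"), return_X_y=True)`, which strips headers, signatures, and quoted replies the same way the sklearn example does; pass the returned list of texts to `shrink_corpus` before vectorizing.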