Clustering and Visualising Documents using Word Embeddings
Resource type
External Resource
Authors, editors and contributors
Jonathan Reades, Jennie Williams, Alex Wermer-Colan, Quinn Dombrowski, Barbara McGillivray
Topics (keywords)
DH, Open Education, Open Access, data visualisation, machine learning, python, network analysis
Learning outcomes
After completing this lesson, you will be able to:
Appreciate the ‘curse of dimensionality’ and understand why it is important to text mining
Use (nonlinear) dimensionality reduction to reveal structure in corpora
Use hierarchical clustering to group similar documents within a corpus
Abstract
This lesson uses word embeddings and clustering algorithms in Python to identify groups of similar documents in a corpus of approximately 9,000 academic abstracts. It will teach you the basics of dimensionality reduction for extracting structure from a large corpus and how to evaluate your results.