[new resource]: Clustering and Visualising Documents using Word Embeddings

Title of the resource

Clustering and Visualising Documents using Word Embeddings

Resource type

External Resource

Authors, editors and contributors

Jonathan Reades, Jennie Williams, Alex Wermer-Colan, Quinn Dombrowski, Barbara McGillivray

Topics (keywords)

DH, Open Education, Open Access, data visualisation, machine learning, python, network analysis

Learning outcomes

After completing this lesson, you will be able to:

Appreciate the ‘curse of dimensionality’ and understand why it matters for text mining
Use (nonlinear) dimensionality reduction to reveal structure in corpora
Use hierarchical clustering to group similar documents within a corpus

Abstract

This lesson uses word embeddings and clustering algorithms in Python to identify groups of similar documents in a corpus of approximately 9,000 academic abstracts. It teaches the basics of dimensionality reduction for extracting structure from a large corpus, and how to evaluate the resulting clusters.
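
As a rough illustration of the workflow the abstract describes, the sketch below reduces a set of document vectors to two dimensions and then groups them with hierarchical clustering. It is not taken from the lesson. The "embeddings" are synthetic random vectors, the names `doc_vectors`, `reduced` and `labels` are hypothetical, and plain PCA (a linear method) stands in for the nonlinear dimensionality reduction named in the learning outcomes, so treat it as a minimal sketch under those assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Assumption: synthetic stand-ins for document embeddings.
# Three groups of 20 "documents", each a 50-dimensional vector
# scattered around its own group centre.
n_groups, docs_per_group, n_dims = 3, 20, 50
centres = rng.normal(scale=5.0, size=(n_groups, n_dims))
doc_vectors = np.vstack([
    centre + rng.normal(scale=1.0, size=(docs_per_group, n_dims))
    for centre in centres
])

# Dimensionality reduction: PCA via SVD, keeping the two directions
# with the most variance. This is a linear stand-in for a nonlinear
# reducer, chosen here only to keep the sketch dependency-light.
centred = doc_vectors - doc_vectors.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
reduced = centred @ vt[:2].T

# Hierarchical (agglomerative) clustering with Ward linkage,
# then cut the tree so that at most n_groups clusters remain.
tree = linkage(reduced, method="ward")
labels = fcluster(tree, t=n_groups, criterion="maxclust")

# A simple sanity check on toy data: documents generated from the
# same centre should share a cluster label.
for g in range(n_groups):
    group_labels = labels[g * docs_per_group:(g + 1) * docs_per_group]
    print(f"group {g}: cluster labels {sorted(set(group_labels))}")
```

With real data, `doc_vectors` would hold one embedding per abstract, and the check against known group membership would be replaced by whatever evaluation suits the corpus, since real documents rarely come with ground-truth groups.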

DARIAH-ERIC / dariah-campus