Megan S. Kane, Maria Antoniak, William Mattingly, John R. Ladd
Topics (keywords)
DH, Open Education, Open Access, data manipulation, distant reading, python
Learning outcomes
After completing this lesson, you will be able to:
Upload a corpus of texts to a platform for Python analysis (using Google Colaboratory)
Use spaCy to enrich the corpus through tokenization, lemmatization, part-of-speech tagging, dependency parsing and chunking, and named entity recognition
Conduct frequency analyses using part-of-speech tags and named entities
Download an enriched dataset for use in future NLP analyses
Abstract
This lesson demonstrates how to use the Python library spaCy for analysis of large collections of texts. This lesson details the process of using spaCy to enrich a corpus via lemmatization, part-of-speech tagging, dependency parsing, and named entity recognition. Readers will learn how the linguistic annotations produced by spaCy can be analyzed to help researchers explore meaningful trends in language patterns across a set of texts.
Hi @charlottejmc - I've sent you a query re Megan S. Kane in that pull request; could you take a look? Once that's resolved, this should be OK to continue drafting :)
Title of the resource
Corpus Analysis with spaCy
Resource type
External Resource
Authors, editors and contributors
Megan S. Kane, Maria Antoniak, William Mattingly, John R. Ladd