The goal of this course is to help humanities researchers apply natural language processing (NLP) tools and methods to TEI-encoded texts, even though such tools are usually not designed to work with XML.
Upon completion of this course, students will be able to:
TEI and the philological tradition of manual annotation. Digital editions vs. corpora. Yet digital editions can benefit from NLP annotation through better search and retrieval, indexing, and pattern recognition.
But how to do it? It is a question of scale. We can't do linguistic annotation manually: it would take forever. But applying NLP tools is not easy, because they are usually not made to work natively with XML.
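To make the problem concrete, here is a minimal sketch of what usually has to happen before an NLP tool can read a TEI text: the markup is stripped and only the plain text is passed on. The sample paragraph is hypothetical, and the sketch uses nothing beyond Python's standard library.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample: a one-sentence TEI paragraph with inline markup.
SAMPLE = (
    f'<p xmlns="{TEI_NS}">I wrote to <persName>Maria</persName> '
    f'from <placeName>Vienna</placeName> in <date when="1850">1850</date>.</p>'
)


def plain_text(xml_string: str) -> str:
    """Return the text content of an XML fragment with all tags removed."""
    root = ET.fromstring(xml_string)
    # itertext() yields every text node in document order;
    # split/join normalises runs of whitespace to single spaces.
    return " ".join("".join(root.itertext()).split())


text = plain_text(SAMPLE)
print(text)
```

The plain text is what a tagger expects, but the existing markup (here the name, place and date elements) is gone. Getting the tool's output back into the TEI without losing that markup is the hard part.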
There is a way forward: TEI is flexible.
Explain the differences, advantages and disadvantages of storing annotations in the text (inline) or separately from it (standoff).
Concrete examples.
A tool which lets you convert TEI datasets from inline to standoff and vice versa.
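The following is not the tool itself, only a minimal sketch of the idea behind such a conversion. It assumes that every token is a `w` element with an `xml:id`, and that the standoff layer is a plain list of records pointing at those identifiers. The record layout and the function names `to_standoff` and `to_inline` are made up for illustration.

```python
import copy
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
W_TAG = f"{{{TEI_NS}}}w"
ET.register_namespace("", TEI_NS)

# Hypothetical inline-annotated sentence.
INLINE = (
    f'<s xmlns="{TEI_NS}">'
    '<w xml:id="w1" lemma="she" pos="PRON">She</w> '
    '<w xml:id="w2" lemma="write" pos="VERB">wrote</w> '
    '<w xml:id="w3" lemma="letter" pos="NOUN">letters</w>'
    '</s>'
)


def to_standoff(inline_root):
    """Split an inline tree into a bare tree plus standoff records."""
    bare = copy.deepcopy(inline_root)
    records = []
    for w in bare.iter(W_TAG):
        records.append({
            "target": "#" + w.get(XML_ID),
            # pop() removes the attribute from the bare copy.
            "lemma": w.attrib.pop("lemma"),
            "pos": w.attrib.pop("pos"),
        })
    return bare, records


def to_inline(bare_root, records):
    """Merge standoff records back into a copy of the bare tree."""
    merged = copy.deepcopy(bare_root)
    by_id = {w.get(XML_ID): w for w in merged.iter(W_TAG)}
    for rec in records:
        w = by_id[rec["target"].lstrip("#")]
        w.set("lemma", rec["lemma"])
        w.set("pos", rec["pos"])
    return merged


root = ET.fromstring(INLINE)
bare, records = to_standoff(root)
restored = to_inline(bare, records)
print(records)
```

Because both directions only rely on the identifiers, the text and its annotations can be edited separately and merged again later, as long as the `xml:id` values stay stable.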
In this section, we'll take you step by step through the process of adding linguistic annotations to TEI-encoded texts.
We provide a sample dataset.
A paragraph or so about the letter(s) we chose.
What is the end goal? What do we want the final TEI to look like? Which elements and attributes are we going to use to annotate lemmas, part-of-speech (POS) tags, named entities, and so on?
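One possible answer, sketched here as an assumption rather than as the encoding this course settles on: each token becomes a TEI `w` element carrying `lemma` and `pos` attributes, and named entities are wrapped in naming elements such as `persName` or `placeName`. The token list, the label mapping and the function name `build_sentence` are hypothetical, and the sketch only handles single-token entities.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)

# Hypothetical tagger output: (form, lemma, POS tag, entity label or None).
TOKENS = [
    ("Maria", "Maria", "PROPN", "PER"),
    ("wrote", "write", "VERB", None),
    ("letters", "letter", "NOUN", None),
]

# Assumed mapping from entity labels to TEI naming elements.
ENTITY_ELEMENTS = {"PER": "persName", "LOC": "placeName", "ORG": "orgName"}


def build_sentence(tokens):
    """Build an <s> element with one <w> per token.

    Simplification: every entity is assumed to span exactly one token,
    and whitespace between tokens is not reconstructed.
    """
    s = ET.Element(f"{{{TEI_NS}}}s")
    for form, lemma, pos, entity in tokens:
        parent = s
        if entity in ENTITY_ELEMENTS:
            # Wrap the token in the matching naming element.
            parent = ET.SubElement(s, f"{{{TEI_NS}}}{ENTITY_ELEMENTS[entity]}")
        w = ET.SubElement(parent, f"{{{TEI_NS}}}w", {"lemma": lemma, "pos": pos})
        w.text = form
    return s


sentence = build_sentence(TOKENS)
print(ET.tostring(sentence, encoding="unicode"))
```

Multi-token names would need the tokens of one entity to be grouped under a single naming element, which this sketch leaves out.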