Note: This corpus is still in beta status.
The English Drama Corpus (EngDraCor) provides TEI documents generated from a selection of dramatic works in the EarlyPrint.org collection. This selection is maintained in the engdracor-sources repository.
The following modifications to the original documents have been made:
The markup for individual words (<tei:w>) and punctuation (<tei:pc>) has been removed.
The tally of dramatic works in the EarlyPrint corpus, as provided by its editors, amounted to 853 texts.
In this initial phase, we set aside the 363 texts that lack speaker identification with who attributes in the original markup.
From the remaining texts, we proceed to filter out 73 items which:
The remaining 433 plays constitute the first version of EngDraCor.
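The first filtering step described above (texts without who attributes) can be approximated from the command line. The following is only a rough sketch and not part of the EngDraCor tooling: the function name list_without_who is hypothetical, and it assumes that speeches are encoded as unprefixed <sp> elements with the who attribute on the same line as the opening tag. A line-based regular expression is no substitute for a real XML query, so treat the result as an estimate.

```shell
# Hypothetical helper (an assumption, not part of the repo): print every XML
# file in the given directory that contains no <sp> element with a who attribute.
list_without_who() {
  dir="$1"
  for f in "$dir"/*.xml; do
    # Skip the unexpanded glob pattern when the directory holds no XML files.
    [ -e "$f" ] || continue
    # grep -q succeeds as soon as one <sp ... who=...> start tag is found.
    # Limitation: a who attribute on a different line than "<sp" is missed.
    if ! grep -Eq '<sp[[:space:]][^>]*who=' "$f"; then
      printf '%s\n' "$f"
    fi
  done
}
```

For example, list_without_who ../engdracor-sources/xml | wc -l would count the candidate files, assuming the sources are checked out next to this repository.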
The XSLT workflow depends on the following tools
To update the entire corpus from the sources, run the ep2dracor script like this (assuming you have cloned the engdracor-sources repo to the same parent directory as engdracor):
./ep2dracor ../engdracor-sources/xml/*.xml
You can also update individual files, for instance:
./ep2dracor ../engdracor-sources/xml/A17872.xml
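If you regularly update a handful of plays by their EarlyPrint ID, a small wrapper can save some typing. This is a minimal sketch under assumptions: the function update_plays and the variables SOURCES_DIR and EP2DRACOR are hypothetical names introduced here, and the defaults simply mirror the paths used in the commands above.

```shell
# Hypothetical wrapper (an assumption, not part of the repo): run the
# conversion script for each EarlyPrint ID passed as an argument.
update_plays() {
  for id in "$@"; do
    # SOURCES_DIR defaults to the sibling checkout used in the examples above.
    src="${SOURCES_DIR:-../engdracor-sources/xml}/$id.xml"
    if [ -f "$src" ]; then
      # EP2DRACOR can be overridden, e.g. with "echo" for a dry run.
      "${EP2DRACOR:-./ep2dracor}" "$src"
    else
      echo "skipping $id: $src not found" >&2
    fi
  done
}
```

Calling update_plays A17872 is then equivalent to the single-file command above, while EP2DRACOR=echo update_plays A17872 only prints the path that would be processed.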
To add or remove plays, first update the engdracor-sources repo, see
https://github.com/dracor-org/engdracor-sources#how-to-add-or-remove-plays
and then regenerate the corpus:
./ep2dracor ../engdracor-sources/xml/*.xml
For scripting or reporting purposes you may want to obtain a simple list of plays included in the corpus. There is a stylesheet to generate such lists from the index.xml file.
# convert index.xml to CSV
saxon -s:index.xml -xsl:list.xsl
# list all DraCor IDs
saxon -s:index.xml -xsl:list.xsl type=id
# list all DraCor slugs
saxon -s:index.xml -xsl:list.xsl type=slug
# list all original EarlyPrint IDs
saxon -s:index.xml -xsl:list.xsl type=sourceid
# list only "vanilla selection"
saxon -s:index.xml -xsl:list.xsl type=slug vanilla=yes
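For reporting, the generated lists can be piped into ordinary shell tools. The sketch below assumes that the type=id, type=slug and type=sourceid variants print one identifier per line, which is not confirmed here; the helper name summarize_ids is hypothetical. It counts the identifiers and flags any that occur more than once.

```shell
# Hypothetical helper (an assumption, not part of the repo): read one
# identifier per line from stdin, print the total and any duplicates.
summarize_ids() {
  # Sorting puts equal identifiers next to each other so awk can compare
  # each line with its predecessor; blank lines are ignored via NF.
  sort | awk '
    NF { n++; if ($0 == prev) dup[$0] = 1; prev = $0 }
    END {
      print "total: " (n + 0)
      for (d in dup) print "duplicate: " d
    }'
}
```

A possible invocation would be saxon -s:index.xml -xsl:list.xsl type=slug | summarize_ids, which should report no duplicate lines for a consistent index.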
The EngDraCor TEI files are licensed under the Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).