Closed Conal-Tuohy closed 5 months ago
Good catch. In fact, the prose in the rest of this section (up to “Contextual Information”) needs work, too.
One could mention in this section the reverse approach that several corpora have used, whereby each corpus document includes the header from every level (main corpus, subcorpus, maybe even sub-subcorpus) and so becomes a well-described free-standing object. An example can be seen at http://nlp.ipipan.waw.pl/TEI4NKJP/example_all_levels_1M/text.xml
<teiCorpus xmlns:xi="http://www.w3.org/2001/XInclude" xmlns="http://www.tei-c.org/ns/1.0">
  <xi:include href="NKJP_1M_header.xml"/>
  <TEI>
    <xi:include href="header.xml"/>
    <text xml:id="txt_text" xml:lang="pl">
      <body xml:id="txt_body">
        <!-- document text elided -->
      </body>
    </text>
  </TEI>
</teiCorpus>
This way, there is no fear that a tool that attempts to read the root corpus document (with XInclusions) will choke on gigabytes of text pulled in for the individual subcorpora and documents.
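To make the point concrete, here is a rough sketch of how a tool might resolve the XInclusions of a single such document, using Python's standard xml.etree.ElementInclude. The two header files are hypothetical in-memory stand-ins (empty teiHeader elements with made-up xml:id values), not the real NKJP headers, and the loader is only an assumption about how one might fetch them.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical stand-ins for the header files named in the xi:include hrefs.
FILES = {
    "NKJP_1M_header.xml": f'<teiHeader xmlns="{TEI_NS}" xml:id="corpus_header"/>',
    "header.xml": f'<teiHeader xmlns="{TEI_NS}" xml:id="doc_header"/>',
}

# One free-standing corpus document, modelled on the excerpt above.
DOCUMENT = f"""<teiCorpus xmlns:xi="http://www.w3.org/2001/XInclude" xmlns="{TEI_NS}">
  <xi:include href="NKJP_1M_header.xml"/>
  <TEI>
    <xi:include href="header.xml"/>
    <text xml:id="txt_text" xml:lang="pl"><body xml:id="txt_body"/></text>
  </TEI>
</teiCorpus>"""


def load(href, parse, encoding=None):
    """Loader callback: return the parsed element for an included file."""
    if parse != "xml":
        raise ValueError(f"only XML inclusion is handled in this sketch: {href}")
    return ET.fromstring(FILES[href])


def resolve(xml_text):
    """Parse one document and replace its xi:include elements in place."""
    root = ET.fromstring(xml_text)
    ElementInclude.include(root, loader=load)
    return root


root = resolve(DOCUMENT)
# Collect the xml:id of every header now present, in document order.
headers = [h.get(XML_ID) for h in root.iter(f"{{{TEI_NS}}}teiHeader")]
print(headers)
```

Only the two small headers are pulled in here; the text of the other corpus documents is never touched, which is the advantage described above.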
@Conal-Tuohy — Created PR, but do not seem to be able to add you as reviewer. Would you mind taking a look?
It looks good to me, @sydb !
The text of the chapter appears to assume that the teiCorpus element cannot contain teiCorpus elements, e.g. https://github.com/TEIC/TEI/blob/dev/P5/Source/Guidelines/en/CC-LanguageCorpora.xml#L178C1-L189C63