dracor-org / dracor-schema

ODD and schemas for dracor.org files
https://dracor.org/doc/odd

schema for corpus.xml #44

Open ingoboerner opened 1 year ago

ingoboerner commented 1 year ago

The file corpus.xml seems to be required in each (all?) GitHub repository. It is undocumented and does not validate against tei-all. We should look into that.

Severity: error
Description: element "fileDesc" incomplete; missing required element "sourceDesc"

Severity: error
Description: element "teiCorpus" incomplete; expected element "TEI", "facsimile", "fsdDecl", "sourceDoc", "standOff", "teiCorpus" or "text"

We don't need the XInclude namespace declared on the root element.
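
The two validator errors above can be reproduced without a full RELAX NG run. The following is a minimal sketch, not a replacement for validating against tei-all: it only checks the two conditions quoted in the messages. The function name `check_corpus` and the sample document are hypothetical and written for illustration; the sample is not the actual content of any DraCor corpus.xml.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Element names copied from the validator message quoted above.
EXPECTED_AFTER_HEADER = {
    f"{{{TEI_NS}}}{name}"
    for name in ("TEI", "facsimile", "fsdDecl", "sourceDoc",
                 "standOff", "teiCorpus", "text")
}


def check_corpus(xml_text: str) -> list[str]:
    """Return the structural problems found in a teiCorpus document."""
    root = ET.fromstring(xml_text)
    problems = []

    # Problem 1: fileDesc must contain a sourceDesc.
    file_desc = root.find("tei:teiHeader/tei:fileDesc", NS)
    if file_desc is None or file_desc.find("tei:sourceDesc", NS) is None:
        problems.append("fileDesc is missing sourceDesc")

    # Problem 2: teiCorpus needs at least one of the expected children
    # in addition to its teiHeader.
    if not any(child.tag in EXPECTED_AFTER_HEADER for child in root):
        problems.append("teiCorpus has no content after teiHeader")

    return problems


# Hypothetical header-only corpus file, for illustration only.
SAMPLE = """<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Example Corpus</title></titleStmt>
      <publicationStmt><p>unpublished</p></publicationStmt>
    </fileDesc>
  </teiHeader>
</teiCorpus>"""

for problem in check_corpus(SAMPLE):
    print(problem)
```

A check like this could run in CI as a quick smoke test, but a real schema validation would still be needed to catch everything else.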

ingoboerner commented 1 month ago

See related issue #68 (easy to fix) BUT:

We would have to rework the corpus.xml:

Currently, it includes only a <teiHeader>, but to be valid it MUST also include one of the elements <TEI>, <standOff>, <teiCorpus> or <text>. I could adapt the content model in our schema, but then corpus.xml would validate only against the modified schema and still not against tei-all.

One option would be to include references to the individual TEI files of the plays. We have XInclude for that (which I don't really like because it causes problems with eXist, or at least used to).
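
To make the XInclude option concrete, here is a rough sketch of how references to the play files could be appended to an existing header-only corpus.xml. The function name `add_includes` and the file paths are hypothetical. Note the assumption that XInclude is resolved before validation: an unresolved <xi:include> element is not itself one of the children tei-all expects.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XI_NS = "http://www.w3.org/2001/XInclude"

# Serialize TEI as the default namespace and XInclude with the xi: prefix.
ET.register_namespace("", TEI_NS)
ET.register_namespace("xi", XI_NS)


def add_includes(corpus_xml: str, hrefs: list[str]) -> str:
    """Append one xi:include per play file to the teiCorpus root."""
    root = ET.fromstring(corpus_xml)
    for href in hrefs:
        ET.SubElement(root, f"{{{XI_NS}}}include", {"href": href})
    return ET.tostring(root, encoding="unicode")


# Hypothetical header-only corpus and play paths, for illustration only.
HEADER_ONLY = (
    '<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">'
    "<teiHeader/>"
    "</teiCorpus>"
)
result = add_includes(HEADER_ONLY, ["tei/play-a.xml", "tei/play-b.xml"])
print(result)
```

Whether this is preferable to listing the plays in some other way depends on how well the target XML database handles XInclude, which is the concern raised above.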

The second major problem: the <teiHeader> in the <teiCorpus> needs to include a <sourceDesc>. We could list the sources of the individual corpus here, e.g. TextGrid, ... I looked into this for the CLSINFRA deliverable D7.3; we could aggregate the sources from the individual TEI files. See https://versioning-living-corpora.clsinfra.io/3-2_gerdracor_corpus_archeology.html
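
Aggregating the sources from the individual TEI files could look roughly like the sketch below. It assumes, as an illustration only, that each play records its source as a <bibl> with a <name> child inside <sourceDesc>; the actual encoding in a given corpus may differ, and the function names and sample documents are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def source_names(tei_xml: str) -> list[str]:
    """Collect source names from sourceDesc/bibl/name (assumed encoding)."""
    root = ET.fromstring(tei_xml)
    names = []
    for bibl in root.findall(".//tei:sourceDesc/tei:bibl", NS):
        name = bibl.find("tei:name", NS)
        if name is not None and name.text:
            names.append(name.text.strip())
    return names


def aggregate_sources(tei_docs: list[str]) -> Counter:
    """Count in how many plays each source name occurs."""
    counts = Counter()
    for doc in tei_docs:
        # A set ensures each source is counted once per play.
        counts.update(set(source_names(doc)))
    return counts


def make_play(source: str) -> str:
    """Build a tiny hypothetical TEI file naming a single source."""
    return (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc>'
        f"<sourceDesc><bibl><name>{source}</name></bibl></sourceDesc>"
        "</fileDesc></teiHeader></TEI>"
    )


plays = [
    make_play("TextGrid Repository"),
    make_play("TextGrid Repository"),
    make_play("Another Source"),
]
totals = aggregate_sources(plays)
for name, count in totals.most_common():
    print(f"{name}: {count}")
```

The resulting names could then be written into the <sourceDesc> of the corpus-level header, one entry per distinct source.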

The other things, e.g. additional idno type values, I can fix (allow) at the schema level.