Open marcolarosa opened 2 years ago
I have a draft XSLT from a while back for stitching the surface files together. It's not complete, but I think it needs only a little extra debugging work to do that part.
Also the TEI file has a metadata header (a teiHeader element) which should be populated with inputs from other sources: the original source file (before it was split into surfaces), the describo database, the RO-Crate file, or wherever. The teiHeader is able to store a tonne of different metadata, but its absolute minimum requirements are:
- fileDesc (description of the TEI file) which contains at least:
  - titleStmt (title statement) which contains at least:
    - title for the file
  - publicationStmt (publication statement) which describes the publisher of the file (could be as simple as 'Nyingarn')
  - sourceDesc (a description / citation of the original source document which the TEI is a transcription of)

Any metadata we have which corresponds to those elements should get inserted into the teiHeader.
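As a rough illustration of the minimum header described above, here is a small Python sketch (standard library only) that builds a teiHeader with just fileDesc, titleStmt/title, publicationStmt and sourceDesc. The function name and the sample values are hypothetical, and the choice of a publisher element inside publicationStmt and a bibl element inside sourceDesc is an assumption; the actual stylesheets may populate these elements differently.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace


def minimal_tei_header(title, publisher, source_citation):
    """Build a teiHeader holding only the three required parts of fileDesc."""
    def tei(tag):
        return f"{{{TEI_NS}}}{tag}"

    header = ET.Element(tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))

    # titleStmt needs at least a title for the file.
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title

    # publicationStmt names the publisher of the TEI file (assumed: <publisher>).
    publication_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(publication_stmt, tei("publisher")).text = publisher

    # sourceDesc cites the original document being transcribed (assumed: <bibl>).
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("bibl")).text = source_citation

    return header


# Hypothetical values, for illustration only.
header = minimal_tei_header(
    "Example notebook", "Nyingarn", "Hypothetical citation of the source manuscript"
)
print(ET.tostring(header, encoding="unicode"))
```

Whatever metadata is available from the source file, describo or the RO-Crate would be passed in place of the sample strings.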
There are two stylesheets which reconstitute a full TEI file from the <surface> elements contained in the "stub" XML files, and a <teiHeader> header element copied from the originally-ingested file (if the original file was TEI) or from metadata encoded in the digivol CSV (if the uploaded file was a digivol CSV).
@Conal-Tuohy Can we make this one file?
Minimum viable metadata list as per comment 5/7.
@Conal-Tuohy This is actually in reference (as you thought) to being able to download a valid TEI document for a whole item. That is, containing all of the page surfaces and TEI header.
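To make the goal concrete, the following is a minimal Python sketch of the reassembly idea discussed in this thread: one teiHeader plus the <surface> elements from each stub file are combined into a single TEI element. It is not the XSLT used in the project. The function name `reconstitute` is hypothetical, and placing the surfaces inside a sourceDoc element is an assumption (a facsimile element would be the other obvious candidate).

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace


def tei(tag):
    """Return the namespace-qualified name of a TEI element."""
    return f"{{{TEI_NS}}}{tag}"


def reconstitute(header_xml, stub_xmls):
    """Combine one teiHeader and the surfaces of each stub into a TEI element."""
    root = ET.Element(tei("TEI"))
    root.append(ET.fromstring(header_xml))
    # Assumed container for the page surfaces.
    source_doc = ET.SubElement(root, tei("sourceDoc"))
    for stub_xml in stub_xmls:
        stub = ET.fromstring(stub_xml)
        if stub.tag == tei("surface"):
            # The stub is a bare <surface>.
            surfaces = [stub]
        else:
            # The stub wraps its surfaces; assumes surfaces are not nested.
            surfaces = stub.findall(f".//{tei('surface')}")
        source_doc.extend(surfaces)
    return root
```

Because the document is built in memory from the stubs in page order, a download endpoint could serialise the result with `ET.tostring(root, encoding="unicode")` on each request rather than storing the full file.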
<milestone unit="text"/> into a <group> of <text> elements
So this is working, but I still have a bug which affects the reconstitution only of documents with a complex structure. These are documents which had a hierarchy which cross-cut their pagination (i.e. containing logical structures which did not fall entirely within a single page). The ingestion stylesheet splits those sections at the page boundaries, and the bug in the re-assembly stylesheet means that those sections are not rejoined. I'm going to issue a PR anyway, since this is only a limitation rather than a blocker, and work on debugging it while @marcolarosa works on integrating the reassembly with the user's workflow.
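For readers unfamiliar with the rejoining problem, here is a hedged Python sketch of one way split sections could be merged again. The thread does not say how the ingestion stylesheet marks the pieces of a split section, so this example assumes they carry the TEI part attribute with the values I (initial), M (medial) and F (final). The function name `rejoin_fragments` is hypothetical, and the sketch only carries over child elements, not loose text sitting directly inside a fragment.

```python
import copy
import xml.etree.ElementTree as ET


def rejoin_fragments(elements):
    """Merge runs of elements marked part="I", "M", "F" into single elements."""
    joined = []
    open_element = None  # the merged element currently being extended, if any
    for element in elements:
        part = element.get("part")
        if part == "I":
            # Start a new merged element; drop the fragment marker.
            open_element = copy.deepcopy(element)
            open_element.attrib.pop("part", None)
            joined.append(open_element)
        elif part in ("M", "F") and open_element is not None:
            # Move copies of the fragment's children into the open element.
            open_element.extend([copy.deepcopy(child) for child in element])
            if part == "F":
                open_element = None
        else:
            # Unmarked (or orphaned) elements pass through unchanged.
            joined.append(element)
    return joined


# Three fragments of one section spread over three pages, then a whole section.
pages = [
    ET.fromstring('<div part="I" type="letter"><p>a</p></div>'),
    ET.fromstring('<div part="M"><p>b</p></div>'),
    ET.fromstring('<div part="F"><p>c</p></div>'),
    ET.fromstring('<div><p>d</p></div>'),
]
for section in rejoin_fragments(pages):
    print(ET.tostring(section, encoding="unicode"))
```

The same bookkeeping (remember the open section, append until the final fragment arrives) is what a grouping step in the re-assembly stylesheet would need to do, whatever marker is actually used.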
Downloading the stub files should return a valid TEI document that is constructed on the fly.