Closed GusRiva closed 2 years ago
@GusRiva: it looks like some TEI files were not generated during the transform workflows at some point. The problem may already be fixed, though. I suggest triggering the workflow manually for now and seeing whether it happens again in the future.
Now, we test to see if the transformation is effective. I'm also adding tests to check for TEI/schema validity.
I recreated all the TEI files again, so hopefully that will be ok.
However, we have this issue in the tests. It looks like a package needs to be installed. I assume here?
@Jean-Baptiste-Camps
Edit: I installed the package, but now there is another error because of an unrecognised function.
I think we would also need to update the schema to add the values completeWits, sourceText and derivatives to the type attribute for term.
A lot of useful validation errors in the last run! I'll correct the scripts and the metadata files accordingly.
It has passed all the checks!!! :partying_face:
Important things to remember:
Things that would need to be updated in the schema:
I'm modifying the ODD, but a few remarks:
- respStmt is already repeatable (so no need for a modification there):
<respStmt>
<resp>contributed to OpenStemmata by</resp>
<persName ref="http://orcid.org/0000-0003-0385-7037">Jean-Baptiste Camps</persName>
</respStmt>
<respStmt>
<resp>contributed to OpenStemmata by</resp>
<persName ref="">Jane Doe</persName>
</respStmt>
- For now, the order of components in bibl is enforced. Why is that? We could keep it or make it indifferent:
<content>
<sequence preserveOrder="true">
<elementRef key="title" minOccurs="1" maxOccurs="1"/>
<elementRef key="date" minOccurs="1" maxOccurs="1"/>
<elementRef key="pubPlace" minOccurs="1" maxOccurs="1"/>
<elementRef key="series" minOccurs="1" maxOccurs="1"/>
<elementRef key="biblScope" minOccurs="1" maxOccurs="1"/>
<elementRef key="author" minOccurs="1" maxOccurs="1"/>
<elementRef key="biblScope" minOccurs="1" maxOccurs="1"/>
<elementRef key="ptr" minOccurs="1" maxOccurs="1"/>
</sequence>
</content>
- For repository and settlement, how do you propose we extract them from the vague signature field? Ideally, we would like to have a complete msIdentifier, but that would require changing the form.
- Should we, in the schema, define the list and exact types of allowed keywords inside keyword? Or is it too specific?
- respStmt is already repeatable (so no need for a modification there):
<respStmt>
<resp>contributed to OpenStemmata by</resp>
<persName ref="http://orcid.org/0000-0003-0385-7037">Jean-Baptiste Camps</persName>
</respStmt>
<respStmt>
<resp>contributed to OpenStemmata by</resp>
<persName ref="">Jane Doe</persName>
</respStmt>
Oh, ok! I was making only one respStmt with many persons like this:
<respStmt>
<resp>contributed to OpenStemmata by</resp>
<persName ref="http://orcid.org/0000-0003-0385-7037">Jean-Baptiste Camps</persName>
<persName ref="">Jane Doe</persName>
</respStmt>
, but I can also do it with one respStmt per person, which is probably how the TEI intended it.
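A minimal sketch of the "one respStmt per person" approach, assuming the contributors are available as (name, ORCID URL) pairs. The function name `build_resp_stmts` is hypothetical, and the TEI namespace is left out to keep the example short; this is not the actual transformation script.

```python
import xml.etree.ElementTree as ET


def build_resp_stmts(contributors):
    """Return one <respStmt> element per (name, ref) pair.

    Hypothetical helper: the real transformation may organise this differently.
    """
    stmts = []
    for name, ref in contributors:
        stmt = ET.Element("respStmt")
        resp = ET.SubElement(stmt, "resp")
        resp.text = "contributed to OpenStemmata by"
        pers = ET.SubElement(stmt, "persName", {"ref": ref})
        pers.text = name
        stmts.append(stmt)
    return stmts


stmts = build_resp_stmts([
    ("Jean-Baptiste Camps", "http://orcid.org/0000-0003-0385-7037"),
    ("Jane Doe", ""),
])
for stmt in stmts:
    print(ET.tostring(stmt, encoding="unicode"))
```

Each contributor ends up in a separate respStmt with exactly one persName, matching the repeated structure in the quoted example.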
- For now, the order of components in bibl is enforced. Why is that? We could keep it or make it indifferent:
I don't see a reason to keep it in order. And I think biblScope shouldn't be listed twice, but once with maxOccurs="2". Also, author and pubPlace could be repeated, so I would suggest this:
<content>
<sequence preserveOrder="false">
<elementRef key="title" minOccurs="1" maxOccurs="1"/>
<elementRef key="author" minOccurs="1" maxOccurs="unbounded"/>
<elementRef key="date" minOccurs="1" maxOccurs="1"/>
<elementRef key="pubPlace" minOccurs="1" maxOccurs="unbounded"/>
<elementRef key="series" minOccurs="1" maxOccurs="1"/>
<elementRef key="biblScope" minOccurs="1" maxOccurs="2"/>
<elementRef key="ptr" minOccurs="1" maxOccurs="1"/>
</sequence>
</content>
- For repository and settlement, how do you propose we extract them from the vague signature field? Ideally, we would like to have a complete msIdentifier, but that would require changing the form.
The script already does this here 😉 . If we have a nice comma-separated signature it creates settlement, repository, idno; otherwise just an idno. It works very well for something like "Vienna, Österr. Nationalbibl., Cod. 2881". I always try to do my signatures this way. Should we make it a suggestion in the documentation?
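The splitting rule described above can be sketched roughly as follows. This is not the actual script: the function name `witness_identifier` is hypothetical, the TEI namespace is omitted, and "nice comma-separated signature" is assumed to mean exactly three non-empty comma-separated parts.

```python
import xml.etree.ElementTree as ET


def witness_identifier(signature):
    """Build msDesc/msIdentifier from a 'settlement, repository, idno'
    signature, or fall back to a bare <idno> for anything else.

    Assumption: a well-formed signature has exactly three non-empty parts.
    """
    parts = [part.strip() for part in signature.split(",")]
    if len(parts) == 3 and all(parts):
        ms_desc = ET.Element("msDesc")
        ms_id = ET.SubElement(ms_desc, "msIdentifier")
        for tag, text in zip(("settlement", "repository", "idno"), parts):
            ET.SubElement(ms_id, tag).text = text
        return ms_desc
    idno = ET.Element("idno")
    idno.text = signature.strip()
    return idno


full = witness_identifier("Vienna, Österr. Nationalbibl., Cod. 2881")
vague = witness_identifier("MS A")
print(ET.tostring(full, encoding="unicode"))
print(ET.tostring(vague, encoding="unicode"))
```

A signature with extra commas (for example inside the shelfmark) falls back to a plain idno under this assumption, which is one more reason to recommend the three-part convention in the documentation.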
- Should we, in the schema, define the list and exact types of allowed keywords inside keyword ? Or is it too specific ?
The more specific the better, as there is always a possibility that someone might write something wrong in the txt file, but it's not absolutely necessary. If it is not too difficult, maybe we should do it.
Things I need to update/check in the transformation script:
Ok, I made the modifications to the schema !
For witness, the new content model is:
<content>
<sequence preserveOrder="true">
<elementRef key="label" minOccurs="1" maxOccurs="1"/>
<alternate minOccurs="1" maxOccurs="1">
<elementRef key="idno" minOccurs="1" maxOccurs="1"/>
<elementRef key="msDesc" minOccurs="1" maxOccurs="1"/>
</alternate>
<elementRef key="origDate" minOccurs="1" maxOccurs="1"/>
<elementRef key="origPlace" minOccurs="1" maxOccurs="1"/>
<elementRef key="note" minOccurs="1" maxOccurs="1"/>
<elementRef key="ptr" minOccurs="2" maxOccurs="2"/>
</sequence>
</content>
Corresponding to this potential structure, as per the Guidelines:
<witness>
<label></label>
<msDesc>
<msIdentifier>
<settlement></settlement>
<repository></repository>
<idno></idno>
</msIdentifier>
</msDesc>
</witness>
For the keywords, on second thoughts, I think this would require Schematron validation to check the exact order and attributes of term. To keep it simpler, I just modified the controlled values of term/@type:
<attList>
<attDef ident="type" mode="change" usage="req">
<valList type="closed" mode="add">
<valItem ident="workGenre"><desc>genre of the work (e.g., epic poetry, tragedy, etc.)</desc></valItem>
<valItem ident="stemmaType"><desc>type of tree (reconstructed or observed)</desc></valItem>
<valItem ident="drawnStemma"><desc>drawn vs. prose genealogy</desc></valItem>
<valItem ident="contam"><desc>contamination</desc></valItem>
<valItem ident="extraStemmContam"><desc>extra-stemmatic contamination</desc></valItem>
<valItem ident="rootType"><desc>type of root (original, archetype)</desc></valItem>
<valItem ident="completeWits"><desc>completion regarding witnesses</desc></valItem>
<valItem ident="sourceText"><desc>completion regarding source text(s)</desc></valItem>
<valItem ident="derivatives"><desc>completion regarding derivatives</desc></valItem>
</valList>
</attDef>
</attList>
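For a quick check outside the schema, the same closed list can be tested in a few lines. This is only a sketch: the function name `invalid_term_types` and the sample keywords are made up for illustration, and the real validation is done by the schema itself.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# The nine values of term/@type listed in the valList above.
ALLOWED_TERM_TYPES = {
    "workGenre", "stemmaType", "drawnStemma", "contam", "extraStemmContam",
    "rootType", "completeWits", "sourceText", "derivatives",
}


def invalid_term_types(tei_xml):
    """Return the @type values of <term> elements that are not allowed.

    A missing @type is reported as None, since the attribute is required.
    """
    root = ET.fromstring(tei_xml)
    return [
        term.get("type")
        for term in root.iter(TEI_NS + "term")
        if term.get("type") not in ALLOWED_TERM_TYPES
    ]


# Hypothetical sample, not taken from the repository.
sample = """<keywords xmlns="http://www.tei-c.org/ns/1.0">
  <term type="workGenre">epic poetry</term>
  <term type="completeWits">yes</term>
  <term type="genre">typo</term>
</keywords>"""

print(invalid_term_types(sample))  # prints ['genre']
```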
Should we make it a suggestion in the documentation?
Proposed in https://github.com/OpenStemmata/OpenStemmata.github.io/pull/8
Basically, once these changes are approved, I think the only remaining thing to do is modify the transformation, and we will be good to go, and ready to scale up on a clean slate !
There is only one little thing I would like to do before we merge. Aurélien did a pretty cool thing in his submission: he added comments to the DOT file. See #98; I also mentioned this before here.
For example:
other_source -> epitome [style="dashed"]; # "and so it may be wiser to believe that [A] was the main but not the only source of Epit[ome]"
Eustathius_ms [color="grey", label="Eustathius’s MS"]; # Eustathius’s manuscript source for the Epitome
I think those comments specific to one node or connection could be very useful. Right now we have the general notes where we include a lot of possible comments about anything. Then we have the notes for each witness. We don't have a way of saying something about an edge or a hypothetical node outside the general notes.
I would like to embrace this idea of commenting on nodes and edges directly. I would suggest we carry these comments over to the TEI and GraphML as notes that target specific nodes or edges. In GraphML this is easy: I will include a new field "note" for nodes or edges.
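As a rough sketch of how such trailing comments could be picked up from the DOT file: the snippet below assumes the convention used in the two example lines (statement, semicolon, then "#" and the comment on the same line) and simple identifiers without quotes or hyphens. The function name `extract_notes` is hypothetical.

```python
import re

# Statement, then ';', then '#' and the comment text on the same line.
COMMENT_RE = re.compile(r"^(?P<stmt>.*?);\s*#\s*(?P<note>.+)$")
EDGE_RE = re.compile(r"^\s*(?P<src>\w+)\s*->\s*(?P<dst>\w+)")
NODE_RE = re.compile(r"^\s*(?P<node>\w+)")


def extract_notes(dot_text):
    """Return (target, note) pairs; target is a node id or a (src, dst) tuple."""
    notes = []
    for line in dot_text.splitlines():
        match = COMMENT_RE.match(line)
        if not match:
            continue  # no trailing comment on this line
        stmt = match.group("stmt")
        note = match.group("note").strip()
        edge = EDGE_RE.match(stmt)
        if edge:
            notes.append(((edge.group("src"), edge.group("dst")), note))
            continue
        node = NODE_RE.match(stmt)
        if node:
            notes.append((node.group("node"), note))
    return notes


dot_sample = '''A -> other_source;
other_source -> epitome [style="dashed"]; # "and so it may be wiser to believe that [A] was the main but not the only source of Epit[ome]"
Eustathius_ms [color="grey", label="Eustathius’s MS"]; # Eustathius’s manuscript source for the Epitome
'''

for target, note in extract_notes(dot_sample):
    print(target, "=>", note)
```

Edges are keyed by their (source, target) pair here, so the pair still has to be mapped to the edge identifier used in the GraphML or TEI output.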
In the TEI, I suggest we create a noteGrp in the back and add a @target attribute pointing to the node or edge in question. For example, for the comments above:
<body>
...
<node xml:id="n_Eustathius_ms" type="hypothetical">
<label>Eustathius's MS</label>
</node>
...
<arc xml:id="a_1" from="#n_other_source" to="#n_epitome" od:type="contamination" cert="unknown" />
</body>
<back>
<noteGrp>
<note target="#n_Eustathius_ms">Eustathius’s manuscript source for the Epitome</note>
<note target="#a_1">"and so it may be wiser to believe that [A] was the main but not the only source of Epit[ome]"</note>
</noteGrp>
</back>
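A small sketch of how the transformation might build this back matter and catch a @target that points at no existing xml:id. The function names `build_back` and `dangling_targets` are hypothetical, and the TEI namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def build_back(notes):
    """Build <back><noteGrp> with one <note target="#id"> per entry.

    `notes` maps the xml:id of a node or arc to its comment text.
    """
    back = ET.Element("back")
    note_grp = ET.SubElement(back, "noteGrp")
    for xml_id, text in notes.items():
        note = ET.SubElement(note_grp, "note", {"target": "#" + xml_id})
        note.text = text
    return back


def dangling_targets(back, known_ids):
    """Return the @target values that do not match any known xml:id."""
    return [
        note.get("target")
        for note in back.iter("note")
        if note.get("target").lstrip("#") not in known_ids
    ]


back = build_back({
    "n_Eustathius_ms": "Eustathius’s manuscript source for the Epitome",
    "a_1": "main but not the only source",
})
print(ET.tostring(back, encoding="unicode"))
print(dangling_targets(back, {"n_Eustathius_ms", "a_1"}))  # prints []
```

The second helper is there because a target such as "#a1" would silently point nowhere if the arc is actually identified as "a_1".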
@Jean-Baptiste-Camps What do you think? If you add the noteGrp to the schema I can take care of updating the transformation. I should be able to do this on Monday.
(Btw I just realised you also included these kinds of comments; for example in Bédier's Tristan)
It's done! I haven't modified the noteGrp content model, but limited back to:
<elementSpec ident="back" mode="change">
<content>
<elementRef key="noteGrp" minOccurs="0" maxOccurs="unbounded"/>
</content>
</elementSpec>
100th action on our repo, and a big one indeed ! Corrected minor namespace issue on schema (imports), let's see what validation yields.
I was wondering what that error was. I made many minor improvements to the transformation and recreated the TEI files. When we get the tests to pass, we should be ready to merge into main!
The tests are now validating against the schema, right? That is great! Could we also get the file name in the error message, so that failing files are easier to find?
Done.
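For reference, the idea of prefixing each error with the file name can be sketched like this. It is not the repository's test code: it uses a plain well-formedness check from the standard library as a stand-in for the schema validation, and the names `check_well_formed` and `collect_errors` are hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def check_well_formed(path):
    """Stand-in check: return a list of error messages for one file."""
    try:
        ET.parse(str(path))
    except ET.ParseError as err:
        return [str(err)]
    return []


def collect_errors(paths, check=check_well_formed):
    """Run `check` on every file and prefix each message with the file name."""
    return [
        f"{Path(path).name}: {message}"
        for path in paths
        for message in check(path)
    ]


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.tei.xml"
    bad = Path(tmp) / "bad.tei.xml"
    good.write_text("<TEI><text/></TEI>", encoding="utf-8")
    bad.write_text("<TEI><text></TEI>", encoding="utf-8")  # unclosed <text>
    errors = collect_errors([good, bad])

for error in errors:
    print(error)
```

Any other checker that returns a list of messages per file, such as a schema validator, could be passed as `check` without changing the reporting.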
Ok, it's finally done! It was a good opportunity to go back to some of the first contributions, which still had not been completely updated to the new data model and to fix some bugs in the scripts. @Jean-Baptiste-Camps feel free to merge into main and we can start working on new stemmata with a brand new working pipeline.
I just realised we might get a conflict with the file changed in #101. In that case, keep the changes from main.
Great ! Hurrah us !
Added drawnStemmata property to rng and transformation to TEI. Added optional image for transformation to TEI.