OpenStemmata / database

An open database of stemmata
Creative Commons Attribution Share Alike 4.0 International

drawnStemmata and optional image #100

Closed: GusRiva closed this 2 years ago

GusRiva commented 2 years ago

Added the drawnStemmata property to the RNG schema and to the transformation to TEI. Added an optional image to the transformation to TEI.

Jean-Baptiste-Camps commented 2 years ago

@GusRiva: it looks like some TEI files were not generated by the transform workflows at some point. This may be a problem that has already been fixed, though. I suggest triggering the workflow manually for now and seeing whether it happens again in the future.

Now the tests check whether the transformation is effective. I'm also adding tests to check TEI/schema validity.

GusRiva commented 2 years ago

I recreated all the TEI files again, so hopefully that will be ok.

However, there is an issue in the tests: it looks like a package needs to be installed. I assume it should go here?

@Jean-Baptiste-Camps

Edit: I installed the package, but now there is another error because of an unrecognised function.

GusRiva commented 2 years ago

I think we would also need to update the schema to add the values completeWits, sourceText and derivatives to the type attribute of term.

GusRiva commented 2 years ago

A lot of useful validation errors in the last run! I'll correct the scripts and the metadata files accordingly.

GusRiva commented 2 years ago

It has passed all the checks!!! :partying_face:

Important things to remember:

GusRiva commented 2 years ago

Things that would need to be updated in the schema:

Jean-Baptiste-Camps commented 2 years ago

I'm modifying the ODD, but a few remarks:

  1. respStmt is already repeatable (so there is no need to modify that):
<respStmt>
  <resp>contributed to OpenStemmata by</resp>
  <persName ref="http://orcid.org/0000-0003-0385-7037">Jean-Baptiste Camps</persName>
</respStmt>
<respStmt>
  <resp>contributed to OpenStemmata by</resp>
  <persName ref="">Jane Doe</persName>
</respStmt>
  2. For now, the order of components in bibl is enforced. Why is that? We could keep it or make the order free:
<content>
  <sequence preserveOrder="true">
    <elementRef key="title" minOccurs="1" maxOccurs="1"/>
    <elementRef key="date" minOccurs="1" maxOccurs="1"/>
    <elementRef key="pubPlace" minOccurs="1" maxOccurs="1"/>
    <elementRef key="series" minOccurs="1" maxOccurs="1"/>
    <elementRef key="biblScope" minOccurs="1" maxOccurs="1"/>
    <elementRef key="author" minOccurs="1" maxOccurs="1"/>
    <elementRef key="biblScope" minOccurs="1" maxOccurs="1"/>
    <elementRef key="ptr" minOccurs="1" maxOccurs="1"/>
  </sequence>
</content>
  3. For repository and settlement, how do you propose we extract them from the vague signature field? Ideally, we would like to have a complete msIdentifier, but that would require changing the form.
Jean-Baptiste-Camps commented 2 years ago
  4. Should we, in the schema, define the list and exact types of keywords allowed inside keywords? Or is that too specific?
GusRiva commented 2 years ago
  1. respStmt is already repeatable (so there is no need to modify that):
<respStmt>
  <resp>contributed to OpenStemmata by</resp>
  <persName ref="http://orcid.org/0000-0003-0385-7037">Jean-Baptiste Camps</persName>
</respStmt>
<respStmt>
  <resp>contributed to OpenStemmata by</resp>
  <persName ref="">Jane Doe</persName>
</respStmt>

Oh, ok! I was making only one respStmt with many persons like this:

<respStmt>
  <resp>contributed to OpenStemmata by</resp>
  <persName ref="http://orcid.org/0000-0003-0385-7037">Jean-Baptiste Camps</persName>
  <persName ref="">Jane Doe</persName>
</respStmt>

But I can also do it with one respStmt per person, which is probably how the TEI intended it.

  2. For now, the order of components in bibl is enforced. Why is that? We could keep it or make the order free:

I don't see a reason to enforce the order. I also think biblScope shouldn't be declared twice; instead it should have maxOccurs="2". Also, author and pubPlace could be repeated, so I would suggest this:

<content>
  <sequence preserveOrder="false">
    <elementRef key="title" minOccurs="1" maxOccurs="1"/>
    <elementRef key="author" minOccurs="1" maxOccurs="unbounded"/>
    <elementRef key="date" minOccurs="1" maxOccurs="1"/>
    <elementRef key="pubPlace" minOccurs="1" maxOccurs="unbounded"/>
    <elementRef key="series" minOccurs="1" maxOccurs="1"/>
    <elementRef key="biblScope" minOccurs="1" maxOccurs="2"/>
    <elementRef key="ptr" minOccurs="1" maxOccurs="1"/>
  </sequence>
</content>
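Under this proposal, a bibl like the following sketch should validate even though author comes before title and two authors are listed. The content is placeholder text, and the @unit values on biblScope are illustrative assumptions, since the model above does not constrain them:

```xml
<bibl>
  <!-- order no longer enforced: author may precede title -->
  <author>First Author</author>
  <author>Second Author</author>
  <title>Edition title</title>
  <pubPlace>Place of publication</pubPlace>
  <date>Year</date>
  <series>Series title</series>
  <!-- at most two biblScope; the unit values are illustrative -->
  <biblScope unit="volume">Volume</biblScope>
  <biblScope unit="page">Pages</biblScope>
  <!-- hypothetical link -->
  <ptr target="https://example.org/edition"/>
</bibl>
```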
  3. For repository and settlement, how do you propose we extract them from the vague signature field? Ideally, we would like to have a complete msIdentifier, but that would require changing the form.

The script already does this here 😉. If we have a nicely comma-separated signature, it creates settlement, repository and idno; otherwise, it creates just an idno. It works very well for something like "Vienna, Österr. Nationalbibl., Cod. 2881". I always try to write my signatures this way. Should we make it a suggestion in the documentation?
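For instance, splitting "Vienna, Österr. Nationalbibl., Cod. 2881" on its commas as described above would give roughly this msIdentifier. It is a sketch of the expected shape, not output copied from the script:

```xml
<msIdentifier>
  <!-- first comma-separated part -->
  <settlement>Vienna</settlement>
  <!-- second part -->
  <repository>Österr. Nationalbibl.</repository>
  <!-- remainder: the shelfmark proper -->
  <idno>Cod. 2881</idno>
</msIdentifier>
```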

  4. Should we, in the schema, define the list and exact types of keywords allowed inside keywords? Or is that too specific?

The more specific the better, as there is always a possibility that someone might write something wrong in the txt file, but it's not absolutely necessary. If it is not too difficult, maybe we should do it.

GusRiva commented 2 years ago

Things I need to update/check in the transformation script:

Jean-Baptiste-Camps commented 2 years ago

OK, I made the modifications to the schema!

For witness, the new content model is:

<content> 
  <sequence preserveOrder="true"> 
    <elementRef key="label" minOccurs="1" maxOccurs="1"/> 
    <alternate minOccurs="1" maxOccurs="1">
      <elementRef key="idno" minOccurs="1" maxOccurs="1"/>
      <elementRef key="msDesc" minOccurs="1" maxOccurs="1"/>
    </alternate> 
    <elementRef key="origDate" minOccurs="1" maxOccurs="1"/> 
    <elementRef key="origPlace" minOccurs="1" maxOccurs="1"/> 
    <elementRef key="note" minOccurs="1" maxOccurs="1"/>
    <elementRef key="ptr" minOccurs="2" maxOccurs="2"/>
  </sequence>
</content>

This corresponds to the following possible structure, as per the Guidelines:

<witness>
  <label></label>
  <msDesc>
    <msIdentifier>
      <settlement></settlement>
      <repository></repository>
      <idno></idno>
    </msIdentifier>
  </msDesc>
</witness>
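The alternate also admits the other branch, a bare idno in place of msDesc, which fits the case where a signature cannot be split into settlement and repository. A sketch with placeholder content; every element appears because the model above sets minOccurs="1" on each (and requires exactly two ptr), and the ptr targets are hypothetical:

```xml
<witness>
  <label>A</label>
  <!-- idno instead of msDesc, e.g. for an unsplittable signature -->
  <idno>Signature as given</idno>
  <origDate>Date</origDate>
  <origPlace>Place</origPlace>
  <note>Note on the witness</note>
  <!-- exactly two ptr elements; targets are placeholders -->
  <ptr target="https://example.org/first-link"/>
  <ptr target="https://example.org/second-link"/>
</witness>
```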

For the keywords, on second thought, I think this would require Schematron validation to check the exact order and attributes of term. To keep things simpler, I just modified the controlled values of term/@type:

<attList>
  <attDef ident="type" mode="change" usage="req">
    <valList type="closed" mode="add">
      <valItem ident="workGenre"><desc>genre of the work (e.g., epic poetry, tragedy, etc.)</desc></valItem>
      <valItem ident="stemmaType"><desc>type of tree (reconstructed or observed)</desc></valItem>
      <valItem ident="drawnStemma"><desc>drawn vs. prose genealogy</desc></valItem>
      <valItem ident="contam"><desc>contamination</desc></valItem>
      <valItem ident="extraStemmContam"><desc>extra-stemmatic contamination</desc></valItem>
      <valItem ident="rootType"><desc>type of root (original, archetype)</desc></valItem>
      <valItem ident="completeWits"><desc>completion regarding witnesses</desc></valItem>
      <valItem ident="sourceText"><desc>completion regarding source text(s)</desc></valItem>
      <valItem ident="derivatives"><desc>completion regarding derivatives</desc></valItem>
    </valList>
  </attDef>
</attList>
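Note that this closed list constrains only the value of @type, not the text of each term. A sketch of keywords that should validate, assuming the terms sit in a TEI keywords element; the text values are taken from the descriptions above purely for illustration:

```xml
<keywords>
  <!-- @type must be one of the valItem idents above -->
  <term type="workGenre">epic poetry</term>
  <term type="stemmaType">reconstructed</term>
  <term type="rootType">archetype</term>
</keywords>
```

A term with a type outside the list (say type="genre") would now be rejected, while a misspelled value inside a term would still pass; checking that, or the order of terms, is what would need Schematron.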
Jean-Baptiste-Camps commented 2 years ago

Should we make it a suggestion in the documentation?

Proposed in https://github.com/OpenStemmata/OpenStemmata.github.io/pull/8

Jean-Baptiste-Camps commented 2 years ago

Basically, once these changes are approved, I think the only remaining thing to do is to modify the transformation; then we will be good to go and ready to scale up on a clean slate!

GusRiva commented 2 years ago

There is only one little thing I would like to do before we merge. Aurélien did a pretty cool thing in his submission: he added comments to the DOT file. See #98; I also mentioned this before here.

For example:

other_source -> epitome [style="dashed"]; # "and so it may be wiser to believe that [A] was the main but not the only source of Epit[ome]"

Eustathius_ms [color="grey", label="Eustathius’s MS"]; # Eustathius’s manuscript source for the Epitome

I think those comments specific to one node or connection could be very useful. Right now we have the general notes where we include a lot of possible comments about anything. Then we have the notes for each witness. We don't have a way of saying something about an edge or a hypothetical node outside the general notes.

I would like to embrace this idea of commenting directly on nodes and edges. I suggest we carry these comments over into the TEI and GraphML as notes that target specific nodes or edges. In the GraphML it is easy: I will include a new "note" field for nodes and edges.
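In GraphML, such a field is normally declared once with a key element and then filled per node or edge with data. A minimal sketch using the two DOT comments above; the key id "note", the graph id and the node ids are hypothetical, not taken from the transformation script:

```xml
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <!-- hypothetical "note" field, usable on nodes and edges alike -->
  <key id="note" for="all" attr.name="note" attr.type="string"/>
  <graph id="stemma" edgedefault="directed">
    <node id="n_Eustathius_ms">
      <data key="note">Eustathius’s manuscript source for the Epitome</data>
    </node>
    <node id="n_other_source"/>
    <node id="n_epitome"/>
    <edge source="n_other_source" target="n_epitome">
      <data key="note">"and so it may be wiser to believe that [A] was the main but not the only source of Epit[ome]"</data>
    </edge>
  </graph>
</graphml>
```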

In the TEI, I suggest we create a noteGrp in the back and give each note a @target attribute pointing to the node or edge in question. For example, for the comments above:

<body>
  ...
  <node xml:id="n_Eustathius_ms" type="hypothetical">
    <label>Eustathius's MS</label>
  </node>
  ...
  <arc xml:id="a_1" from="#n_other_source" to="#n_epitome" od:type="contamination" cert="unknown"/>
</body>
<back>
  <noteGrp>
    <note target="#n_Eustathius_ms">Eustathius’s manuscript source for the Epitome</note>
    <note target="#a_1">"and so it may be wiser to believe that [A] was the main but not the only source of Epit[ome]"</note>
  </noteGrp>
</back>

@Jean-Baptiste-Camps What do you think? If you add the noteGrp to the schema, I can take care of updating the transformation. I should be able to do this on Monday.

GusRiva commented 2 years ago

(Btw, I just realised you also included these kinds of comments, for example in Bédier's Tristan.)

Jean-Baptiste-Camps commented 2 years ago

It's done! I haven't modified the noteGrp content model, but I have restricted back to:

<elementSpec ident="back" mode="change">
  <content>
    <elementRef key="noteGrp" minOccurs="0" maxOccurs="unbounded"/>
  </content>
</elementSpec>
Jean-Baptiste-Camps commented 2 years ago

100th action on our repo, and a big one indeed! I corrected a minor namespace issue in the schema (imports); let's see what validation yields.

GusRiva commented 2 years ago

100th action on our repo, and a big one indeed! I corrected a minor namespace issue in the schema (imports); let's see what validation yields.

I was wondering what that error was. I made many minor improvements in the transformation and recreated the TEI files. When we get the tests to pass, we should be ready to merge into main!

GusRiva commented 2 years ago

The tests are now validating against the schema, right? That is great! Could we also get the file name in the error message so that invalid files are easier to find?

Jean-Baptiste-Camps commented 2 years ago

Done.

GusRiva commented 2 years ago

Ok, it's finally done! It was a good opportunity to go back to some of the first contributions, which still had not been completely updated to the new data model and to fix some bugs in the scripts. @Jean-Baptiste-Camps feel free to merge into main and we can start working on new stemmata with a brand new working pipeline.

GusRiva commented 2 years ago

I just realised we might get a conflict with the file changed in #101. In that case, keep the changes from main.

Jean-Baptiste-Camps commented 2 years ago

Great! Hurrah for us!