OpenStemmata / database

An open database of stemmata
Creative Commons Attribution Share Alike 4.0 International

TEI and ODD, new version #68

Jean-Baptiste-Camps closed this issue 3 years ago

Jean-Baptiste-Camps commented 3 years ago

Hi again @GusRiva and @gabays, what do you think of the newly proposed encoding for the Alexis example, https://github.com/OpenStemmata/database/blob/cd0d525c247525b043e3e7b05a3a700a6fb08bdb/examples/Paris_1872_Alexis/Paris_1872_Alexis.xml ?

We could probably dispense with attributes such as @size, @order, @inDegree and @outDegree, because they can be computed automatically. On the other hand, it might be quite convenient to have them in the TEI, since this is typically the kind of information we will count a lot.
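To illustrate how these attributes could be derived rather than hand-encoded, here is a minimal sketch. It assumes the arcs have already been extracted as (source, target) pairs; the function name `graph_stats` and the toy arc list are hypothetical and are not taken from the Alexis file. Order is counted as the number of nodes and size as the number of arcs.

```python
from collections import Counter


def graph_stats(arcs):
    """Compute graph-level and node-level counts from (source, target) pairs.

    Nodes are inferred from the arcs, so an isolated node (one with no arc
    at all) would not be counted by this sketch.
    """
    nodes = sorted({name for arc in arcs for name in arc})
    in_degree = Counter(target for _, target in arcs)
    out_degree = Counter(source for source, _ in arcs)
    return {
        "order": len(nodes),  # number of nodes
        "size": len(arcs),    # number of arcs
        "nodes": {
            name: {"inDegree": in_degree[name], "outDegree": out_degree[name]}
            for name in nodes
        },
    }


# Hypothetical toy stemma, for illustration only.
arcs = [("O", "a"), ("O", "b"), ("a", "L"), ("a", "A"), ("b", "P")]
stats = graph_stats(arcs)
print(stats["order"], stats["size"], stats["nodes"]["a"])
```

The same function could serve either purpose discussed here: filling the attributes in during conversion, or checking hand-encoded values against the arcs.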

Jean-Baptiste-Camps commented 3 years ago

Ok, I've modified the teiHeader slightly and adapted the ODD (I have also partly recoded it in a more explicit and somewhat restrictive way).

Jean-Baptiste-Camps commented 3 years ago

You can have a look at the HTML documentation here: https://openstemmata.github.io/odd.html

We could restrict it a bit further, but let's perhaps first see how it holds up with more examples!

Jean-Baptiste-Camps commented 3 years ago

By the way, should we switch the bibl typology to book/inpublication, rather than book/article, to accommodate chapters?

Jean-Baptiste-Camps commented 3 years ago

PS: don't worry about the failing tests right now; we will look at them after merging the other PR, which contains test updates.

Jean-Baptiste-Camps commented 3 years ago

I am seeing some weird behaviour with the ODD: some modifications are not taken into account in the transformation, and I can't figure out why. For instance, the attribute modifications on node.

Jean-Baptiste-Camps commented 3 years ago

Perhaps it is related to https://github.com/TEIC/TEI/issues/2128

Jean-Baptiste-Camps commented 3 years ago

Hell, it is the same bug I have already run into once! If the ODD is linked to the schema, it causes this behaviour, and if I remove

<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_odds.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_odds.rng" type="application/xml"
    schematypens="http://purl.oclc.org/dsdl/schematron"?>

THEN it works. This makes 0 sense.
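The workaround described here (removing the xml-model processing instructions before running the transformation) could be scripted instead of done by hand. This is only a sketch, assuming the ODD is read as text and a stripped copy is what gets passed to the transformation; `strip_xml_model` is a hypothetical helper, not part of any TEI tooling, and it does not explain or fix the underlying bug.

```python
import re

# Matches any <?xml-model ...?> processing instruction, including ones that
# span several lines, plus the whitespace that follows it.
XML_MODEL_PI = re.compile(r"<\?xml-model\b.*?\?>\s*", re.DOTALL)


def strip_xml_model(xml_text):
    """Return the document text with all xml-model instructions removed."""
    return XML_MODEL_PI.sub("", xml_text)


# Hypothetical minimal input, for illustration only.
sample = (
    '<?xml version="1.0"?>\n'
    '<?xml-model href="a.rng"\n'
    '    type="application/xml"?>\n'
    "<TEI/>"
)
cleaned = strip_xml_model(sample)
print(cleaned)
```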

Jean-Baptiste-Camps commented 3 years ago

To finish with this, I am also working on the second (Roland) example. Do you think we need to include something to deal with dashed lines meaning «uncertain»? Using @cert in the TEI is straightforward, but what about the .gv? style="dotted"?

GusRiva commented 3 years ago

> To finish with this, I am also working on the second (Roland) example. Do you think we need to include something to deal with dashed lines meaning «uncertain»? Using @cert in the TEI is straightforward, but what about the .gv? style="dotted"?

Good point. How do we make a distinction between contamination and uncertainty in these cases?

GusRiva commented 3 years ago

> By the way, should we switch the bibl typology to book/inpublication, rather than book/article, to accommodate chapters?

"inpublication" sounds strange, but I can't come up with any good alternatives. "section", "part"?

GusRiva commented 3 years ago

> Hi again @GusRiva and @gabays, what do you think of the newly proposed encoding for the Alexis example, https://github.com/OpenStemmata/database/blob/cd0d525c247525b043e3e7b05a3a700a6fb08bdb/examples/Paris_1872_Alexis/Paris_1872_Alexis.xml ?
>
> We could probably dispense with attributes such as @size, @order, @inDegree and @outDegree, because they can be computed automatically. On the other hand, it might be quite convenient to have them in the TEI, since this is typically the kind of information we will count a lot.

As we are doing the conversion automatically, I think it could be a good thing to include these attributes. Even if they could be computed, this makes it easier to find the information.

GusRiva commented 3 years ago

I'm wondering about the encoding of the arc. From the Alexis example I understand that we will use label as a way of distinguishing "filiation" from "contamination". Is this correct? This might be a clever way of getting over the problem that @type is not allowed in arc. On the other hand, it might be possible that some stemmata have actual labels on the edges. We can maybe then add another label - this is allowed by the guidelines. Should we maybe add a @type to the label?

Jean-Baptiste-Camps commented 3 years ago

> > To finish with this, I am also working on the second (Roland) example. Do you think we need to include something to deal with dashed lines meaning «uncertain»? Using @cert in the TEI is straightforward, but what about the .gv? style="dotted"?
>
> Good point. How do we make a distinction between contamination and uncertainty in these cases?

In the TEI, that could be very explicit, with @cert on the one hand and the label on the other. In Graphviz, it is a bit more complicated: we would have style="dashed" in one case and style="dotted" in the other, but that can be quite confusing. I think we can find something better.
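To make the dashed/dotted option concrete, here is a minimal sketch of how edge lines for the .gv file could be generated. The mapping used (dashed for contamination, dotted for an uncertain filiation) is an assumption for illustration only, since the thread does not settle which style goes with which case; `dot_edge` and its parameters are hypothetical names.

```python
def dot_edge(source, target, kind="filiation", cert="high"):
    """Return one Graphviz DOT edge statement for a stemma arc.

    Assumed mapping, not a settled convention:
    contamination -> style="dashed", uncertain filiation -> style="dotted".
    """
    attrs = []
    if kind == "contamination":
        attrs.append('style="dashed"')
    elif cert != "high":
        attrs.append('style="dotted"')
    suffix = f" [{', '.join(attrs)}]" if attrs else ""
    return f'"{source}" -> "{target}"{suffix};'


print(dot_edge("a", "L"))
print(dot_edge("b", "L", kind="contamination"))
print(dot_edge("a", "P", cert="low"))
```

Note that with this mapping an uncertain contamination comes out exactly like a certain one, which is one way of seeing why relying on line style alone can be confusing.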

Jean-Baptiste-Camps commented 3 years ago

> > By the way, should we switch the bibl typology to book/inpublication, rather than book/article, to accommodate chapters?
>
> "inpublication" sounds strange, but I can't come up with any good alternatives. "section", "part"?

TEI has "analytical", but that is even more cryptic. "incollection"?

Jean-Baptiste-Camps commented 3 years ago

> I'm wondering about the encoding of the arc. From the Alexis example I understand that we will use label as a way of distinguishing "filiation" from "contamination". Is this correct?

Precisely, yes! I am wondering why there is no @type on arc, though; this is weird. As an alternative, we could still use an @type in our model, by creating one in our own namespace (I'm always surprised that @type is not available everywhere xml:id is).

> This might be a clever way of getting over the problem that @type is not allowed in arc. On the other hand, it might be possible that some stemmata have actual labels on the edges. We can maybe then add another label - this is allowed by the guidelines. Should we maybe add a @type to the label?

We could indeed have a general-purpose label type="generic" and a label type="specific" for those, if we encounter them.

GusRiva commented 3 years ago

> Precisely, yes! I am wondering why there is no @type on arc, though; this is weird. As an alternative, we could still use an @type in our model, by creating one in our own namespace (I'm always surprised that @type is not available everywhere xml:id is).

I feel tempted to add the @type attribute in our model, because it seems more appropriate than label. I would then use the element label for actual fragments of text that are attached to the lines in the images.
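As a rough illustration of that division of labour (@type for the kind of relation, label only for text actually printed next to a line), here is a sketch using Python's standard `xml.etree.ElementTree`. It assumes an unqualified @type added to arc through the ODD; a namespaced attribute, as suggested earlier in the thread, would be the alternative. `make_arc`, the node identifiers and the label text are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI elements without a prefix


def make_arc(source, target, arc_type, label_text=None, cert=None):
    """Build a TEI arc element with an assumed custom @type attribute.

    label_text is reserved for text that really appears on the edge in the
    printed stemma; the kind of relation goes in @type instead.
    """
    arc = ET.Element(
        f"{{{TEI_NS}}}arc",
        {"from": f"#{source}", "to": f"#{target}", "type": arc_type},
    )
    if cert is not None:
        arc.set("cert", cert)
    if label_text is not None:
        label = ET.SubElement(arc, f"{{{TEI_NS}}}label")
        label.text = label_text
    return arc


# Hypothetical arcs, for illustration only.
plain = make_arc("a", "L", "filiation")
annotated = make_arc("b", "L", "contamination", label_text="some edge text", cert="low")
print(ET.tostring(annotated, encoding="unicode"))
```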

GusRiva commented 3 years ago

> > > By the way, should we switch the bibl typology to book/inpublication, rather than book/article, to accommodate chapters?
> >
> > "inpublication" sounds strange, but I can't come up with any good alternatives. "section", "part"?
>
> TEI has "analytical", but that is even more cryptic. "incollection"?

publicationPart?

Jean-Baptiste-Camps commented 3 years ago

Both modifications made. Perhaps, once we are content with our model, we can inquire why there is no @type on arcs.