DARIAH-ERIC / lexicalresources

Data space of the DARIAH Lexical Resources Working Group
https://dariah-eric.github.io/lexicalresources/
BSD 2-Clause "Simplified" License

Multiple references within one <xr> element, or multiple <xr> elements? #220

Open daliboris opened 3 months ago

daliboris commented 3 months ago

There are examples of the <xr> element in the TEI Lex-0 Guidelines (see section 7.3.2) where multiple items of the same type are encoded as multiple <xr> elements, for example:

<xr type="synonymy">
   <ref type="entry">bettlägerig</ref>
</xr>
<pc>,</pc>
<xr type="synonymy">
   <ref type="entry">krank</ref>
</xr>

I didn't find any example of multiple child <ref> elements within one parent <xr> element, i.e.

<xr type="synonymy">
   <ref type="entry">bettlägerig</ref>
   <pc>,</pc>
   <ref type="entry">krank</ref>
</xr>

Multiple <xr> elements are currently allowed by the schema.

As the <xr> element

groups all information related to this reference, including explicit labels such as "Syn.", "Cf.", "See also" etc.,

the following example (from the Old Czech Electronic Dictionary; 'Sr.' = 'Cf.') seems to me to be appropriately encoded:

<xr type="related">
   <lbl rendition="nonparej">Sr.</lbl>
   <ref type="entry">přiběhovati</ref>
   <pc function="formDelimiter">,</pc>
   <ref type="entry">přibiehati</ref>
   <pc function="formDelimiter">,</pc>
   <ref type="entry">přiletěti</ref>
   <pc function="formDelimiter">,</pc>
   <ref type="entry">hnáti</ref>
</xr>

rather than:

<xr type="related">
   <lbl rendition="nonparej">Sr.</lbl>
   <ref type="entry">přiběhovati</ref>
</xr>
<pc>,</pc>
<xr type="related">
   <ref type="entry">přibiehati</ref>
</xr>
<pc>,</pc>
<xr type="related">
   <ref type="entry">přiletěti</ref>
</xr>
<pc>,</pc>
<xr type="related">
   <ref type="entry">hnáti</ref>
</xr>

because all referenced entries should share the same label.

What do you think: should we use one <ref> per parent <xr> element, or multiple <ref> elements within one <xr>?

ttasovac commented 3 months ago

I'm in favor of one xr per ref because it allows you to distinguish labels, when necessary, or even more importantly add usg labels to individual cross-references etc. For instance, if you have "syn. talk, speak, chew the fat (slang)", you have to have xr around chew the fat so that you can group the usg label with the given ref.
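
A minimal sketch of how the "syn. talk, speak, chew the fat (slang)" case could look with one xr per ref. The <lbl> placement, the type="socioCultural" value on <usg>, and the omission of the printed parentheses are assumptions made for illustration and should be checked against the current schema:

```xml
<!-- Illustrative sketch only: the usg @type value and the lbl placement are assumptions -->
<xr type="synonymy">
   <lbl>syn.</lbl>
   <ref type="entry">talk</ref>
</xr>
<pc>,</pc>
<xr type="synonymy">
   <ref type="entry">speak</ref>
</xr>
<pc>,</pc>
<xr type="synonymy">
   <!-- the usage label sits in the same xr as the one ref it qualifies -->
   <ref type="entry">chew the fat</ref>
   <usg type="socioCultural">slang</usg>
</xr>
```

If all three refs shared a single xr, there would be no wrapper left to tie "slang" to "chew the fat" alone.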

Labels like cf. are less important I think, because we "delegate" the meaning of the label to the xr type anyway. So when you have:

<xr type="related">
   <lbl rendition="nonparej">Sr.</lbl>
   <ref type="entry">přiběhovati</ref>
</xr>
<pc>,</pc>
<xr type="related">
   <ref type="entry">přibiehati</ref>
</xr>
<pc>,</pc>
<xr type="related">
   <ref type="entry">přiletěti</ref>
</xr>
<pc>,</pc>
<xr type="related">
   <ref type="entry">hnáti</ref>
</xr>

I don't think it's a huge problem that the label is only grouped with the first ref, because each ref is of type related, so we will always be able to identify each of these references as "related" or as "synonymy" or whatever they are based on the type. But I do admit that it's not a very elegant model, because the label does indeed point to multiple references.

For the sake of consistency, I think we should go for one ref per xr...

But your question raises a general challenge of wrapping multiple elements in groups. At the moment, I don't think you can nest xr within an xr, but we could think about changing that and recommending <xr type="xrGrp"> as the outer container for multiple references... since we don't have an xrGrp... (which reminds me of our discussions regarding wrapping individual forms in a declension or a conjugation...)

So... I think there is a good topic here for discussion in Vienna about having a consistent and general approach to grouping all kinds of elements that come in a group...

laurentromary commented 3 months ago

I am perfectly in line with Toma's argument. A more precise encoding allows better information management, despite the small glitches that result.

daliboris commented 3 months ago

1) Nested <xr> elements are allowed in the current version 0.9.3 of the schema.

2) I can give another, more conclusive example where one piece of information is shared by multiple references, i.e. where the bibliographic reference is shared by all the referenced entries that follow it (StčS and HSSJ are abbreviations of different dictionaries):

Text:

Sr. StčS obhaněti, pohaniti, HSSJ s. v. prihaniti

Separate:

<xr type="related">
   <lbl rendition="nonparej" expand="Srovnej" value="compare">Sr.</lbl>
   <bibl type="attestation">
      <title>StčS</title>
   </bibl>
   <ref type="entry">obhaněti</ref>
</xr>
<pc>,</pc>
<xr type="related">
   <ref type="entry">pohaniti</ref>
</xr>
<pc>,</pc>
<xr type="related">
   <bibl type="attestation">
      <title>HSSJ</title>
   </bibl>
   <lbl rendition="nonparej" expand="sub voce" value="under the entry">s. v.</lbl>
   <ref type="entry">prihaniti</ref>
</xr>

Grouped:

<xr type="related">
   <lbl rendition="nonparej" expand="Srovnej" value="compare">Sr.</lbl>
   <bibl type="attestation">
      <title>StčS</title>
   </bibl>
   <ref type="entry">obhaněti</ref>
   <pc>,</pc>
   <ref type="entry">pohaniti</ref>
</xr>
<pc>,</pc>
<xr type="related">
   <bibl type="attestation">
      <title>HSSJ</title>
   </bibl>
   <lbl rendition="nonparej" expand="sub voce" value="under the entry">s. v.</lbl>
   <ref type="entry">prihaniti</ref>
</xr>

Nested (but invalid, because xrGrp is not a permitted value of the @type attribute):

<xr type="xrGrp">
   <lbl rendition="nonparej" expand="Srovnej" value="compare">Sr.</lbl>
   <bibl type="attestation">
      <title>StčS</title>
   </bibl>
   <xr type="related">
      <ref type="entry">obhaněti</ref>
   </xr>
   <pc>,</pc>
   <xr type="related">
      <ref type="entry">pohaniti</ref>
   </xr>
</xr>
<pc>,</pc>
<xr type="related">
   <bibl type="attestation">
      <title>HSSJ</title>
   </bibl>
   <lbl rendition="nonparej" expand="sub voce" value="under the entry">s. v.</lbl>
   <ref type="entry">prihaniti</ref>
</xr>