Capitains / MyCapytain

Texts API and Textual Resources Utility Library for Python 3
http://mycapytain.readthedocs.org
Mozilla Public License 2.0

Alternative languages for editions do not work (CTS) #201

Closed andredelft closed 4 years ago

andredelft commented 4 years ago

Languages can be defined in the __cts__.xml work files as an xml:lang attribute on the root element <ti:work/>. The versions defined there inherit this language by default. For translations and commentaries it can be overridden on <ti:translation> and <ti:commentary> respectively, but for some reason this does not work for editions.

See for example the following __cts__.xml, which I have set up as a test with the corresponding files in a data folder:

<ti:work xmlns:ti="http://chs.harvard.edu/xmlns/cts"
    groupUrn="urn:cts:latinLit:textgroup"
    urn="urn:cts:latinLit:textgroup.work"
    xml:lang="lang0">
    <ti:title xml:lang="en-Latn">Work Name</ti:title>

    <ti:edition workUrn="urn:cts:latinLit:textgroup.work" urn="urn:cts:latinLit:textgroup.work.ed-lang1" xml:lang="lang1">
        <ti:label xml:lang="en-Latn">Test edition</ti:label>
        <ti:description xml:lang="en-Latn">Description</ti:description>
    </ti:edition>

    <ti:translation workUrn="urn:cts:latinLit:textgroup.work" urn="urn:cts:latinLit:textgroup.work.trans-lang2" xml:lang="lang2">
        <ti:label xml:lang="en-Latn">Test translation</ti:label>
        <ti:description xml:lang="en-Latn">Description</ti:description>
    </ti:translation>

    <ti:commentary workUrn="urn:cts:latinLit:textgroup.work" urn="urn:cts:latinLit:textgroup.work.com-lang3" xml:lang="lang3">
        <ti:label xml:lang="en-Latn">Test commentary</ti:label>
        <ti:description xml:lang="en-Latn">Description</ti:description>
    </ti:commentary>

</ti:work>

Note xml:lang="lang0" defined at the root element and lang1, lang2 and lang3 defined at the edition, translation and commentary elements respectively.

If I now set up a Nemo environment and send a GetCapabilities request, I get the following (abbreviated) response:

<GetCapabilities xmlns="http://chs.harvard.edu/xmlns/cts">
  ...
  <reply>
    <TextInventory>
      <textgroup urn="urn:cts:latinLit:textgroup">
        ...
        <work urn="urn:cts:latinLit:textgroup.work" xml:lang="lang0" groupUrn="urn:cts:latinLit:textgroup">
          ...
          <edition urn="urn:cts:latinLit:textgroup.work.ed-lang1" xml:lang="lang0" workUrn="urn:cts:latinLit:textgroup.work">
            ...
          </edition>
          <translation urn="urn:cts:latinLit:textgroup.work.trans-lang2" xml:lang="lang2" workUrn="urn:cts:latinLit:textgroup.work">
            ...
          </translation>
          <commentary urn="urn:cts:latinLit:textgroup.work.com-lang3" xml:lang="lang3" workUrn="urn:cts:latinLit:textgroup.work">
            ...
          </commentary>
        </work>
      </textgroup>
    </TextInventory>
  </reply>
</GetCapabilities>

Note that the edition still has xml:lang="lang0" even though we have redefined this in the __cts__.xml file.
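To check what the __cts__.xml file itself declares, independently of what the resolver returns, the file can be parsed directly. The following is a minimal sketch using only the Python standard library, not MyCapytain's own parser. The function name `declared_languages` and the trimmed `SAMPLE` document are made up for illustration; the fallback to the work's language mirrors the inheritance described above.

```python
import xml.etree.ElementTree as ET

# Attribute name that ElementTree uses for xml:lang
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Trimmed copy of the test file above (labels and descriptions left out)
SAMPLE = """<ti:work xmlns:ti="http://chs.harvard.edu/xmlns/cts"
    groupUrn="urn:cts:latinLit:textgroup"
    urn="urn:cts:latinLit:textgroup.work"
    xml:lang="lang0">
    <ti:edition workUrn="urn:cts:latinLit:textgroup.work" urn="urn:cts:latinLit:textgroup.work.ed-lang1" xml:lang="lang1"/>
    <ti:translation workUrn="urn:cts:latinLit:textgroup.work" urn="urn:cts:latinLit:textgroup.work.trans-lang2" xml:lang="lang2"/>
    <ti:commentary workUrn="urn:cts:latinLit:textgroup.work" urn="urn:cts:latinLit:textgroup.work.com-lang3" xml:lang="lang3"/>
</ti:work>"""


def declared_languages(xml_text):
    """Map each version URN to the xml:lang declared for it in the file.

    Hypothetical helper: a version without its own xml:lang falls back
    to the language of the parent <ti:work/>.
    """
    work = ET.fromstring(xml_text)
    work_lang = work.get(XML_LANG)
    languages = {}
    for child in work:
        local_name = child.tag.split("}")[-1]  # drop the "{namespace}" prefix
        if local_name in ("edition", "translation", "commentary"):
            languages[child.get("urn")] = child.get(XML_LANG, work_lang)
    return languages


if __name__ == "__main__":
    for urn, lang in declared_languages(SAMPLE).items():
        print(urn, lang)
```

Comparing this output with the GetCapabilities response shows the mismatch: the file declares lang1 for the edition, while the response reports lang0.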

PonteIneptique commented 4 years ago

Hi @andredelft ! Thanks for the issue. Originally, this is done because the CTS protocol does not allow for an edition to have a different language from its parent work. If this is an issue for you, I'd recommend using Translation rather than edition, or maybe use something from capitains-structured metadata.

We are moving to a new set of guidelines built for DTS (which will be easy to transform to) that should fix this :)

andredelft commented 4 years ago

Thanks a lot for your response, @PonteIneptique !

What do you mean by 'use something from capitains-structured metadata'?

To give you a bit of context for this problem, we are encoding works of Moses Maimonides, of which many old editions exist in different languages (Arabic, Hebrew, Slavonic, etc.). Since he was multilingual, multiple editions might originate from him (though this is not always clear). Apart from that, modern translations also exist. Our approach was thus to call the old versions 'editions' and the modern ones 'translations'. But I understand that this does not fit the CTS guidelines.

PonteIneptique commented 4 years ago

Damn, I am sorry @andredelft, I missed your message... You can use Structured Metadata, where you can define multiple dc:language elements, for example:

<ti:edition ...>
    ....
    <cpt:structured-metadata>
        <dc:language>Arabic</dc:language>
        <dc:language>Hebrew</dc:language>
    </cpt:structured-metadata>
</ti:edition>
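For anyone who wants to read those values back, here is a rough sketch with the Python standard library. It is not a MyCapytain API: the helper name `edition_languages` is made up, the sample fills in a URN from the test file above, and the `cpt` namespace URI is an assumption that should be checked against the Capitains guidelines. The Dublin Core elements namespace is the standard one.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "ti": "http://chs.harvard.edu/xmlns/cts",
    # Assumption: namespace URI for Capitains structured metadata
    "cpt": "http://purl.org/capitains/ns/1.0#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Illustrative edition with the namespace declarations spelled out
SAMPLE_EDITION = """<ti:edition xmlns:ti="http://chs.harvard.edu/xmlns/cts"
    xmlns:cpt="http://purl.org/capitains/ns/1.0#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    workUrn="urn:cts:latinLit:textgroup.work"
    urn="urn:cts:latinLit:textgroup.work.ed-lang1">
    <cpt:structured-metadata>
        <dc:language>Arabic</dc:language>
        <dc:language>Hebrew</dc:language>
    </cpt:structured-metadata>
</ti:edition>"""


def edition_languages(xml_text):
    """Return every dc:language value found in the structured metadata.

    Hypothetical helper: it only reads the XML and does not change how
    a CTS resolver reports xml:lang for the edition.
    """
    edition = ET.fromstring(xml_text)
    elements = edition.findall("cpt:structured-metadata/dc:language", NAMESPACES)
    return [el.text.strip() for el in elements if el.text]


if __name__ == "__main__":
    print(edition_languages(SAMPLE_EDITION))
```

Note that this keeps the languages as extra metadata next to the edition; the xml:lang on the edition itself would still be inherited from the work.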