buda-base / owl-schema

BDRC Ontology Schema
reconstructed title / name #84

Closed eroux closed 6 years ago

eroux commented 6 years ago

Some work titles or person names are not actually recorded in the artefactual sources, but a reconstructed name can be used instead. Most of the cases I've seen are Sanskrit reconstructed from Tibetan, Chinese or Pali. In this example the * indicates a reconstruction. This also happens in the rKTs data, although I'm not sure it's noted there.

eroux commented 6 years ago

Note that it is even more interesting for Chinese, where the reconstruction of the Sanskrit is much more ambiguous than with Tibetan. For instance the CBC@ database contains:

  <bdo:Person rdf:about="https://dazangthings.nz/cbc/person/18/">
    <skos:altLabel>*Atikūṭa?</skos:altLabel>
    <skos:altLabel>*Atigupta?</skos:altLabel>
    <skos:altLabel>瞿多</skos:altLabel>
    <skos:altLabel>阿地瞿多</skos:altLabel>
  </bdo:Person>
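
The `*` convention in these labels can be picked up mechanically. The following is a minimal sketch, not part of CBC@ or the BDRC tooling: it wraps the snippet above in an `rdf:RDF` root, and the namespace declarations (in particular the `bdo` URI) are assumptions, since the thread omits them. Reading the trailing `?` as an uncertainty marker is also an assumption, and `split_labels` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are assumptions for this sketch; the snippet in the thread omits them.
RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#"
    xmlns:bdo="http://purl.bdrc.io/ontology/core/">
  <bdo:Person rdf:about="https://dazangthings.nz/cbc/person/18/">
    <skos:altLabel>*Atikūṭa?</skos:altLabel>
    <skos:altLabel>*Atigupta?</skos:altLabel>
    <skos:altLabel>瞿多</skos:altLabel>
    <skos:altLabel>阿地瞿多</skos:altLabel>
  </bdo:Person>
</rdf:RDF>"""

SKOS_ALT_LABEL = "{http://www.w3.org/2004/02/skos/core#}altLabel"


def split_labels(rdf_xml):
    """Return (reconstructed, attested) label lists.

    A label counts as reconstructed when it starts with '*'. The marker and
    any trailing '?' (assumed to flag uncertainty) are stripped.
    """
    reconstructed, attested = [], []
    for element in ET.fromstring(rdf_xml).iter(SKOS_ALT_LABEL):
        text = (element.text or "").strip()
        if text.startswith("*"):
            reconstructed.append(text[1:].rstrip("?"))
        else:
            attested.append(text)
    return reconstructed, attested


reconstructed, attested = split_labels(RDF_XML)
print(reconstructed)
print(attested)
```

This only separates the two groups; it does not say which reconstruction belongs to which attested form.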

I think it would be interesting to associate the two (original and reconstructed) in the same blank node, something like

cbc:person/18 a :Person ;
    :personName [
        rdfs:label "阿地瞿多"@sa-Hant ;
        :reconstructedOriginal "Atigupta"@sa-x-iast ;
        :reconstructedOriginal "Atikūṭa"@sa-x-iast
    ] .

xristy commented 6 years ago

Perhaps :restored and :restoredAlt (assuming that the first option is like prefLabel):

cbc:person/18 a :Person ;
    :personName [
        rdfs:label "阿地瞿多"@sa-Hant ;
        :restored "Atikūṭa"@sa-x-iast ;
        :restoredAlt "Atigupta"@sa-x-iast
    ] .

?
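
To make the preferred/alternate split concrete, here is a small sketch of how such a name node might be modelled and serialized. The class `ReconstructedName` and its `to_turtle` method are hypothetical, `:restored` and `:restoredAlt` are only the property names proposed in this thread, and the subject is written as a full IRI rather than `cbc:person/18`. Labels are assumed to contain no quotes or backslashes, so no escaping is done.

```python
from dataclasses import dataclass, field


@dataclass
class ReconstructedName:
    """An attested label plus one preferred and any alternate restorations."""
    label: str
    label_lang: str
    restored: str
    restored_alts: list = field(default_factory=list)
    restored_lang: str = "sa-x-iast"

    def to_turtle(self, subject):
        # Property names follow the proposal in this thread; they are not
        # confirmed parts of the ontology.
        lines = [
            f"{subject} a :Person ;",
            "    :personName [",
            f'        rdfs:label "{self.label}"@{self.label_lang} ;',
            f'        :restored "{self.restored}"@{self.restored_lang}',
        ]
        for alt in self.restored_alts:
            lines[-1] += " ;"  # the previous statement continues
            lines.append(f'        :restoredAlt "{alt}"@{self.restored_lang}')
        lines.append("    ] .")
        return "\n".join(lines)


name = ReconstructedName(
    label="阿地瞿多",
    label_lang="sa-Hant",
    restored="Atikūṭa",
    restored_alts=["Atigupta"],
)
turtle = name.to_turtle("<https://dazangthings.nz/cbc/person/18/>")
print(turtle)
```

Keeping a single `restored` value and a list of alternates mirrors the prefLabel/altLabel pattern: exactly one preferred form, any number of alternates.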

How are you recognizing sa-Hant? And are the two forms (瞿多 and 阿地瞿多) the same name or variants, and does one of Atigupta and Atikūṭa go with one or the other of the sa-Hant forms?

eroux commented 6 years ago

Well, for data that predates 1950 (such as the data coming from the Buddhist Canon, as is the case here), it's always zh-Hant. Here the Chinese is a phonetic transcription of Sanskrit, but it's very ambiguous and there are several ways to reconstruct the Sanskrit from the Chinese phonetics. In this case it's basically one character per syllable.

xristy commented 6 years ago

So apparently:

阿    =    A
地    =    ti
瞿    =    kū / gup
多    =    ṭa / ta

and you're taking the <skos:altLabel>阿地瞿多</skos:altLabel> and counting the characters and seeing that 4 is the same number of syllables as in Atigupta and Atikūṭa and factoring in the "*...?" you then assign sa-Hant as the tag for 阿地瞿多?
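
The character-versus-syllable comparison described above can be sketched as a rough heuristic. This is only an illustration under simplifying assumptions: syllables are approximated by counting IAST vowel nuclei (with `ai` and `au` treated as one), and a match merely means the pairing is not ruled out. The function names `count_syllables` and `plausible_transcription` are hypothetical.

```python
import unicodedata

# IAST vowels, normalized so precomposed characters compare reliably.
IAST_VOWELS = set(unicodedata.normalize("NFC", "aāiīuūṛṝḷḹeo"))


def count_syllables(iast):
    """Count vowel nuclei in an IAST string; 'ai' and 'au' count as one."""
    text = unicodedata.normalize("NFC", iast).lower()
    count = 0
    i = 0
    while i < len(text):
        if text[i] in IAST_VOWELS:
            count += 1
            # Treat the diphthongs 'ai' and 'au' as a single nucleus.
            if text[i] == "a" and i + 1 < len(text) and text[i + 1] in "iu":
                i += 1
        i += 1
    return count


def plausible_transcription(han, iast):
    """True when the Han string has one character per IAST syllable."""
    return len(han) == count_syllables(iast)


for candidate in ("Atigupta", "Atikūṭa"):
    print(candidate, plausible_transcription("阿地瞿多", candidate))
```

Both candidates pass for 阿地瞿多, which is exactly why the count alone cannot choose between them; the shorter form 瞿多 fails against either full reconstruction.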

A Google search suggests to me that Atikūṭa would be the preferred restoration, so Atigupta would reasonably be taken as an alternate.

Does the distinction between restored and restoredAlt make sense to you based on your knowledge of the data?

eroux commented 6 years ago

I think there's a small misunderstanding: what you describe is indeed what I did in my head, but I don't think this could be fully automated... Adding an Alt variant is probably a good idea, yes.