Clear-Bible / macula-hebrew

Syntax trees, morphology, and linguistic annotations for the Hebrew Bible

Gloss upon gloss #41

Open jonathanrobie opened 2 years ago

jonathanrobie commented 2 years ago

We have several sources of glosses, and they have different advantages and purposes. We need simple attribute names that support the glosses we are using:

Obviously, glosses in other languages may also become a factor.

I don't particularly like attribute names like cherith-english in the following:

<Node xmlns:xi="http://www.w3.org/2001/XInclude" Cat="noun" morphId="130020160092" Unicode="עֲשָׂה־אֵ֖ל" nodeId="1300201600920010" StrongNumberX="6214" Greek="ασαηλ">
  <c cherith-english="Asahel" cherith-chinese="亚撒黑" marble-sense="עֲשָׂהאֵל:003001007:Names of People:Asahel|שָׁלֹשׁ:002001001042:Quantity;002001003009:Frequency:three">
    <m word="1CH 2:16!9" n="130020160092" morph="Np" lang="H" lemma="6214+" after="־" pos="noun" type="proper">עֲשָׂה</m>
    <m word="1CH 2:16!10" n="130020160101" lang="H" after=" " lemma="6214" morph="Np" pos="noun" type="proper">אֵ֖ל</m>
  </c>
</Node>

So we need a naming convention that gives us flexibility while keeping this simple. I don't think the attribute name needs to identify the source; we can do that in documentation and in copyright / license statements.

Any suggestions?

jonathanrobie commented 2 years ago

One traditional answer to this would be to use namespaces, e.g.

<c c:english="Asahel"  c:mandarin="亚撒黑" sil:english="Asahel" />

Should we bite the bullet and use namespaces? So far, we don't do this for anything else, and it does add complexity, e.g. people's path expressions may not match for reasons they do not understand.
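
To make the path-expression concern concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The namespace URIs (`https://example.org/cherith`, `https://example.org/sil`) are placeholders invented for this illustration; nothing like them has been defined for this repository.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, invented for this illustration only.
CHERITH_NS = "https://example.org/cherith"
SIL_NS = "https://example.org/sil"

xml_text = f"""
<c xmlns:c="{CHERITH_NS}" xmlns:sil="{SIL_NS}"
   c:english="Asahel" c:mandarin="亚撒黑" sil:english="Asahel"/>
""".strip()

node = ET.fromstring(xml_text)

# Looking up the prefixed name exactly as it is written in the file finds nothing.
naive = node.get("c:english")

# The bare local name finds nothing either.
bare = node.get("english")

# ElementTree keys namespaced attributes as "{namespace-uri}localname".
qualified = node.get(f"{{{CHERITH_NS}}}english")

print(naive, bare, qualified)
```

The first two lookups return `None` without raising an error, which is probably the kind of silent mismatch that would confuse people who are not used to namespaces.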

jonathanrobie commented 2 years ago

If we don't use namespaces, and have a small number of glosses, we can handle it using carefully chosen names. Less clean, but less confusing for some programmers:

<c english="Asahel"  mandarin="亚撒黑" sil="Asahel" sdbh="....."/>
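
For comparison, a minimal Python sketch of reading the plain attribute names with the standard library's `xml.etree.ElementTree`. The `sdbh` attribute is left out because its value is elided above, and the element is a cut-down version of the example rather than a real node from the data.

```python
import xml.etree.ElementTree as ET

# Cut-down illustration of the un-namespaced form; not a real node from the data.
xml_text = '<c english="Asahel" mandarin="亚撒黑" sil="Asahel"/>'
node = ET.fromstring(xml_text)

# Plain attributes are looked up by the same name that appears in the file.
glosses = {name: node.get(name) for name in ("english", "mandarin", "sil")}
print(glosses)

# An attribute that is absent from this node simply yields None.
missing = node.get("sdbh")
```

Here the name in the file and the name in the lookup are identical, which is the simplicity we would be trading against the cleaner separation that namespaces give.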

jonathanrobie commented 2 years ago

Berean glosses could be identified as berean="...". For the GNT, we have been using them as a primary gloss. Should we allow ourselves to say gloss="...."?