Open klosoter opened 2 years ago
A mapping file can be found here
The mapping is done at both the word and morph level, since SIL has some useful attributes that are present on only one of them (e.g., the contextual glosses `egs` and the morphology coding `t`).
<maculaSilMapping>
<word maculaText="בְּרֵאשִׁ֖ית" maculaId="01001001001" SILText="בְּרֵאשִׁ֖ית" SILId="01001001001" SILGlosses="in.beginning" SILTransliteration="bərēʾšîṯ">
<morph maculaText="בְּ" maculaId="010010010011" SILText="בְּ" SILId="010010010011" SILMorphology="Pp" SILTransliteration="bə"/>
<morph maculaText="רֵאשִׁ֖ית" maculaId="010010010012" SILText="רֵאשִׁ֖ית" SILId="010010010012" SILMorphology="ncfsa" SILTransliteration="rēʾšiyṯ"/>
</word>
<word maculaText="בָּרָ֣א" maculaId="01001001002" SILText="בָּרָ֣א" SILId="01001001002" SILGlosses="he.created" SILTransliteration="bārāʾ">
<morph maculaText="בָּרָ֣א" maculaId="010010010021" SILText="בָּרָ֣א" SILId="010010010021" SILMorphology="vqp3ms" SILTransliteration="bārāʾ"/>
</word>
<word maculaText="אֱלֹהִ֑ים" maculaId="01001001003" SILText="אֱלֹהִ֑ים" SILId="01001001003" SILGlosses="God" SILTransliteration="ʾĕlōhîm">
<morph maculaText="אֱלֹהִ֑ים" maculaId="010010010031" SILText="אֱלֹהִ֑ים" SILId="010010010031" SILMorphology="ncmpa" SILTransliteration="ʾĕlōhiym"/>
</word>
<word maculaText="אֵ֥ת" maculaId="01001001004" SILText="אֵ֥ת" SILId="01001001004" SILGlosses="(et)" SILTransliteration="ʾēṯ">
<morph maculaText="אֵ֥ת" maculaId="010010010041" SILText="אֵ֥ת" SILId="010010010041" SILMorphology="Po" SILTransliteration="ʾēṯ"/>
</word>
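As a minimal sketch of how the mapping can be consumed, the Python snippet below loads a trimmed copy of the excerpt above and builds two lookup tables keyed by `maculaId`. The helper name `build_lookups` is hypothetical. The sample keeps only the attributes used here, and the root element is closed so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the excerpt above (attributes reduced, root element closed).
SAMPLE = """
<maculaSilMapping>
  <word maculaId="01001001001" SILId="01001001001" SILGlosses="in.beginning">
    <morph maculaId="010010010011" SILId="010010010011" SILMorphology="Pp"/>
    <morph maculaId="010010010012" SILId="010010010012" SILMorphology="ncfsa"/>
  </word>
  <word maculaId="01001001002" SILId="01001001002" SILGlosses="he.created">
    <morph maculaId="010010010021" SILId="010010010021" SILMorphology="vqp3ms"/>
  </word>
</maculaSilMapping>
"""


def build_lookups(root):
    """Return (glosses by word maculaId, morphology by morph maculaId)."""
    # Glosses live on <word> elements, morphology codes on <morph> elements.
    glosses = {w.get("maculaId"): w.get("SILGlosses") for w in root.iter("word")}
    morphology = {m.get("maculaId"): m.get("SILMorphology") for m in root.iter("morph")}
    return glosses, morphology


root = ET.fromstring(SAMPLE)
glosses, morphology = build_lookups(root)
print(glosses["01001001002"])      # he.created
print(morphology["010010010012"])  # ncfsa
```

This direct keying only works for entries that map one-to-one.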
However, these mappings are not perfectly one-to-one. Generally, there are three cases:
1. More than one `SILId` for one `maculaId` (morph level, 2 cases). Use
for $node in //morph[@SILId => contains(";")]
return $node/..
to find all words which have morphemes containing more than one `SILId` (this does not occur at the word level):
<word maculaText="בָּארוּמָ֑ה" maculaId="07009041003" SILText="בָּארוּמָ֑ה" SILId="07009041003" SILGlosses="at.(the).Arumah" SILTransliteration="bāʾrûmâ">
<morph maculaText="בָּ" maculaId="070090410031" SILText="בָּ" SILId="070090410031" SILMorphology="Pp" SILTransliteration="bā"/>
<morph maculaText="ארוּמָ֑ה" maculaId="070090410032" SILText="|ארוּמָ֑ה" SILId="070090410032;070090410033" SILMorphology="Pa|np" SILTransliteration="–|ʾrûmāh"/>
</word>
<word maculaText="הָרֹאֶ֖ה" maculaId="13002052007" SILText="הָרֹאֶ֖ה" SILId="13002052006" SILGlosses="Haroeh" SILTransliteration="hārōʾeh">
<morph maculaText="הָרֹאֶ֖ה" maculaId="130020520071" SILText="הָ|רֹאֶ֖ה" SILId="130020520061;130020520062" SILMorphology="Pa|ncmsa" SILTransliteration="hā|rōʾeh"/>
</word>
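For consumers that need the individual SIL morphs back, a rough Python sketch follows. It assumes, based only on the two examples above, that multiple `SILId` values are joined with `;` and that the parallel `SILMorphology` values are joined with `|` in the same order. The function name `split_multi_sil` is hypothetical, and the sample is trimmed to the attributes used.

```python
import xml.etree.ElementTree as ET

# Trimmed sample: one morph with two SILIds, one ordinary morph.
SAMPLE = """
<maculaSilMapping>
  <word maculaId="13002052007" SILId="13002052006" SILGlosses="Haroeh">
    <morph maculaId="130020520071" SILId="130020520061;130020520062" SILMorphology="Pa|ncmsa"/>
  </word>
  <word maculaId="01001001002" SILId="01001001002" SILGlosses="he.created">
    <morph maculaId="010010010021" SILId="010010010021" SILMorphology="vqp3ms"/>
  </word>
</maculaSilMapping>
"""


def split_multi_sil(root):
    """Map each morph maculaId that has several SILIds to (SILId, morphology) pairs."""
    result = {}
    for morph in root.iter("morph"):
        sil_ids = morph.get("SILId").split(";")
        if len(sil_ids) < 2:
            continue  # ordinary one-to-one morph, nothing to split
        # Assumption: "|"-separated parts are in the same order as the ";"-separated ids.
        parts = morph.get("SILMorphology").split("|")
        result[morph.get("maculaId")] = list(zip(sil_ids, parts))
    return result


multi = split_multi_sil(ET.fromstring(SAMPLE))
print(multi["130020520071"])
# [('130020520061', 'Pa'), ('130020520062', 'ncmsa')]
```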
2. More than one `maculaId` for one `SILId` (word level, 1003 cases). Use
for $node in //word[@maculaId => contains(";")]
return $node
to find all words that have more than one `maculaId` for one `SILId`:
<word maculaText="עַל|כֵּן֙" maculaId="01002024001;01002024002" SILText="עַל־כֵּן֙" SILId="01002024001" SILGlosses="therefore" SILTransliteration="ʿal-kēn">
<morph maculaText="עַל|כֵּן֙" maculaId="010020240011;010020240021" SILText="עַל־כֵּן֙" SILId="010020240011" SILMorphology="Pd" SILTransliteration="ʿal-kēn"/>
</word>
<word maculaText="תּ֣וּבַל|קַ֔יִן" maculaId="01004022006;01004022007" SILText="תּ֣וּבַל קַ֔יִן" SILId="01004022006" SILGlosses="Tubal-Cain" SILTransliteration="tûḇal qayin">
<morph maculaText="תּ֣וּבַל|קַ֔יִן" maculaId="010040220061;010040220071" SILText="תּ֣וּבַל קַ֔יִן" SILId="010040220061" SILMorphology="np" SILTransliteration="tûḇal qayin"/>
</word>
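One possible way to handle these word-level cases is to let every Macula word in the group point at the same SIL entry. The Python sketch below illustrates that idea. The function name `macula_to_sil_words` is hypothetical, the sample is trimmed, and sharing the full gloss across all Macula words in a group is an assumption of the sketch, not something the mapping file prescribes.

```python
import xml.etree.ElementTree as ET

# Trimmed sample: one compound (two maculaIds, one SILId) and one ordinary word.
SAMPLE = """
<maculaSilMapping>
  <word maculaId="01002024001;01002024002" SILId="01002024001" SILGlosses="therefore">
    <morph maculaId="010020240011;010020240021" SILId="010020240011" SILMorphology="Pd"/>
  </word>
  <word maculaId="01001001003" SILId="01001001003" SILGlosses="God">
    <morph maculaId="010010010031" SILId="010010010031" SILMorphology="ncmpa"/>
  </word>
</maculaSilMapping>
"""


def macula_to_sil_words(root):
    """Map every single word maculaId to the (SILId, SILGlosses) of its SIL word."""
    mapping = {}
    for word in root.iter("word"):
        # A ";" in maculaId means several Macula words share this one SIL word.
        for macula_id in word.get("maculaId").split(";"):
            mapping[macula_id] = (word.get("SILId"), word.get("SILGlosses"))
    return mapping


word_map = macula_to_sil_words(ET.fromstring(SAMPLE))
print(word_map["01002024002"])  # ('01002024001', 'therefore')
```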
3. More than one `maculaId` for one `SILId` (morph level, 48,453 cases). Use
for $node in //morph[@maculaId => contains(";")]
return $node/..
to find all words containing morphs that have more than one `maculaId` for one `SILId`:
<word maculaText="לְמִינ֔וֹ" maculaId="01001011013" SILText="לְמִינ֔וֹ" SILId="01001011013" SILGlosses="to.its.kind" SILTransliteration="ləmînô">
<morph maculaText="לְ" maculaId="010010110131" SILText="לְ" SILId="010010110131" SILMorphology="Pp" SILTransliteration="lə"/>
<morph maculaText="מִינ֔|וֹ" maculaId="010010110132;010010110133" SILText="מִינ֔וֹ" SILId="010010110132" SILMorphology="ncmscX3ms" SILTransliteration="mînô"/>
</word>
<word maculaText="זַרְעוֹ" maculaId="01001011015" SILText="זַרְעוֹ־" SILId="01001011015" SILGlosses="its.seed" SILTransliteration="zarʿô-">
<morph maculaText="זַרְע|וֹ" maculaId="010010110151;010010110152" SILText="זַרְעוֹ־" SILId="010010110151" SILMorphology="ncmscX3ms" SILTransliteration="zarʿô-"/>
</word>
These are mostly cases where a suffix is involved (SIL does not split suffixes). Use
for $node in //morph[@maculaId => contains(";")]
where not(contains($node/@SILMorphology, "X"))
return $node/..
to filter out these cases.
The remaining 1041 cases are mostly compounds. Use
for $node in //morph[@maculaId => contains(";")]
where not(contains($node/@SILMorphology, "X"))
where not(contains($node/../@maculaId, ";"))
return $node/..
to filter out these cases too. The remaining 63 cases mostly involve implied articles:
<word maculaText="כָּעֵ֣ת" maculaId="01018010005" SILText="כָּעֵ֣ת" SILId="01018010005" SILGlosses="about.[the].time" SILTransliteration="kāʿēṯ">
<morph maculaText="כָּ" maculaId="010180100051" SILText="ךָּ" SILId="010180100051" SILMorphology="Pp" SILTransliteration="kā"/>
<morph maculaText="|עֵ֣ת" maculaId="010180100051ה;010180100052" SILText="עֵ֣ת" SILId="010180100052" SILMorphology="ncbsa" SILTransliteration="ʿēṯ"/>
</word>
<word maculaText="כָּעֵ֥ת" maculaId="01018014007" SILText="כָּעֵ֥ת" SILId="01018014007" SILGlosses="about.[the].time" SILTransliteration="kāʿēṯ">
<morph maculaText="כָּ" maculaId="010180140071" SILText="ךָּ" SILId="010180140071" SILMorphology="Pp" SILTransliteration="kā"/>
<morph maculaText="|עֵ֥ת" maculaId="010180140071ה;010180140072" SILText="עֵ֥ת" SILId="010180140072" SILMorphology="ncbsa" SILTransliteration="ʿēṯ"/>
</word>
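The filter cascade above can also be expressed as a single classification pass. The Python sketch below applies the same three tests in the same order as the XQuery snippets: `;` in the morph `maculaId`, `X` in `SILMorphology`, and `;` in the parent word `maculaId`. The category labels and the function name `classify` are hypothetical. It counts morphs, whereas the queries return parent words, so totals on the full file need not equal the figures quoted above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Trimmed sample with one suffix case, one compound and one implied article.
SAMPLE = """
<maculaSilMapping>
  <word maculaId="01001011013" SILId="01001011013">
    <morph maculaId="010010110131" SILId="010010110131" SILMorphology="Pp"/>
    <morph maculaId="010010110132;010010110133" SILId="010010110132" SILMorphology="ncmscX3ms"/>
  </word>
  <word maculaId="01002024001;01002024002" SILId="01002024001">
    <morph maculaId="010020240011;010020240021" SILId="010020240011" SILMorphology="Pd"/>
  </word>
  <word maculaId="01018010005" SILId="01018010005">
    <morph maculaId="010180100051" SILId="010180100051" SILMorphology="Pp"/>
    <morph maculaId="010180100051ה;010180100052" SILId="010180100052" SILMorphology="ncbsa"/>
  </word>
</maculaSilMapping>
"""


def classify(word, morph):
    """Label a morph using the same tests, in the same order, as the queries above."""
    if ";" not in morph.get("maculaId"):
        return "one-to-one"
    if "X" in morph.get("SILMorphology"):
        return "suffix"    # SIL keeps the suffix attached to its host
    if ";" in word.get("maculaId"):
        return "compound"  # the parent word itself spans several maculaIds
    return "other"         # e.g. implied articles


root = ET.fromstring(SAMPLE)
# ElementTree has no parent pointers, so walk words first, then their morphs.
counts = Counter(
    classify(word, morph)
    for word in root.iter("word")
    for morph in word.findall("morph")
)
print(counts["suffix"], counts["compound"], counts["other"])  # 1 1 1
```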
To be done:
Split the `egs` attributes (contextual glosses) at the word level into smaller glosses corresponding to morphemes.
This is what the extracted SIL data looks like (full file here)