Clear-Bible / macula-hebrew

Syntax trees, morphology, and linguistic annotations for the Hebrew Bible

There are missing `m/@xml:id`s in our current lowfat trees #65

Open ryderwishart opened 2 years ago

ryderwishart commented 2 years ago

I created a Jupyter notebook (here: 4d1ff81b8aa43099932010c96ae730375a709e8c) to test the integrity of our word-level text content by comparing the nodes trees with the lowfat trees. I did this because the two sets of trees contain different numbers of `@xml:id`s.
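For readers without the notebook, the comparison can be sketched roughly as follows. This is not the notebook's code: the function names `xml_ids` and `missing_ids` are hypothetical, and the sketch assumes that every word-level identifier appears as an `xml:id` attribute in both formats, so it collects the attribute from any element rather than from a specific tag.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved "xml:" prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def xml_ids(xml_text):
    """Return the set of xml:id values found on any element in the tree."""
    root = ET.fromstring(xml_text)
    return {el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None}


def missing_ids(nodes_xml, lowfat_xml):
    """Return ids present in the nodes tree but absent from the lowfat tree."""
    return xml_ids(nodes_xml) - xml_ids(lowfat_xml)
```

Calling `missing_ids` on the nodes and lowfat versions of the same verse would return the ids that were dropped, which makes it easy to check whether they all belong to particles.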

The current issue seems to pertain to particles only, for example:

<Node Cat="P" Rule="ptcl2P" Head="0" nodeId="0103802100610011">
   <Node n="o010380210061"
         Cat="ptcl"
         morphId="010380210061"
         Unicode="אַיֵּ֧ה"
         nodeId="0103802100610010"
         StrongNumberX="0346"
         Greek="ποῦ"
         GreekStrong="4226">
      <m word="GEN 38:21!6"
         xml:id="o010380210061"
         lang="H"
         after=" "
         lemma="346"
         morph="Ti"
         pos="particle"
         type="interrogative"
         english="where"
         mandarin="哪里"
         Domain="003002004"
         SDBH="000321001001000">אַיֵּ֧ה</m>
   </Node>
</Node>

jonathanrobie commented 2 years ago

These particles are being absorbed into wg elements. You can see them with this query:

//wg[@unicode]

<wg xmlns:xi="http://www.w3.org/2001/XInclude" role="p" class="ptcl" unicode="אַיּ" strongnumberx="0335" greek="ποῦ"/>
<wg xmlns:xi="http://www.w3.org/2001/XInclude" role="p" class="ptcl" unicode="אֵ֖י" strongnumberx="0335" greek="ποῦ"/>
<wg xmlns:xi="http://www.w3.org/2001/XInclude" role="p" class="ptcl" unicode="אַיֵּ֖ה" strongnumberx="0346" greek="ποῦ"/>
<wg xmlns:xi="http://www.w3.org/2001/XInclude" role="p" class="ptcl" unicode="אַיֵּ֧ה" strongnumberx="0346" greek="ποῦ"/>
<wg xmlns:xi="http://www.w3.org/2001/XInclude" role="p" class="ptcl" unicode="אַיֵּ֥ה" strongnumberx="0346" greek="ἐστιν"/>
<wg xmlns:xi="http://www.w3.org/2001/XInclude" role="adv" class="ptcl" unicode="מָתַ֛י" strongnumberx="4970" greek="πότε"/>
<wg xmlns:xi="http://www.w3.org/2001/XInclude" role="adv" class="ptcl" unicode="אֵיפֹ֖ה" strongnumberx="0375" greek="ποῦ"/>
<wg xmlns:xi="http://www.w3.org/2001/XInclude" role="p" class="ptcl" unicode="אַיֵּ֧ה" strongnumberx="0346" greek="ποῦ"/>
<wg xmlns:xi="http://www.w3.org/2001/XInclude" role="p" class="ptcl" unicode="אַיּ" strongnumberx="0335" greek="ποῦ"/>
<wg xmlns:xi="http://www.w3.org/2001/XInclude" role="p" class="ptcl" unicode="אֵ֣י" strongnumberx="0335" greek="ποῦ"/>
<wg xmlns:xi="http://www.w3.org/2001/XInclude" role="p" class="ptcl" unicode="אַיֵּ֣ה" strongnumberx="0346" greek="ποῦ"/>
<wg xmlns:xi="http://www.w3.org/2001/XInclude" role="p" class="ptcl" unicode="אֵיפֹה֙" strongnumberx="0375" greek="ποῦ"/>
<wg xmlns:xi="http://www.w3.org/2001/XInclude" role="p" class="ptcl" unicode="אַיֵּ֨ה" strongnumberx="0346" greek="ποῦ"/>
<wg xmlns:xi="http://www.w3.org/2001/XInclude" role="adv" class="ptcl" unicode="אֵיפֹ֨ה" strongnumberx="0375" greek="ποῦ"/>
<wg xmlns:xi="http://www.w3.org/2001/XInclude" role="p" class="ptcl" unicode="אֵי־" strongnumberx="0335" greek="ποῖος"/>
<wg xmlns:xi="http://www.w3.org/2001/XInclude" role="p" class="ptcl" unicode="אֵיפֹ֥ה" strongnumberx="0375" greek="ποῦ"/>
<wg xmlns:xi="http://www.w3.org/2001/XInclude" role="p" class="ptcl" unicode="אֵֽי־" strongnumberx="0335"/>

etc.
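The same check can be approximated in Python for anyone who wants to count the affected elements from a script. This is a minimal sketch, not part of the project's tooling: the function name `absorbed_particles` is hypothetical, and it assumes the `wg` elements carry no namespace of their own, as in the output above.

```python
import xml.etree.ElementTree as ET


def absorbed_particles(lowfat_xml):
    """List (unicode, strongnumberx) for every wg element that has a unicode attribute.

    Equivalent in intent to the XPath //wg[@unicode]; iter() also visits the root.
    """
    root = ET.fromstring(lowfat_xml)
    return [
        (wg.get("unicode"), wg.get("strongnumberx"))
        for wg in root.iter("wg")
        if "unicode" in wg.attrib
    ]
```

A `wg` that still wraps its word element has no `unicode` attribute and is skipped, so the length of the returned list is the number of particles that lost their word-level element.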