biblicalhumanities / greek-new-testament

Greek New Testament

Lowfat counts don't match SBLGNT counts #10

Closed: jonathanrobie closed this issue 7 years ago

jonathanrobie commented 8 years ago

The following query returns different results for the SBLGNT base text and the Lowfat trees:

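(: Count every word form that occurs more than 1000 times. :)
for $w in //w
let $text := string($w)
group by $text
(: After grouping, $w holds every w element with this text. :)
let $count := count($w)
where $count > 1000
order by $count descending
return <word count="{$count}">{ $text }</word>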

This may indicate that the underlying texts are not identical. Possible causes: the SBLGNT text may have changed since the original morphology was done, an error may have been introduced somewhere along the way, or there may be a Unicode normalization difference. It should be investigated.
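One way to test the Unicode normalization hypothesis is to rerun the same query with each word normalized to NFC before grouping, using the standard `fn:normalize-unicode` function. For example, ά encoded as U+1F71 (alpha with oxia) and as U+03AC (alpha with tonos) are canonically equivalent, and NFC maps both to U+03AC. This is only a sketch, assuming both files mark words with `w` elements exactly as in the query above. If the two result sets still differ after normalization, the cause is more likely in the texts or trees themselves than in their encoding.

```xquery
(: Same frequency query, but grouped on the NFC-normalized form of each
   word, so canonically equivalent spellings land in one group. :)
for $w in //w
let $text := normalize-unicode(string($w), "NFC")
group by $text
(: $w is now the sequence of all w elements sharing this normalized text :)
let $count := count($w)
where $count > 1000
order by $count descending
return <word count="{$count}">{ $text }</word>
```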

Here are the results for the SBLGNT base text:

<?xml version="1.0" encoding="UTF-8"?>
<word count="8563">καὶ</word>
<word count="2798">ὁ</word>
<word count="2680">ἐν</word>
<word count="2597">δὲ</word>
<word count="2498">τοῦ</word>
<word count="1744">εἰς</word>
<word count="1655">τὸ</word>
<word count="1560">τὸν</word>
<word count="1508">τὴν</word>
<word count="1413">αὐτοῦ</word>
<word count="1298">τῆς</word>
<word count="1283">ὅτι</word>
<word count="1225">τῷ</word>
<word count="1203">τῶν</word>
<word count="1076">οἱ</word>

Here are the results for the SBLGNT Lowfat trees:

<?xml version="1.0" encoding="UTF-8"?>
<word count="8577">καὶ</word>
<word count="2803">ὁ</word>
<word count="2684">ἐν</word>
<word count="2609">δὲ</word>
<word count="2500">τοῦ</word>
<word count="1749">εἰς</word>
<word count="1657">τὸ</word>
<word count="1562">τὸν</word>
<word count="1514">τὴν</word>
<word count="1417">αὐτοῦ</word>
<word count="1299">τῆς</word>
<word count="1284">ὅτι</word>
<word count="1227">τῷ</word>
<word count="1208">τῶν</word>
<word count="1080">οἱ</word>

jonathanrobie commented 7 years ago

Fixed by eliminating alternate interpretations in the main branch.