biblicalhumanities / greek-new-testament

Greek New Testament

? missing words #14

Closed eliranwong closed 7 years ago

eliranwong commented 7 years ago

Number of words in low-fat: 137776 (counted by searching for the regex osisId="[^\r<>"]*?")

Number of words in Nestle1904.csv: 137779

Why the difference?

jonathanrobie commented 7 years ago

I just did two queries to see how many words I have.

count(//w) => 137832

count(//w[@osisId]) => 137832

So my counts for the trees do not match yours, but they do match for Nestle1904.csv:

$ wc -l Nestle1904.csv 
  137779 Nestle1904.csv

I don't know why, but the trees seem to have 53 more words than Nestle1904.csv (137832 vs. 137779). I will leave this bug open until I find and fix it.
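For anyone who wants to reproduce this comparison without an XQuery processor, here is a minimal Python sketch of the same two counts. It assumes the trees use un-namespaced w elements carrying an osisId attribute and that the CSV has one word per line; the sample data and the osisId values are made up for illustration and are not taken from the repository.

```python
import xml.etree.ElementTree as ET


def count_words(xml_text):
    """Return (number of <w> elements, number of <w> elements with osisId).

    Roughly equivalent to count(//w) and count(//w[@osisId]).
    Assumes <w> is not in an XML namespace.
    """
    root = ET.fromstring(xml_text)
    words = list(root.iter("w"))
    with_id = [w for w in words if w.get("osisId") is not None]
    return len(words), len(with_id)


def count_lines(csv_text):
    """Count lines, like `wc -l` on a file that ends with a newline."""
    return len(csv_text.splitlines())


# Hypothetical sample data, only to show the shape of the comparison.
SAMPLE_XML = """
<book>
  <sentence>
    <w osisId="Matt.1.1!1">Βίβλος</w>
    <w osisId="Matt.1.1!2">γενέσεως</w>
    <w osisId="Matt.1.1!3">Ἰησοῦ</w>
  </sentence>
</book>
"""
SAMPLE_CSV = "Βίβλος\nγενέσεως\nἸησοῦ\n"

if __name__ == "__main__":
    total, with_id = count_words(SAMPLE_XML)
    print("w elements:", total)
    print("w elements with osisId:", with_id)
    print("CSV lines:", count_lines(SAMPLE_CSV))
```

If the first two numbers agree with each other but not with the CSV line count, the extra words are in the trees rather than being words that lack an osisId.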

jonathanrobie commented 7 years ago

Found several duplicate syntax trees. I think this goes back to an experiment GBI was doing with alternate interpretations. I want to add support for alternate interpretations, but want to do it a different way.

After removing the duplicate trees, the word counts match.
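As a rough illustration of how duplicates like these could be spotted, the sketch below lists every osisId value that occurs on more than one w element. It is not the query that was actually used here; it assumes each word should have a unique osisId, and the sample XML and its osisId values are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def duplicated_osis_ids(xml_text):
    """Map each osisId that appears more than once to its occurrence count."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        w.get("osisId") for w in root.iter("w") if w.get("osisId") is not None
    )
    return {osis_id: n for osis_id, n in counts.items() if n > 1}


def extra_word_count(duplicates):
    """Number of surplus words implied by the duplicates (n - 1 per osisId)."""
    return sum(n - 1 for n in duplicates.values())


# Hypothetical sample: the second sentence repeats the first one.
SAMPLE_XML = """
<book>
  <sentence>
    <w osisId="Matt.1.1!1">Βίβλος</w>
    <w osisId="Matt.1.1!2">γενέσεως</w>
  </sentence>
  <sentence>
    <w osisId="Matt.1.1!1">Βίβλος</w>
    <w osisId="Matt.1.1!2">γενέσεως</w>
  </sentence>
  <sentence>
    <w osisId="Matt.1.1!3">Ἰησοῦ</w>
  </sentence>
</book>
"""

if __name__ == "__main__":
    dups = duplicated_osis_ids(SAMPLE_XML)
    for osis_id, n in sorted(dups.items()):
        print(osis_id, "appears", n, "times")
    print("surplus words:", extra_word_count(dups))
```

The surplus count from such a check should equal the gap between the tree word count and the CSV line count if duplicate trees are the only cause.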