Jargonautika / Yupik_Forced_Aligner


Discovering Collapsed Verses #1

Closed Jargonautika closed 5 years ago

Jargonautika commented 5 years ago

Sometimes, because of the polysynthetic nature of St. Lawrence Island Yupik, the translations of multiple (English/Greek) New Testament verses have been collapsed into fewer verses. (We should check to make sure that the opposite is not true!)

We need to discover where these discrepancies lie, in every book and every chapter. The translation is based on the Webster Bible (WBT) translation, so we can accept its verse markings as the ground truth. Looking at the text files, we need to discover either (a) where verses are marked with range notation such as 14 - 17 or (b) where verse numbers such as 15 - 17 are missing. This will be trickiest at the end of a chapter, where no later verse number follows to expose the gap. Some examples are below.

Example 1: Matthew Ch. 04
WBT: 25 verses
Yupik: 14 and 15 collapsed

14 - Jesus-em tawaten pilleghmikun apeghiiqaa uuknaliqistem Isaiah-m uuknaliqellgha. Qavngarugllak Isaiah-m uuknaliqutkegkaqii yuget kiyaghlleghhiit nunagkeni Zebulun-enkuk Naphtali-nkuk. Taakuk nunak naayvam tunganganilnguuk, nallangani Kiiwem Jordan-em, Galilee-melngughmi Gentile-et kiyaghfigitni!

16 Isaiah-m pikaqegkangi whaten, “Tamaakut kiyaghlleghhiit seghleghqellghem mamleghqestellghani esghaasaghqaagut nighugllagmeng. Alingellghita tuqumeng llangaqa nunangat saapngalghii taghneghangutellghaneng tuqullghem. Iwernga maaten aghtaghaataqii.”

Verse 15 is not found, even though verse 14 contains the semantic material of both verses. We need to be able to discover other such places in the text.

Example 2: Matthew Ch. 02
WBT: 23 verses
Yupik: 22 and 23 collapsed (last ones)

22 - 22 Iwernga Joseph-em nagaqughluku nutaghaq umiilga Judea-m, Archelaus-nguniluku Herod-em ighnegha, alingegkaq tawavek aglaneghmeng. Qavangukun Kiyaghneghem ungipaatkaa Joseph Judea-tefqaan pisqelluku, enkaam Nazareth-mun aglaqat Galilee-melngughmun. Tamaantekat uuknaliqistet whaten pillghat, apeghiighesqelluku, “Messiah Nazareth-eghmiinguyaghqaaguq.”

Obviously, the encoding of this example is problematic, but it should be enough to show the general issue.
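
The two checks described above (gaps in the middle of a chapter and merges at the very end) can be sketched roughly as follows. This is only an illustration, not code from the repository: the function name `find_collapsed` is hypothetical, and it assumes the verse markers for one chapter have already been extracted as a list of integers and that the WBT verse count for that chapter is known.

```python
def find_collapsed(markers, expected_count):
    """Return (anchor, missing) pairs: verse numbers absent after each marker.

    markers        -- verse numbers found in the Yupik chapter, in order
    expected_count -- number of verses the WBT has for the same chapter
    """
    gaps = []
    # A sentinel one past the last WBT verse exposes end-of-chapter merges,
    # where no later marker exists to reveal the gap.
    sequence = list(markers) + [expected_count + 1]
    for current, following in zip(sequence, sequence[1:]):
        if following - current > 1:
            gaps.append((current, list(range(current + 1, following))))
    return gaps


# Matthew 4: marker 15 is missing in the middle of the chapter.
matthew_4 = list(range(1, 15)) + list(range(16, 26))
print(find_collapsed(matthew_4, 25))  # [(14, [15])]

# Matthew 2: the last WBT verse (23) never appears.
matthew_2 = list(range(1, 23))
print(find_collapsed(matthew_2, 23))  # [(22, [23])]
```

The sketch does not handle a missing first verse or range notation such as 14 - 17; those would need their own handling.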

ssethia2 commented 5 years ago

I just expanded the verseFinder script to print out the merged verses. I ran the script with increasing upper limits on the jump between consecutive verse numbers, printing only the cases that matched each limit. At an upper limit of 4, there was only one jump, from 36 to 40, and that 40 turned out to be a number mentioned in the text rather than a verse marker. So I concluded that at most 3 verses are merged together. For now, I ignored cases where verse numbers are repeated, but those might be expanded verses.
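
For readers without the script at hand, the kind of jump search described here might look like the sketch below. It is not the verseFinder script itself. The names `verse_numbers` and `jumps_of_size` are hypothetical, the sketch assumes each verse begins on its own line with its number, and "size" here means the difference between two consecutive markers.

```python
import re

# Assumption: a verse marker is a run of digits at the start of a line.
VERSE_START = re.compile(r"^\s*(\d+)\b")


def verse_numbers(lines):
    """Collect the leading number of every line that starts with one."""
    numbers = []
    for line in lines:
        match = VERSE_START.match(line)
        if match:
            numbers.append(int(match.group(1)))
    return numbers


def jumps_of_size(numbers, size):
    """Return (previous, current) pairs whose difference is exactly `size`."""
    return [
        (prev, cur)
        for prev, cur in zip(numbers, numbers[1:])
        if cur - prev == size
    ]


chapter = [
    "13 first verse text",
    "14 - merged verse text",
    "16 next verse text, mentioning the number 40 in passing",
    "17 last verse text",
]
numbers = verse_numbers(chapter)
print(numbers)                    # [13, 14, 16, 17]
print(jumps_of_size(numbers, 2))  # [(14, 16)]
```

Anchoring the pattern to the start of the line keeps a number mentioned mid-verse from being read as a marker, although a wrapped line that happens to begin with a number would still be picked up.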

ssethia2 commented 5 years ago

So I finally revisited this. My assumption above that at most 3 verses are collapsed was wrong. I kept increasing the upper limit and found some cases of 5 verses merged, but none larger. I did not find any expanded verses (verse numbers repeated) except when the repeated number was the first verse number, where the regex erroneously picks up the chapter number instead of the verse number.
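
A check for repeated verse numbers could be as small as the sketch below. The helper name `repeated_markers` is hypothetical, and it assumes the markers for one chapter are already available as a list of integers.

```python
def repeated_markers(numbers):
    """Return (index, number) for each marker equal to the one before it."""
    return [
        (index, cur)
        for index, (prev, cur) in enumerate(zip(numbers, numbers[1:]), start=1)
        if cur == prev
    ]


print(repeated_markers([1, 2, 2, 3]))  # [(2, 2)]
print(repeated_markers([1, 2, 3]))     # []
```

A hit at the very start of a chapter is worth comparing against the chapter number before treating it as an expanded verse, given the regex issue mentioned above.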

Jargonautika commented 5 years ago

Closing this issue and moving forward with others.