glottobank / tukano

Repository for computer-guided reconstruction with Jena wordlist standard for Tukano language data
GNU General Public License v2.0

Problems regarding segmentation and tokenization of sounds/segments #16

Closed thiagochacon closed 9 years ago

thiagochacon commented 9 years ago

Entry ID 977 in *PT is missing its initial segment (it is there in the IPA column, so it looks like it got deleted).

Entry IDs 107, 108, 109, 111, 113, 115, 1120 in *PT have a split segment "t j?", which should have been "tj?" as we previously aligned it.

Entry ID 1127 in *PT has a split segment "k ?", which should have been "k?" as we previously aligned it.

Same as in 76, 1269, 1316, 1338 *PT where there is "t j" and it should be "tj"

Same as in 227, 291 *PT where there is "p ?" and it should be "p?"

Same as in 306, 1077 *PT where there is "t ?" and it should be "t?"

Same as in 1077 *PT where there is "t ?" and it should be "t?"

Same as in 660, 1019 *PT where there is "k k" and it should be "kk"

etc.
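For anyone wanting to repair such entries in bulk, here is a minimal sketch of rejoining split segments. It assumes the tokens are stored as space-separated strings; the function name `merge_split_segments` and the sample inputs are hypothetical and not part of this repository or of any library.

```python
def merge_split_segments(tokens, complex_segments):
    """Rejoin adjacent tokens that together form a known complex segment.

    tokens: space-separated segment string (assumed format), e.g. "t j? a"
    complex_segments: the split forms to rejoin, e.g. ["t j?", "k k"]
    """
    parts = tokens.split()
    # Try longer complex segments first so that a three-part segment
    # is not pre-empted by a two-part one that shares its beginning.
    targets = sorted(
        (seg.split() for seg in complex_segments if seg.split()),
        key=len,
        reverse=True,
    )
    merged = []
    i = 0
    while i < len(parts):
        for target in targets:
            if parts[i:i + len(target)] == target:
                merged.append("".join(target))
                i += len(target)
                break
        else:
            # No complex segment starts here: keep the token unchanged.
            merged.append(parts[i])
            i += 1
    return " ".join(merged)


# Hypothetical example strings, not actual entries from the wordlist.
print(merge_split_segments("t j? a", ["t j?", "t j", "k k"]))
print(merge_split_segments("a k k o", ["t j?", "t j", "k k"]))
```

The list of split forms has to be supplied by hand, since only the previously agreed alignment tells us which sequences count as one segment.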

thiagochacon commented 9 years ago

The "tokens" visualization is somewhat different from what we see in the COGID alignment. That is why I thought things were out of alignment, or that a complex segment "tj" was being treated as two segments "t j".

Looks like everything is fine in the COGID alignments.
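A quick way to test whether the two views really disagree is to compare the segments of an alignment, with gaps removed, against the tokens. This is only a sketch under assumptions: both values are taken to be space-separated strings, gaps are taken to be written as "-", and the function name `alignment_matches_tokens` is hypothetical.

```python
def alignment_matches_tokens(tokens, alignment, gap="-"):
    """Return True if the alignment, minus gap symbols, equals the tokens.

    Both arguments are assumed to be space-separated segment strings.
    """
    aligned_segments = [seg for seg in alignment.split() if seg != gap]
    return aligned_segments == tokens.split()


# Hypothetical examples, not actual entries from the wordlist.
print(alignment_matches_tokens("tj a", "tj - a"))   # same segmentation
print(alignment_matches_tokens("t j a", "tj - a"))  # "tj" split in tokens
```

If the check fails only for entries with complex segments, the difference is in how the tokens are stored or displayed rather than in the alignments themselves.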