LinguList / dogon-data

Dogon languages, complete re-write of the old attempts.
GNU General Public License v3.0

Sound Correspondence Pattern Extraction #26

Open IndianaTones opened 6 years ago

IndianaTones commented 6 years ago

The compare script is also excellent, @LinguList, and I do see that my judgments were too loose. I can put the output into Edictor and view the cognacy judgments side by side. I think this should really be an integral part of the workflow. (Attached: Dogon-Dataset_lexstat.txt, Dogon-Dataset_SCA.txt)

My question now (and I know you are working on it) is this: some of the automatically detected cognates seem crazy to me, and yet the final result, as viewed in SplitsTree, remains about the same. So before I go and change my own judgments, I would like to know what signal the algorithm is seeing. For instance, how does it detect a match for 'BIRD' between [nǐ:m] in Bankan_Tey and [kɔ̀nɔ́] in Bambara (AutoCogId 283)? Not only are these supposedly unrelated languages, I can't see how these two forms correlate. Is there a way of teasing apart the script to see what it is doing in the background?

LinguList commented 6 years ago

Which algo detects the BIRD thing? If it's LexStat, this would surprise me... I'd just say they aren't cognate.

IndianaTones commented 6 years ago

No, I agree, they can't be cognate, but why does the algorithm think they are? There are many equally crazy (to me) examples like this. Surely it detects things I cannot see...
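One possible mechanism, offered as an assumption rather than a description of what LingPy actually does: if cognate sets are formed by clustering pairwise distances against a threshold, two forms can land in the same set without being similar to each other, because each is close enough to some intermediate form. The sketch below uses single linkage, a hypothetical helper `single_linkage_clusters`, and made-up distances for three placeholder words A, B and C (not real Dogon or Bambara data). Single linkage is chosen only because it makes this chaining effect easy to see; methods such as average linkage (UPGMA) weaken the effect but do not rule it out.

```python
from itertools import combinations


def single_linkage_clusters(words, dist, threshold):
    """Hypothetical helper: flat single-linkage clustering.

    Two words are linked if their distance is below `threshold`; clusters are
    the connected components of those links, so linking is transitive.
    """
    parent = {w: w for w in words}

    def find(w):
        # Follow parent pointers to the cluster representative (with path halving).
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(words, 2):
        # Pairs missing from `dist` are treated as maximally distant (1.0).
        d = dist.get((a, b), dist.get((b, a), 1.0))
        if d < threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for w in words:
        clusters.setdefault(find(w), []).append(w)
    return sorted(sorted(c) for c in clusters.values())


# Made-up distances for three placeholder words (not real data).
words = ["A", "B", "C"]
dist = {("A", "B"): 0.40, ("B", "C"): 0.45, ("A", "C"): 0.90}

print(single_linkage_clusters(words, dist, 0.5))   # [['A', 'B', 'C']]
print(single_linkage_clusters(words, dist, 0.42))  # [['A', 'B'], ['C']]
```

At threshold 0.5, A and C share a cluster even though their direct distance is 0.9, because B links them; lowering the threshold to 0.42 breaks the chain. Looking at the pairwise distances inside a suspicious cognate set is one way to find the intermediate form that links it.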

LinguList commented 6 years ago

It's important to know which method produces this strange output. Compared with your analysis, 99% of LexStat's positive cognate judgments are correct, which makes it highly unlikely that it fails on cases like the one you mention. But also: which code and which threshold did you use?
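A figure like the 99% above can be read as pairwise precision: of all word pairs that the automatic method puts into the same cognate set, the share that the expert analysis also groups together. Assuming that reading (the exact evaluation behind the figure is not stated in the thread), the sketch below computes pairwise precision and recall for a single concept with the hypothetical functions `cognate_pairs` and `pairwise_scores`; the cognate IDs are made up. High precision combined with lower recall is what a strict method looks like.

```python
from itertools import combinations


def cognate_pairs(assignments):
    """Return the unordered word pairs that share a cognate ID.

    `assignments` maps a word (or row ID) to its cognate ID for ONE concept.
    """
    return {
        (a, b)
        for a, b in combinations(sorted(assignments), 2)
        if assignments[a] == assignments[b]
    }


def pairwise_scores(test, gold):
    """Pairwise precision and recall of `test` cognate sets against `gold`."""
    test_pairs = cognate_pairs(test)
    gold_pairs = cognate_pairs(gold)
    true_pos = len(test_pairs & gold_pairs)
    # Convention (an assumption): with no pairs to judge, a score counts as perfect.
    precision = true_pos / len(test_pairs) if test_pairs else 1.0
    recall = true_pos / len(gold_pairs) if gold_pairs else 1.0
    return precision, recall


# Made-up IDs for four words of one concept (not real data).
expert = {"w1": 1, "w2": 1, "w3": 1, "w4": 2}
automatic = {"w1": 10, "w2": 10, "w3": 11, "w4": 11}

precision, recall = pairwise_scores(automatic, expert)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=0.33
```

Only co-membership is compared, so the ID values themselves (1 vs 10) do not need to match. For a whole wordlist the scores would be computed per concept and then combined; LingPy's `lingpy.evaluate.acd` module provides its own evaluation measures, and its documentation is a better guide than this sketch.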

IndianaTones commented 6 years ago

Yes, I re-did everything in LexStat. It's very strict, but there are no weird issues.