PhyloStar / CogDetect

A lightweight library for cognate clustering, converting IPA sequences to sound classes, computing distances between languages

GOP None #28

Open PhyloStar opened 7 years ago

PhyloStar commented 7 years ago

There seems to be a problem with GOP None: the code does not work when GOP is set to None, but it works and gives reasonable results when GOP is set to -2.5.

PhyloStar commented 7 years ago

The main issue is that the indel penalty in Needleman-Wunsch always has to be negative, whereas the PMI scores for character pairs that involve an indel can become positive, because the Needleman-Wunsch code optimizes similarity. Alternatively, since @Anaphory wants to use a character-dependent indel penalty, it would be better to use Levenshtein distance as the alignment function, because Levenshtein distance minimizes a distance.
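To make the Levenshtein alternative concrete, here is a minimal sketch of an edit distance with a character-dependent indel cost. This is not CogDetect code: `weighted_levenshtein`, `indel_cost`, `sub_cost`, and `vowel_cheap` are hypothetical names, and the cost values are made up for illustration. Since all costs are non-negative and the recurrence minimizes, the sign problem described above should not arise.

```python
def weighted_levenshtein(a, b, indel_cost, sub_cost=1.0):
    """Edit distance where inserting or deleting character c costs indel_cost(c).

    Hypothetical helper for illustration; all costs are assumed non-negative.
    """
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0.0] * cols for _ in range(rows)]

    # First column: delete every character of `a`.
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + indel_cost(a[i - 1])
    # First row: insert every character of `b`.
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + indel_cost(b[j - 1])

    for i in range(1, rows):
        for j in range(1, cols):
            match = 0.0 if a[i - 1] == b[j - 1] else sub_cost
            dist[i][j] = min(
                dist[i - 1][j - 1] + match,                # match or substitution
                dist[i - 1][j] + indel_cost(a[i - 1]),     # deletion from `a`
                dist[i][j - 1] + indel_cost(b[j - 1]),     # insertion from `b`
            )
    return dist[-1][-1]


def vowel_cheap(c):
    """Made-up cost function: vowels are cheaper to insert or delete."""
    return 0.5 if c in "aeiou" else 1.0
```

With these made-up costs, `weighted_levenshtein("hand", "hande", vowel_cheap)` charges only the cheaper vowel insertion, while `weighted_levenshtein("hant", "hand", vowel_cheap)` charges one substitution.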

PhyloStar commented 7 years ago

The calc_pmi function needs to omit aligned pairs that contain an indel, since they do not take part in the PMI calculation. The reason is that the GOP parameter is independent of the characters. I added back a line from the original code that removes indels from the PMI matrix keys.
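As a rough illustration of the idea (not the actual calc_pmi implementation), the sketch below estimates PMI from aligned pairs and skips any pair that contains a gap, so no key of the resulting dictionary involves an indel. The function name `pmi_from_alignments`, the `gap` symbol `"-"`, and the input format (a list of alignments, each a list of character pairs) are all assumptions.

```python
import math
from collections import Counter


def pmi_from_alignments(alignments, gap="-"):
    """Estimate PMI for aligned character pairs, ignoring pairs with an indel.

    Hypothetical sketch: `alignments` is assumed to be a list of alignments,
    each a list of (x, y) character pairs, with `gap` marking an indel.
    """
    pair_counts = Counter()
    sound_counts = Counter()

    for alignment in alignments:
        for x, y in alignment:
            if x == gap or y == gap:
                continue  # indels are scored by GOP, not by PMI
            pair_counts[(x, y)] += 1
            sound_counts[x] += 1
            sound_counts[y] += 1

    total_pairs = sum(pair_counts.values())
    total_sounds = sum(sound_counts.values())

    pmi = {}
    for (x, y), n in pair_counts.items():
        p_xy = n / total_pairs
        p_x = sound_counts[x] / total_sounds
        p_y = sound_counts[y] / total_sounds
        pmi[(x, y)] = math.log(p_xy / (p_x * p_y))
    return pmi
```

For an alignment such as `[("h", "h"), ("a", "a"), ("n", "n"), ("d", "t"), ("-", "e")]`, the last pair is dropped before counting, so the gap symbol never appears in the PMI keys.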