PhonologicalCorpusTools / CorpusTools

Phonological CorpusTools
http://phonologicalcorpustools.github.io/CorpusTools/
GNU General Public License v3.0

String Similarity -- Phonological Edit Distance #121

Closed kchall closed 10 years ago

kchall commented 10 years ago

Trying to calculate phonological edit distance (IPHOD corpus) yields the following error:

Exception in Tkinter callback
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/tkinter/__init__.py", line 1475, in __call__
    return self.func(*args)
  File "/Users/KCH/Desktop/CorpusTools/corpustools/gui/ssgui.py", line 245, in calculate_string_similarity
    results = [SS.string_similarity_single_pair('', relator_type, string_type, word1, word2, self.corpus)]
  File "/Users/KCH/Desktop/CorpusTools/corpustools/symbolsim/string_similarity.py", line 101, in string_similarity_single_pair
    score = relator.phono_edit_distance(w1, w2, string_type)
  File "/Users/KCH/Desktop/CorpusTools/corpustools/symbolsim/phono_edit_distance.py", line 64, in phono_edit_distance
    m = a.make_similarity_matrix(w1, w2)
  File "/Users/KCH/Desktop/CorpusTools/corpustools/symbolsim/phono_align_ex.py", line 57, in make_similarity_matrix
    d[x][0]['f'] = d[x-1][0]['f'] + self.compare_segments(seq1[x-1], 'empty')
  File "/Users/KCH/Desktop/CorpusTools/corpustools/symbolsim/phono_align_ex.py", line 113, in compare_segments
    fs1 = self.features[segment1symbol]
KeyError: 'A'

mdfry commented 10 years ago

What word did you search on? It sounds to me like phonological edit distance was used with a spelling representation. That error should only occur if there was a mistake in a transcription somewhere, I think.

mdfry commented 10 years ago

That error happens because 'A' is not a recognized phoneme (i.e. it has no features), and without features there is no way to calculate phonological edit distance.
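To make the failure mode concrete, here is a minimal sketch of a feature lookup that checks its input before indexing. The feature table, its feature names, and both function names are hypothetical and are not taken from the CorpusTools code; the only thing borrowed from the traceback is the idea that features are fetched by segment symbol.

```python
# Hypothetical, tiny feature table. The real table in CorpusTools is
# much larger and its feature names may differ.
FEATURES = {
    "AE": {"voc": "+", "voice": "+"},
    "B": {"voc": "-", "voice": "+"},
    "T": {"voc": "-", "voice": "-"},
}


def missing_segments(segments, features=FEATURES):
    """Return the segments that have no entry in the feature table."""
    return [seg for seg in segments if seg not in features]


def check_segments(segments, features=FEATURES):
    """Raise a descriptive error instead of a bare KeyError."""
    missing = missing_segments(segments, features)
    if missing:
        raise ValueError("No features for segment(s): " + ", ".join(missing))
```

Running a check like this before the alignment step would turn `KeyError: 'A'` into a message that names every unknown symbol at once, which might make it easier to tell a spelling string from a bad transcription.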

kchall commented 10 years ago

It was a search on "Abbot" / "carrot." I entered the words using spelling, but I did specify that transcription should be compared. I thought all three algorithms allowed entry by spelling? (They should!)

bhallen commented 10 years ago

This is happening to me too.

It could be that the algorithm is getting spelling instead of transcription as you say, but one other possibility is that this is something related to diphthongs. E.g., when it should be checking the features for 'AY', it's trying to get features for the nonexistent 'A' instead.
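The diphthong hypothesis can be illustrated with a short sketch. It assumes, purely for illustration, that a transcription could reach the algorithm either as a flat string or as a list of segments; the thread does not confirm which form the code uses, and the example word and function name are made up.

```python
# Hypothetical transcription containing the two-character segment 'AY'.
as_string = "KAYT"          # stored as one flat string
as_list = ["K", "AY", "T"]  # stored as a list of segments


def segments_seen(transcription):
    """Return what a plain loop over the transcription would hand to a
    feature lookup, one item per iteration."""
    return [seg for seg in transcription]


print(segments_seen(as_string))  # ['K', 'A', 'Y', 'T']
print(segments_seen(as_list))    # ['K', 'AY', 'T']
```

If the flat-string form were ever iterated character by character, 'AY' would be split and the lookup would ask for the nonexistent 'A', which matches the error message. A spelled word such as "Abbot" would produce the same symptom, so the error alone does not distinguish the two explanations.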

By the way, what is the expected behavior when someone tries to use phonological edit distance with 'compare spelling' selected? It seems very strange that that is possible...

kchall commented 10 years ago

Yes, it should be greyed out as an option!

mdfry commented 10 years ago

Looking into the code, the error occurred because carrot is not in IPHOD. That makes the code bypass the transcription lookup and work off the assumption that you had already entered a transcription.

This is possible because, when I programmed these algorithms, you could input transcription strings (which cannot be looked up in a corpus). So if the code tried to find the input in the corpus and could not, it assumed you had already supplied a transcription.

The unhappy consequence is that if you look up a spelled word that does not exist in the corpus, the code takes that spelling string and tries to run phonological edit distance on it.
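The fallback described above, and one possible stricter alternative, might look roughly like this. The toy corpus, the function names, and the `is_transcription` flag are all hypothetical; this is a sketch of the design issue, not the actual CorpusTools code or the fix that was eventually applied.

```python
# Hypothetical stand-in for a corpus: spelling -> list of segments.
TOY_CORPUS = {"abbot": ["AE", "B", "AH", "T"]}


def lookup_with_fallback(word, corpus=TOY_CORPUS):
    """Behaviour described in the thread: a failed lookup silently
    returns the input, on the assumption it is a transcription."""
    try:
        return corpus[word]
    except KeyError:
        return word


def lookup_strict(word, corpus=TOY_CORPUS, is_transcription=False):
    """Alternative: skip the lookup only when the caller says the
    input is already a transcription; otherwise fail loudly."""
    if is_transcription:
        return word
    if word not in corpus:
        raise LookupError("'%s' was not found in the corpus" % word)
    return corpus[word]
```

With the silent fallback, a word missing from the toy corpus comes back unchanged as a spelling string and only fails later, deep inside the alignment code. The strict variant reports the problem at the point where the user's input is interpreted.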

bhallen commented 10 years ago

Hmm, 'carrot' is most definitely in IPHOD. I just found it in the corpus browser.

mdfry commented 10 years ago

Oh...hah, I suppose I only looked through the capitalized Cs...I'll get back to you:)

mdfry commented 10 years ago

Okay, fixed it. It was a slight bug: basically what I said before, but with a slightly different cause.