kchall closed this issue 5 years ago
The first error seems to be linked to how the wordlist is set up: if a wordlist is used with pairs of words (mata, nata), then the error is thrown. If a wordlist is set up with just one word per line, the algorithm runs, but then the output is all zeros (see error 2 above).
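For anyone hitting the same thing, here is a rough sketch of a pre-check you could run on a wordlist before loading it. It is not part of PCT; `find_bad_wordlist_lines` is a made-up helper name. It assumes the only problem separator is a tab, since that is the character in the KeyError in this thread. Pass other separators explicitly if your pairs are split some other way.

```python
def find_bad_wordlist_lines(lines, separators=("\t",)):
    """Return (line_number, text) for lines that seem to hold more than one word.

    Hypothetical helper, not a PCT function. `separators` defaults to a tab
    only, because that is the character reported in the KeyError.
    """
    bad = []
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")  # drop the line ending, keep everything else
        if not text.strip():
            continue  # ignore blank lines
        if any(sep in text for sep in separators):
            bad.append((number, text))
    return bad


# In-memory stand-in for a wordlist file; with a real file you would pass
# open(path, encoding="utf-8") instead.
sample = ["mata\tnata\n", "mata\n", "\n", "nata\n"]
flagged = find_bad_wordlist_lines(sample)
for number, text in flagged:
    print(f"line {number} has a separator: {text!r}")
```

If nothing is flagged, the file has one entry per line as far as this check can tell, which is the layout that avoided the crash above.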
Also, my interface is different from Roger's... mine is screenshotted below:
Note: the above errors are all based on the Phonological Neighbourhood Density algorithm instead of the string similarity algorithm. The string similarity does work for
Never mind.
Hello @kchall ! (It's Elise :) ) I'm on a call with Danica, and she's been having what looks like the exact same issue. She's using the Buckeye corpus and has a list of non-words (UTF-8, each word on a new line, written as Arpabet surface transcription), and it calculates each as having a neighbourhood density of 0. When she tries running just one with the "Calculate for a word/nonword not in the corpus" option, it crashes.
What ended up fixing this problem for you? Thanks!
@Kedersha Hi Elise! I don't remember off the top of my head, but my guess is that it's something that we have fixed in the master branch but not on the current release yet, because we're still working on that. That said, we do have a beta version here: https://github.com/PhonologicalCorpusTools/CorpusTools/releases/tag/v1.5.0b -- so you could try downloading that and seeing if it works?
Thanks for the quick response, I'll pass that along! Hope you're doing well. :)
If a word list is used:
Traceback (most recent call last):
  File "/Users/KCH/Desktop/CorpusTools/corpustools/gui/ndgui.py", line 74, in run
    call_back = kwargs['call_back'])
  File "/Users/KCH/Desktop/CorpusTools/corpustools/neighdens/neighborhood_density.py", line 172, in neighborhood_density
    if not is_neighbor(w, query):
  File "/Users/KCH/Desktop/CorpusTools/corpustools/neighdens/neighborhood_density.py", line 21, in _is_phono_edit_distance_neighbor
    return phono_edit_distance(w, query, sequence_type, specifier) <= max_distance
  File "/Users/KCH/Desktop/CorpusTools/corpustools/symbolsim/phono_edit_distance.py", line 38, in phono_edit_distance
    m = a.make_similarity_matrix(w1, w2)
  File "/Users/KCH/Desktop/CorpusTools/corpustools/symbolsim/phono_align.py", line 73, in make_similarity_matrix
    d[0][y]['f'] = d[0][y-1]['f'] + self.compare_segments('empty', seq2[y-1], self.underspec_cost)
  File "/Users/KCH/Desktop/CorpusTools/corpustools/symbolsim/phono_align.py", line 120, in compare_segments
    fs2 = self.features[segment2symbol]
  File "/Users/KCH/Desktop/CorpusTools/corpustools/corpus/classes/lexicon.py", line 985, in __getitem__
    return self.matrix[item]
KeyError: '\t'
If mata is used as the base word in the example corpus, phonological edit distance is returned as 0, but it should be non-zero...
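As a sanity check on why 0 looks wrong here, below is a small sketch using plain Levenshtein distance. Note that this is a stand-in: PCT's phonological edit distance weights substitutions by features, and this unweighted version does not. The function names and the three-word toy list are made up for illustration and are not the actual example corpus. Even so, mata and nata differ in a single segment, so any reasonable distance measure should count them as neighbours.

```python
def edit_distance(a, b):
    """Plain (unweighted) Levenshtein distance between two sequences."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, seg_a in enumerate(a, start=1):
        current = [i]
        for j, seg_b in enumerate(b, start=1):
            cost = 0 if seg_a == seg_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def neighbours(query, lexicon, max_distance=1):
    """Words in `lexicon` within `max_distance` of `query`, excluding `query` itself."""
    return [w for w in lexicon
            if w != query and edit_distance(w, query) <= max_distance]


# Toy word list, invented for this sketch (one character per segment).
toy_lexicon = ["mata", "nata", "tusa"]
print(edit_distance("mata", "nata"))    # 1
print(neighbours("mata", toy_lexicon))  # ['nata']
```

So with mata as the base word and nata in the lexicon, a neighbourhood density of at least 1 would be expected under this simplified measure.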