PhonologicalCorpusTools / CorpusTools

Phonological CorpusTools
http://phonologicalcorpustools.github.io/CorpusTools/
GNU General Public License v3.0

Phonological neighbourhood density #699

Closed: kchall closed this issue 5 years ago

kchall commented 5 years ago
  1. If a word list is used:

     Traceback (most recent call last):
       File "/Users/KCH/Desktop/CorpusTools/corpustools/gui/ndgui.py", line 74, in run
         call_back = kwargs['call_back'])
       File "/Users/KCH/Desktop/CorpusTools/corpustools/neighdens/neighborhood_density.py", line 172, in neighborhood_density
         if not is_neighbor(w, query):
       File "/Users/KCH/Desktop/CorpusTools/corpustools/neighdens/neighborhood_density.py", line 21, in _is_phono_edit_distance_neighbor
         return phono_edit_distance(w, query, sequence_type, specifier) <= max_distance
       File "/Users/KCH/Desktop/CorpusTools/corpustools/symbolsim/phono_edit_distance.py", line 38, in phono_edit_distance
         m = a.make_similarity_matrix(w1, w2)
       File "/Users/KCH/Desktop/CorpusTools/corpustools/symbolsim/phono_align.py", line 73, in make_similarity_matrix
         d[0][y]['f'] = d[0][y-1]['f'] + self.compare_segments('empty', seq2[y-1], self.underspec_cost)
       File "/Users/KCH/Desktop/CorpusTools/corpustools/symbolsim/phono_align.py", line 120, in compare_segments
         fs2 = self.features[segment2symbol]
       File "/Users/KCH/Desktop/CorpusTools/corpustools/corpus/classes/lexicon.py", line 985, in __getitem__
         return self.matrix[item]
     KeyError: '\t'

  2. If mata is used as the base word in the example corpus, phonological edit distance is returned as 0, but it should be non-zero...
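For reference, here is a rough sketch of a neighbourhood density count using plain (unweighted) edit distance. This is not PCT's feature-based phonological edit distance, and `edit_distance`, `neighbours`, and `toy_lexicon` are made-up names and data for illustration only. It just shows why a non-zero distance is expected between two distinct words such as mata and nata.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two segment sequences.

    Assumption: every character is one segment and every edit costs 1.
    PCT's phonological edit distance weights edits by features instead.
    """
    prev = list(range(len(b) + 1))
    for i, seg_a in enumerate(a, start=1):
        curr = [i]
        for j, seg_b in enumerate(b, start=1):
            cost = 0 if seg_a == seg_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def neighbours(query, lexicon, max_distance=1):
    """Return the words in the lexicon within max_distance of the query."""
    return [w for w in lexicon
            if w != query and edit_distance(w, query) <= max_distance]


# Hypothetical toy lexicon, not the actual PCT example corpus.
toy_lexicon = ["mata", "nata", "tama", "ma"]
print(edit_distance("mata", "nata"))
print(neighbours("mata", toy_lexicon))
```

Only a word compared with itself should come out at distance 0 under this kind of measure, so an all-zero output for different words points at the input handling rather than the distance calculation.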

kchall commented 5 years ago

The first error seems to be linked to how the wordlist is set up -- if a wordlist is used with pairs of words (mata, nata) then the error is thrown. If a wordlist is set up with just one word per line, the algorithm works, but then output is all zero (see error 2 above).
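The `KeyError: '\t'` in the traceback is consistent with the tab between the two words of a pair being looked up as if it were a segment. Below is a minimal sketch of reading such a file defensively, assuming the pairs are tab-separated. `read_word_list` is a hypothetical helper written for this comment, not the loader PCT actually uses.

```python
import io


def read_word_list(handle):
    """Yield one tuple of words per non-empty line.

    Assumption: words on the same line are separated by tabs.
    Splitting here keeps the tab itself out of the segment sequence.
    """
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        yield tuple(part.strip() for part in line.split("\t") if part.strip())


# Hypothetical file contents: one pair, one blank line, one single word.
sample = io.StringIO("mata\tnata\n\ntama\n")
rows = list(read_word_list(sample))
print(rows)
```

With the input split this way, a one-word-per-line file and a pairs file both produce clean words, and no row contains a stray tab.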

kchall commented 5 years ago

Also, my interface is different than Roger's...mine is screenshotted below:

[image: screenshot of kchall's interface]

kchall commented 5 years ago

Note: the above errors are all based on the Phonological Neighbourhood Density algorithm, not the string similarity algorithm. String similarity does work on its own, but the pairs list crashes PCT entirely.

kchall commented 5 years ago

Never mind.

Kedersha commented 3 years ago

Hello @kchall ! (It's Elise :) ) I'm on a call with Danica, and she's been having what looks like the exact same issue. She's using the Buckeye corpus and has a list of non-words (UTF-8, each word on a new line, written as Arpabet surface transcription), and it calculates each as having a neighbourhood density of 0. When she tries running just one with the "Calculate for a word/nonword not in the corpus" option, it crashes.

What ended up fixing this problem for you? Thanks!

kchall commented 3 years ago

@Kedersha Hi Elise! I don't remember off the top of my head, but my guess is that it's something that we have fixed in the master branch but not on the current release yet, because we're still working on that. That said, we do have a beta version here: https://github.com/PhonologicalCorpusTools/CorpusTools/releases/tag/v1.5.0b -- so you could try downloading that and seeing if it works?

Kedersha commented 3 years ago

Thanks for the quick response, I'll pass that along! Hope you're doing well. :)