PhonologicalCorpusTools / CorpusTools

Phonological CorpusTools
http://phonologicalcorpustools.github.io/CorpusTools/
GNU General Public License v3.0

[BUG] DivideByZeroError in MutualInfo #778

Closed ecoates-bc closed 2 years ago

ecoates-bc commented 2 years ago

When running Mutual Information on a corpus with pronunciation variants, an error occurs if the algorithm is run on a bigram that is found in the corpus but not in the selected pronunciation-variant "corpus context." Here is the error traceback:

Traceback (most recent call last):
  File "/home/edith/PCT/corpustools/gui/migui.py", line 40, in run
    res = pointwise_mi(c, pair,
  File "/home/edith/PCT/corpustools/mutualinfo/mutual_information.py", line 166, in pointwise_mi
    unigram_dict = corpus_context.get_frequency_base(gramsize = 1, halve_edges = halve_edges,
  File "/home/edith/PCT/corpustools/contextmanagers.py", line 121, in get_frequency_base
    return_dict = { k:v/freq_base['total'] for k,v in return_dict.items()}
  File "/home/edith/PCT/corpustools/contextmanagers.py", line 121, in <dictcomp>
    return_dict = { k:v/freq_base['total'] for k,v in return_dict.items()}
ZeroDivisionError: division by zero

This was testing the bigram (ʃ, ɑ) on the attached corpus.
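
For illustration, the failing step in the traceback divides every count by a total that ends up being zero. Below is a minimal sketch of guarding that normalization. The function name `to_relative_frequencies` and its arguments are hypothetical and are not part of PCT. It only assumes that the counts and their total are available as a plain dict and a number.

```python
def to_relative_frequencies(counts, total):
    """Convert raw counts to relative frequencies.

    Hypothetical helper for illustration only; not PCT's actual code.
    Raises a descriptive error instead of ZeroDivisionError when the
    selected context contributes no tokens at all.
    """
    if total == 0:
        raise ValueError(
            "No tokens found in the selected corpus context; "
            "cannot compute relative frequencies."
        )
    return {key: value / total for key, value in counts.items()}


# Normal case: counts are divided by the total.
print(to_relative_frequencies({"ʃ": 1, "ɑ": 3}, total=4))

# Empty context: the caller gets a clear message rather than a bare
# "division by zero".
try:
    to_relative_frequencies({"ʃ": 0, "ɑ": 0}, total=0)
except ValueError as err:
    print(err)
```

Raising a descriptive error is one option; returning zeros or skipping the computation would be others, depending on what the calling code expects.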

modified_ilg_sample.txt

stannam commented 2 years ago

😥 Strangely, I can't import the file. I'll talk to you during the next meeting.

kchall commented 2 years ago

The above should now be fixed, because PCT gives a warning when the bigram isn't found in the corpus. However, this raises a different issue: if the bigram occurs only in the pronunciation variants, there's no way to select the segments, because they don't occur in the inventory. Opening a new issue for that.
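
As a rough sketch of the "warn instead of crash" behaviour described above (not the actual PCT change), the check can be done before any division. Everything here is an assumption made for illustration: the function `pmi_or_warn`, the representation of words as tuples of segments, and the simplified PMI that ignores word boundaries and the `halve_edges` option.

```python
import math
import warnings
from collections import Counter


def pmi_or_warn(words, bigram):
    """Return pointwise mutual information (base 2) for `bigram`, or None
    with a warning if the bigram does not occur in `words`.

    `words` is a list of segment tuples, standing in for whichever
    pronunciation variants were selected. Hypothetical sketch only.
    """
    unigrams = Counter()
    bigrams = Counter()
    for word in words:
        unigrams.update(word)
        bigrams.update(zip(word, word[1:]))

    if bigrams[bigram] == 0:
        # Covers both an empty context and a bigram that is simply absent.
        warnings.warn(f"Bigram {bigram} not found in the selected context.")
        return None

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())
    p_xy = bigrams[bigram] / total_bigrams
    p_x = unigrams[bigram[0]] / total_unigrams
    p_y = unigrams[bigram[1]] / total_unigrams
    return math.log2(p_xy / (p_x * p_y))


# The bigram is absent from these variants, so a warning is issued
# and None is returned instead of raising ZeroDivisionError.
print(pmi_or_warn([("ʃ", "i"), ("t", "ɑ")], ("ʃ", "ɑ")))
```

Because the bigram count is checked first, a nonzero count guarantees that both unigram counts and both totals are nonzero, so none of the later divisions can fail.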