PhonologicalCorpusTools / CorpusTools

Phonological CorpusTools
http://phonologicalcorpustools.github.io/CorpusTools/
GNU General Public License v3.0
111 stars 16 forks source link

[BUG] PCT crashes when phonotactic prob cannot be calculated with filters #777

Closed stannam closed 2 years ago

stannam commented 2 years ago

Describe the bug The phonotactic probability function can make PCT crash if

Sample corpus file example

To Reproduce Steps to reproduce the behavior:

  1. Go to 'analysis > phonotactic probability'
  2. Set 'minimum word frequency' to 300 (or any unreasonable value)
  3. In the 'Query' option, select 'Calculate for all words in the corpus'
  4. See error

Operating system and PCT version

kchall commented 2 years ago

Yep, can confirm this crashes on Mac as well; note that it's with any number higher than some word frequencies (e.g. in the example corpus, a frequency of 10 will still crash it). But it does work if it's set to be the same as the smallest actual token frequency, i.e., 2.

stannam commented 2 years ago
kchall commented 2 years ago

Hmm, can you double check that we didn't break phonotactic probability entirely by doing this? I just tried to re-create the example from the documentation (https://corpustools.readthedocs.io/en/latest/phonotactic_probability.html) but without the frequency filter, and PCT just crashed on me, without calculating anything (or giving me an error message).

stannam commented 2 years ago

You are right. I accidentally broke the function while removing the filter.😥 Thanks for noticing this. Now it is fixed. I confrimed that PP is calculated correctly (the attached excel file). Hand_calculating_PP_with_PP_example.xlsx

On a separate note, I cannot follow the documentation where it demonstrates calculating the probability for non-word “pidger” [pɪdʒɚ] in the example presented there. Currently, we only allow users to use segments already in the inventory when they create a non-word. Since neither [dʒ] nor [ɚ] is in the inventory [pɪdʒɚ] can't be created... I think we should change the documentation.

kchall commented 2 years ago

Thanks, @stannam! Can you double-check that you pushed the fix to GitHub? I'm not seeing any new activity, and it's still crashing on my end...

You're right that the example is a bit odd -- if you look closely, you'll see that the original example uses the IPHOD corpus instead of the example corpus (which is definitely unusual in our documentation!). I thought about changing it, but the example corpus is quite small and makes phonotactic probability examples difficult (which is presumably why we used the IPHOD in the first place)...so at this point, I'm inclined to leave it.

stannam commented 2 years ago

Sorry! It is now on the remote repo.

kchall commented 2 years ago

Thanks! Looks to be working now. :)