Phonological CorpusTools
http://phonologicalcorpustools.github.io/CorpusTools/
GNU General Public License v3.0

[BUG] PhonoSearch frequency filter issue #774

Closed stannam closed 2 years ago

stannam commented 3 years ago

Describe the bug: In Phonological Search, a search with 'Additional filters' does not work as expected.

For example, a 'filtered' search, such as searching for /m/-initial words with a token frequency of 10 or more, runs into the following problems.

  1. PCT shows nothing on the summary result window if a filtered search does not have a result. See 'Expected behavior' below.

  2. PCT crashes if a filtered search and an unfiltered search, in either order, are done consecutively through 'Reopen function dialog' and then 'Add to existing...'

Sample corpus file: example corpus

To Reproduce: For 1 (empty summary window): Start any search with a minimum word frequency of 1000, or any other very large number. [image]

For 2 (PCT crashes): First, search for word-initial /m/ with a token frequency of 3 or more, as shown below. [image]

And then on the result window, click 'Reopen function dialog' to go back to the parameter settings.

Now, with the same environments, just empty the frequency filter. [image]

Expected behavior: For 1: On the summary result window, there should be a row with 0 type frequency and 0 token frequency. For 2: Since the result of the new search is a superset of the existing search results, additional rows should be added. Also, PCT should not crash.

stannam commented 3 years ago

Note to myself: Sub-issue 1. Currently, PCT checks whether a search has any results before applying the filters. This check should be done after the filters are applied.
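A minimal sketch of that reordering, not PCT's actual code: `Hit` and `summarize` are hypothetical names, and the summary row is modelled as a plain dict. The point is only that the emptiness check runs on the filtered list, so an over-strict filter still produces a zero row.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical stand-in for one matching word in a search result."""
    word: str
    token_frequency: float


def summarize(hits, min_token_freq=None):
    """Filter first, then check for emptiness, then build the summary row."""
    if min_token_freq is not None:
        hits = [h for h in hits if h.token_frequency >= min_token_freq]
    if not hits:
        # Checked after filtering, so the summary still gets a zero row.
        return {"type_frequency": 0, "token_frequency": 0}
    return {
        "type_frequency": len({h.word for h in hits}),
        "token_frequency": sum(h.token_frequency for h in hits),
    }


hits = [Hit("mata", 3), Hit("mama", 12)]
print(summarize(hits, min_token_freq=1000))
# prints {'type_frequency': 0, 'token_frequency': 0}
```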

Sub-issue 2. The issue comes from a difference in the types of the frequency filters. In the function dialog window, the user can only enter a filter as 'str'. The filter value is internally converted into 'float' because it is used in inequality comparisons. Forcibly converting 'str' into 'float' works only when the user enters purely numeric input; if the user enters nothing, or enters words or any other non-numeric text (e.g., '123abc'), PCT fails to convert it into 'float'. Then the user input becomes 'N/A' and the type remains 'str'.

Problem --> Therefore, the frequency filters can have different types! And this can raise an error when searches are repeated. For example, the first search can have 'float' type filters while the new search has 'str' type. (This was the case in the bug report.) Mixed types are a problem because a 'str' and a 'float' cannot be compared with each other, so they cannot be sorted, and PCT should be able to sort them for the result window!
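To make the failure concrete, here is a small sketch under assumed names (`parse_frequency_filter` and `filter_sort_key` are hypothetical, not functions in PCT). It first shows that sorting a mix of 'float' and 'str' fails in Python 3, then one possible way to keep a single type by mapping unusable input to `None` and sorting with an explicit key.

```python
import math


def parse_frequency_filter(text):
    """Return the filter as a float, or None if the field is empty or invalid."""
    try:
        value = float(text.strip())
    except ValueError:
        return None
    # float() also accepts 'nan' and 'inf'; treat those as "no filter" too.
    return value if math.isfinite(value) else None


def filter_sort_key(value):
    """Sort key that places None before every number."""
    return (value is not None, value if value is not None else 0.0)


# The situation from the bug report: one float filter, one 'N/A' string.
try:
    sorted([3.0, "N/A"])
except TypeError as err:
    print("cannot sort mixed types:", err)

# With one consistent representation, sorting works.
filters = [parse_frequency_filter(t) for t in ["3", "", "123abc", "1"]]
print(sorted(filters, key=filter_sort_key))
# prints [None, None, 1.0, 3.0]
```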

stannam commented 3 years ago

.... Also, something should be done with regard to the strings in the frequency slots. For example, the user may (accidentally) enter '123a' in a frequency filter:

[image]

One of the following measures should be taken: (i) PCT should ignore 'a' and internally treat the input as '123', or (ii) PCT should raise an error message and let the user fix the problem, or ☑ (iii) PCT should only accept numeric inputs in the first place (using a 'validator').
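The three options can be compared in a GUI-free sketch. This is not the validator PCT ended up using; the function names and the regular expression are assumptions made for illustration, and the pattern deliberately accepts only plain non-negative decimals (so exponent notation such as '1e5' is rejected as well).

```python
import re

# Plain non-negative decimals only: '123', '12.5', '.5'. No sign, no exponent.
_NUMERIC = re.compile(r"[0-9]+(?:\.[0-9]*)?|\.[0-9]+")


def leading_number(text):
    """Option (i): keep the leading numeric part and silently drop the rest."""
    match = _NUMERIC.match(text)
    return float(match.group()) if match else None


def parse_or_raise(text):
    """Option (ii): reject bad input with an error the user can act on."""
    if _NUMERIC.fullmatch(text) is None:
        raise ValueError(f"Frequency filter must be a number, got {text!r}")
    return float(text)


def is_acceptable(text):
    """Option (iii) in spirit: decide whether the input is allowed at all."""
    return _NUMERIC.fullmatch(text) is not None


print(leading_number("123a"))   # prints 123.0
print(is_acceptable("123a"))    # prints False
print(is_acceptable("12.5"))    # prints True
```

Option (i) is the most forgiving but hides the typo from the user, which is one reason to prefer rejecting the input up front.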

stannam commented 2 years ago

close?

kchall commented 2 years ago

All looks good to me, thanks!

stannam commented 2 years ago

Just noticed that the validator does not prevent 'e' in numeric slots.

Not a bug per se, because e is a mathematical constant (e.g., a minimum frequency of 'e' returns all words with a frequency higher than 2.72). But interesting.