PhonologicalCorpusTools / CorpusTools

Phonological CorpusTools
http://phonologicalcorpustools.github.io/CorpusTools/
GNU General Public License v3.0

Frequency of Alternation -- Threshold values?? #97

Closed by kchall 10 years ago

kchall commented 10 years ago

I'm confused about the implementation of frequency of alternation in the GUI.

Currently, in order for there to be an "alternation" between two words, the following conditions must hold: (a) one of the words contains sound 1 and the other contains sound 2; (b) the two words are considered similar to each other by being at least X similar; and (c) the two sounds are phonologically aligned with each other.

Right??
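To make sure we mean the same thing, here is a minimal sketch of the three conditions as I understand them. It is not the actual CorpusTools code: `is_alternation`, `related`, and `aligned` are hypothetical names, and the relatedness and alignment checks are passed in as functions because their details are exactly what is in question here.

```python
from typing import Callable, Sequence

Word = Sequence[str]  # a word as a sequence of segment symbols


def is_alternation(
    w1: Word,
    w2: Word,
    s1: str,
    s2: str,
    related: Callable[[Word, Word], bool],
    aligned: Callable[[Word, Word, str, str], bool],
) -> bool:
    """Return True if w1 and w2 count as an alternation of s1 ~ s2.

    `related` and `aligned` are placeholders (assumptions) for the
    similarity threshold check and the phonological alignment check.
    """
    # (a) one word contains sound 1 and the other contains sound 2
    has_pair = (s1 in w1 and s2 in w2) or (s2 in w1 and s1 in w2)
    if not has_pair:
        return False
    # (b) the two words are similar enough to be considered related
    if not related(w1, w2):
        return False
    # (c) the two sounds are phonologically aligned with each other
    return aligned(w1, w2, s1, s2)
```

For example, with checks that always succeed, `is_alternation(list("lif"), list("livz"), "f", "v", lambda a, b: True, lambda a, b, x, y: True)` returns `True`, while a pair where neither word contains the target sounds fails at step (a).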

But the GUI simply gives a place to enter minimum and maximum similarity values, which doesn't really make any sense. If we're using Khorsi, it should say something like "Minimum similarity value for words to be considered related" and allow just one entry (X in the description above). If we're using edit distance (of either sort), it would be something like "Maximum distance value for words to be considered related" and again allow just one entry. That is, it doesn't make sense to say that words aren't related if they are MORE similar or LESS distant than some value -- right? Also, given that one measure is distance and one is similarity, it's confusing that the boxes don't distinguish these. If we have just one box, it should say something like "Minimum similarity or maximum distance for words to be considered related." @mmcauliffe @jsmackie
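A small sketch of the single-threshold logic described above, in case it helps. The function name and the measure labels (`"khorsi"`, `"edit_distance"`, `"phono_edit_distance"`) are made up for illustration and are not the identifiers used in the code base. The point is only that the comparison flips direction depending on whether the measure is a similarity or a distance.

```python
SIMILARITY_MEASURES = {"khorsi"}  # higher score = more related
DISTANCE_MEASURES = {"edit_distance", "phono_edit_distance"}  # lower = more related


def is_related(score: float, threshold: float, measure: str) -> bool:
    """Decide whether two words count as related, given one threshold.

    The measure names here are hypothetical labels, not real option values.
    """
    if measure in SIMILARITY_MEASURES:
        # threshold is a MINIMUM similarity
        return score >= threshold
    if measure in DISTANCE_MEASURES:
        # threshold is a MAXIMUM distance
        return score <= threshold
    raise ValueError(f"unknown measure: {measure!r}")
```

With this shape there is only ever one value for the user to enter, and the label on the box can change with the selected measure.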

Also, there must be some default value you're using if nothing is entered, right? Because we can't actually calculate this without having a threshold. So we need to be clear as to what that value is for each measure, both in the dialogue box (e.g., either by filling it in as a default or in the tool tips) and in the manual. @mdfry

Finally, what about the phonological alignment? Surely there must be some threshold for saying whether two words have the sounds aligned vs. not aligned -- is this something that users can set? At the very least, we need to be explicit in the manual about how this is being calculated and what threshold is being used. @bhallen

mdfry commented 10 years ago

True. Having written up the manual, it's much clearer in my head. I haven't currently set a default value for relatedness, but there should be one. I'll update this with arbitrary keyword arguments for now.

bhallen commented 10 years ago

It's been a while since we've talked about this, but our alignment criterion for phonological alignment is ad hoc and relatively English-specific. It's not as simple as a threshold distance value. Basically when two words are compared, the algorithm checks to see whether they share some "core" that is identical except for alternations of the target pair of segments, ignoring any material peripheral to this core. Does this kind of check still serve our needs, or should we scrap or modify it?
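For anyone who hasn't looked at it, the idea of the "core" check can be sketched roughly as follows. This is a simplified illustration, not the code that is actually in the repository: the function name, the `min_core` parameter, and the use of `difflib` to find the shared stretch are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from typing import Sequence

PLACEHOLDER = "<TARGET>"  # stands in for either target segment


def share_core(w1: Sequence[str], w2: Sequence[str],
               s1: str, s2: str, min_core: int = 3) -> bool:
    """Rough check for a shared core that differs only in s1 ~ s2.

    min_core (hypothetical) is the minimum core length in segments.
    """
    def neutralize(word):
        # Treat both target segments as the same symbol so that the core
        # can match across the alternation.
        return [PLACEHOLDER if seg in (s1, s2) else seg for seg in word]

    n1, n2 = neutralize(w1), neutralize(w2)
    matcher = SequenceMatcher(None, n1, n2, autojunk=False)
    # Longest contiguous stretch the two words share; everything outside
    # it is treated as peripheral material and ignored.
    m = matcher.find_longest_match(0, len(n1), 0, len(n2))
    if m.size < min_core:
        return False
    core1 = w1[m.a:m.a + m.size]
    core2 = w2[m.b:m.b + m.size]
    # The core must actually contain the alternation: s1 in one word
    # lined up with s2 in the other.
    return any({x, y} == {s1, s2} for x, y in zip(core1, core2))
```

So `share_core(list("lif"), list("livz"), "f", "v")` is true (the core is the first three segments, with "f" lined up against "v"), while `share_core(list("lif"), list("lifs"), "f", "v")` is false because the shared core contains no alternation. Only the single longest shared stretch is considered here, which is one of the ways this sketch is cruder than a real alignment.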

kchall commented 10 years ago

Hmm. If it's English-based, then perhaps we should set it as an option? e.g., "Try to align words phonologically?" (yes / no) and either do the calculation with or without it, explaining in the ToolTips and the manual that it's primarily useful for English data?

bhallen commented 10 years ago

Yes, I think that would be a good approach. I just spent some time trying to add this feature to the GUI, but I only ended up creating errors. Could someone with more knowledge about what's going on in frequency of alternation add this? @mdfry @jsmackie

mdfry commented 10 years ago

I'm unfamiliar with the GUI, but the command line version already has an option to choose whether to phonologically align or not.

Also, even though it was designed with English in mind, wouldn't the alignment of phonemes/features work in much the same way for other languages, since they are likely to be represented in a similar way?

kchall commented 10 years ago

Thanks Scott! This all looks good except for one thing: In SS, when the dialogue box is first opened, "phonological edit distance" is selected, and frequency type isn't greyed out; it defaults to "type." Once a different algorithm is selected and then you go back to "phonological edit distance," the frequency selector is greyed out. Maybe it's best just to make Khorsi the default? @jsmackie

kchall commented 10 years ago

In the latest commit, I still have Phonological edit distance as the default metric on start-up, so I still have the same issue as above. Also, I don't see the option for phonologically aligning or not???

kchall commented 10 years ago

Also, the threshold values should read "minimum similarity (Khorsi only)" and "maximum distance (edit distance only)." The second one is currently correct, but the first one needs to say "minimum similarity" not "minimum distance."