PhonologicalCorpusTools / CorpusTools

Phonological CorpusTools
http://phonologicalcorpustools.github.io/CorpusTools/
GNU General Public License v3.0

Add pair of sounds based on featural descriptions #357

Closed: kchall closed this issue 9 years ago

kchall commented 9 years ago

In Functional Load, ProD, MI, and KL, allow users to select a pair of sounds based on featural combinations instead of only through the inventory.

kchall commented 9 years ago

This is one that I think phonologists will definitely want.

mmcauliffe commented 9 years ago

OK, so working through this conceptually for, say, functional load:

A user selects '+voc,+hi,+tense' and '+voc,+hi,-tense' as the "segment" pairs. That functions the exact same way as selecting pairs ("u","ʊ") and ("i","ɪ") and selecting "all segment pairs together", correct?

If they further add '+voc,-hi,-low,+tense' and '+voc,-hi,-low,-tense' as an additional segment pair, what should the behaviour be for the multiple-segment-pair options? If they select "all segment pairs together", it treats everything as two big sets, and if they select "Each segment pair individually", it just uses each pair of sets separately, correct? So then it'd mainly be a wording change for that?
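As a rough sketch of the two grouping modes described above (not CorpusTools code; the feature table and the names `parse_spec`, `segments_matching`, and `resolve_pairs` are all hypothetical), each feature specification can be resolved to a set of segments, and the resulting pairs of sets are then either pooled or kept separate:

```python
# Hypothetical feature table (segment -> feature values); a real corpus
# would load this from its own feature system.
FEATURES = {
    'i': {'voc': '+', 'hi': '+', 'low': '-', 'back': '-', 'tense': '+'},
    'ɪ': {'voc': '+', 'hi': '+', 'low': '-', 'back': '-', 'tense': '-'},
    'u': {'voc': '+', 'hi': '+', 'low': '-', 'back': '+', 'tense': '+'},
    'ʊ': {'voc': '+', 'hi': '+', 'low': '-', 'back': '+', 'tense': '-'},
    'e': {'voc': '+', 'hi': '-', 'low': '-', 'back': '-', 'tense': '+'},
    'ɛ': {'voc': '+', 'hi': '-', 'low': '-', 'back': '-', 'tense': '-'},
    'o': {'voc': '+', 'hi': '-', 'low': '-', 'back': '+', 'tense': '+'},
    'ɔ': {'voc': '+', 'hi': '-', 'low': '-', 'back': '+', 'tense': '-'},
    'p': {'voc': '-', 'hi': '-', 'low': '-', 'back': '-', 'tense': '-'},
}

def parse_spec(spec):
    """Turn '+voc,+hi,-tense' into {'voc': '+', 'hi': '+', 'tense': '-'}."""
    result = {}
    for part in spec.split(','):
        part = part.strip()
        if not part or part[0] not in '+-':
            raise ValueError(f'bad feature value: {part!r}')
        result[part[1:]] = part[0]
    return result

def segments_matching(spec):
    """All segments whose features agree with every value in the spec."""
    wanted = parse_spec(spec)
    return frozenset(seg for seg, feats in FEATURES.items()
                     if all(feats.get(f) == v for f, v in wanted.items()))

def resolve_pairs(spec_pairs, together):
    sets = [(segments_matching(a), segments_matching(b)) for a, b in spec_pairs]
    if together:
        # "All segment pairs together": pool every side-A set and every side-B set
        return [(frozenset().union(*(a for a, _ in sets)),
                 frozenset().union(*(b for _, b in sets)))]
    # "Each segment pair individually": keep each pair of sets separate
    return sets

pairs = [('+voc,+hi,+tense', '+voc,+hi,-tense'),
         ('+voc,-hi,-low,+tense', '+voc,-hi,-low,-tense')]
individual = resolve_pairs(pairs, together=False)  # high pair, then mid pair
pooled = resolve_pairs(pairs, together=True)       # all tense vs. all lax
```

Under this reading the two options differ only in whether the sets are unioned before the measure is computed, which is why it looks mostly like a wording change.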

Also, I'm not seeing a way to add pairs of feature specifications that doesn't duplicate the feature selection widget. So it'd have to be like the environment selection window, which I find really clunky, or it'd have to be custom. Well, maybe it'd be worth re-envisioning how users interact with features in general.

kchall commented 9 years ago

Ah, well, really the only reason we have the "all segment pairs together" option is that it was a kind of hack for doing a featurally based search. So we might decide that we don't want to keep that option at all. I suppose the loss would be if someone wanted to calculate the functional load of a pair of non-natural classes of segments, though I don't really know that I'd want to support that anyway!

So, I think the "simple" answer would be that all pairs would now be treated as individual pairs:

If you do ("u","ʊ") and ("i","ɪ"), you get the functional load of ("u","ʊ") and the functional load of ("i","ɪ") separately. No option to group them together.

If you do ('+voc,+hi,+tense', '+voc,+hi,-tense'), you get the functional load of the tense/lax distinction between high vowels, exactly as you say above.

If you do ('+voc,+hi,+tense', '+voc,+hi,-tense') and then also ('+voc,-hi,-low,+tense', '+voc,-hi,-low,-tense'), you get the functional load of the tense/lax distinction between high vowels, and then separately get the functional load of the tense/lax distinction between mid vowels.

If you wanted the functional load of the tense/lax distinction among all vowels, you'd do ('+voc,+tense', '+voc,-tense').
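One way to make ('+voc,+tense', '+voc,-tense') mean "each tense vowel versus its lax counterpart" is to pair up segments that match the two specifications and agree on every feature except the ones the specifications disagree on. The sketch below does that and then scores each segment pair separately by counting minimal pairs, as a stand-in for functional load; the feature table, lexicon, and function names are hypothetical, and minimal-pair counting is only one simple way to quantify functional load.

```python
# Hypothetical vowel feature table; every segment lists the same features.
FEATURES = {
    'i': {'voc': '+', 'hi': '+', 'low': '-', 'back': '-', 'tense': '+'},
    'ɪ': {'voc': '+', 'hi': '+', 'low': '-', 'back': '-', 'tense': '-'},
    'u': {'voc': '+', 'hi': '+', 'low': '-', 'back': '+', 'tense': '+'},
    'ʊ': {'voc': '+', 'hi': '+', 'low': '-', 'back': '+', 'tense': '-'},
    'e': {'voc': '+', 'hi': '-', 'low': '-', 'back': '-', 'tense': '+'},
    'ɛ': {'voc': '+', 'hi': '-', 'low': '-', 'back': '-', 'tense': '-'},
    'o': {'voc': '+', 'hi': '-', 'low': '-', 'back': '+', 'tense': '+'},
    'ɔ': {'voc': '+', 'hi': '-', 'low': '-', 'back': '+', 'tense': '-'},
}

def parse_spec(spec):
    """'+voc,-tense' -> {'voc': '+', 'tense': '-'}"""
    return {part.strip()[1:]: part.strip()[0] for part in spec.split(',')}

def segments_matching(wanted):
    return sorted(seg for seg, feats in FEATURES.items()
                  if all(feats.get(f) == v for f, v in wanted.items()))

def feature_pairs(spec_a, spec_b):
    fa, fb = parse_spec(spec_a), parse_spec(spec_b)
    # Features on which the two specifications disagree, e.g. {'tense'}
    contrast = {f for f in fa if f in fb and fa[f] != fb[f]}
    pairs = []
    for a in segments_matching(fa):
        for b in segments_matching(fb):
            # Pair a with b only if they agree on every non-contrastive feature
            if all(FEATURES[a][f] == FEATURES[b][f]
                   for f in FEATURES[a] if f not in contrast):
                pairs.append((a, b))
    return pairs

def count_minimal_pairs(words, a, b):
    """Count words whose a -> b substitution at one position yields another word."""
    wordset = set(words)
    return sum(1 for w in wordset for i, seg in enumerate(w)
               if seg == a and w[:i] + b + w[i + 1:] in wordset)

LEXICON = ['pik', 'pɪk', 'puk', 'pʊk', 'kip', 'pek', 'pɛk', 'kop']  # toy data

for a, b in feature_pairs('+voc,+tense', '+voc,-tense'):
    print(a, b, count_minimal_pairs(LEXICON, a, b))
# e ɛ 1
# i ɪ 1
# o ɔ 0
# u ʊ 1
```

Narrowing the specifications to '+voc,+hi,+tense' and '+voc,+hi,-tense' yields only the high-vowel pairs, matching the per-pair behaviour described above.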

As for the interface: yeah, I was assuming that this would be analogous to the environment selection window (or the "Basis for search" window in the phonological search box), where you can choose segments or features as the basis for building your pairs.

The feature selection is a bit clunky, for sure, but I don't have any good ideas for improving it... Alphabetical order does seem reasonable for features in a way that it isn't for segments (ironically, I suppose!), and hopefully people will be familiar enough with their own feature systems to know what to look for. I mean, I can imagine wanting the features organized by type (place features vs. manner features, etc.), but that gets into the problem of interpreting other people's potentially wacky feature systems. Probably not something to focus on right now, at any rate!

mmcauliffe commented 9 years ago

Yeah, I was thinking more like a text field where you could type a feature specification with autocomplete (so typing '+son' would bring up "+sonorant", which you could complete by pressing Tab, or by scrolling down and pressing Enter), and where hitting Backspace after a completed feature would remove the whole string (like in Google's compose window when you're entering recipients' email addresses). That way the whole feature widget is a single line, which frees up a lot of screen real estate.
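The behaviour described above can be modelled without any GUI code. This is a hypothetical sketch (the feature list, `complete`, and `FeatureField` are illustrative names, not part of CorpusTools): prefix matching on the feature name after the sign, and a field that stores each completed feature as a single token so that Backspace removes it in one step.

```python
# Hypothetical list of feature names from the currently loaded feature system.
FEATURE_NAMES = ['back', 'consonantal', 'high', 'low', 'sonorant',
                 'syllabic', 'tense', 'voice']

def complete(fragment, names=FEATURE_NAMES):
    """Suggest full feature values for a typed fragment such as '+son'."""
    if not fragment or fragment[0] not in '+-':
        return []
    sign, stem = fragment[0], fragment[1:]
    return [sign + name for name in sorted(names) if name.startswith(stem)]

class FeatureField:
    """Single-line entry that stores each completed feature as one token."""

    def __init__(self):
        self.tokens = []

    def accept(self, fragment):
        """Tab/Enter: commit the first suggestion, if there is one."""
        suggestions = complete(fragment)
        if suggestions:
            self.tokens.append(suggestions[0])
            return suggestions[0]
        return None

    def backspace(self):
        """Backspace after a committed token removes the whole token."""
        if self.tokens:
            self.tokens.pop()

field = FeatureField()
field.accept('+son')   # commits '+sonorant'
field.accept('-hi')    # commits '-high'
field.backspace()      # removes '-high' in one keystroke
print(field.tokens)    # ['+sonorant']
```

Swapping `startswith` for a substring test (`stem in name`) would be a one-line change if users don't remember exactly how their feature system names things.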

We should also have, as part of the feature widget, a list of the segments specified by whatever is currently entered. So entering "+voc" (by any method) should list all the vowels, and further entering "+hi" should narrow that to just the high vowels.
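A minimal sketch of that preview, assuming a hypothetical feature table and a `preview` helper (neither is CorpusTools code): each entered token is a constraint, so every new token can only keep or shrink the list of matching segments.

```python
# Hypothetical feature table; real values would come from the corpus's feature system.
FEATURES = {
    'i': {'voc': '+', 'hi': '+', 'back': '-'},
    'u': {'voc': '+', 'hi': '+', 'back': '+'},
    'e': {'voc': '+', 'hi': '-', 'back': '-'},
    'a': {'voc': '+', 'hi': '-', 'back': '+'},
    'p': {'voc': '-', 'hi': '-', 'back': '-'},
    'k': {'voc': '-', 'hi': '+', 'back': '+'},
}

def preview(tokens):
    """Segments compatible with every entered token, e.g. ['+voc', '+hi']."""
    return sorted(seg for seg, feats in FEATURES.items()
                  if all(feats.get(tok[1:]) == tok[0] for tok in tokens))

entered = []
for token in ['+voc', '+hi']:
    entered.append(token)
    print(entered, '->', preview(entered))
# ['+voc'] -> ['a', 'e', 'i', 'u']
# ['+voc', '+hi'] -> ['i', 'u']
```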

Anyway, that aspect probably needs a bit more thought and a mock-up to really evaluate. But for functional load, I think it makes sense to take away the widget for multiple segment pairs. We were going to implement that widget in ProD and the rest too at some point, but it can be replaced with this new method. It'd still be new functionality for the other algorithms, though, unfortunately. Do you think you could come up with some expected values for calculating ProD for sets of segment pairs in the example corpus?

kchall commented 9 years ago

Ah, I see...clever. Though I don't know that people will totally know their featural specifications well enough in all cases to type them from scratch? I'm just thinking that for me, I'm working with enough corpora with slightly different specifications that it's kind of a pain to keep track: are vowels [+syllabic], [+vocalic], or [-consonantal] in the system I'm using right now? Obviously, I can try all three until I get an auto-complete, but especially if we're hoping to use this with students, too, that might not be the best solution....

I do think the preview option would be useful -- we already do that with the "Add tier" function, and it's super.

Yes, I can work on some expected values for PROD.

kchall commented 9 years ago

I've given two examples of expected ProD calculations for the example corpus based on featural selection instead of segmental selection in the "ProD_Feature_Tests.xlsx" file in the Dropbox folder. I'll also e-mail the file to @jsmackie and @mmcauliffe.

kchall commented 9 years ago

@mmcauliffe Where are we with this? As featural selection currently works, it is just being used to select pairs of segments, which is not what issue #316 was about (we thought that issue would get subsumed here), nor does it address the issue with featural functional load. Specifically: let's say I want to calculate the ProD or the FL of tense vs. lax vowels in a corpus. If I go in, select [+syllabic, +tense] in the features, and then hit OK, I get all the pairs of tense vowels, but no way to compare those to the lax vowels.

I think this is ok as far as FL goes, because we still have the "All segment pairs together" option. But it's not actually an answer to #316 after all. We need a version of the "All segment pairs together" option for ProD.
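For an "all segment pairs together" version of ProD, one hedged approach is to pool each side of the pairs into a single category and then compute predictability of distribution as usual: the entropy of the A-vs-B choice in each environment, weighted by how often that environment occurs among A and B tokens. The sketch below assumes environments are just the immediately preceding and following segments (with '#' as the word boundary) and uses type counts; `prod` and the toy word list are hypothetical, and CorpusTools' own environment handling is richer than this.

```python
import math
from collections import Counter, defaultdict

def prod(words, set_a, set_b):
    """Weighted average entropy of the A/B choice across environments."""
    counts = defaultdict(Counter)  # (left, right) -> Counter({'A': n, 'B': m})
    for w in words:
        padded = '#' + w + '#'
        for i in range(1, len(padded) - 1):
            seg = padded[i]
            side = 'A' if seg in set_a else 'B' if seg in set_b else None
            if side:
                counts[(padded[i - 1], padded[i + 1])][side] += 1
    total = sum(sum(c.values()) for c in counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        n = sum(c.values())
        h_env = -sum((k / n) * math.log2(k / n) for k in c.values())
        h += (n / total) * h_env  # weight each environment by its frequency
    return h

words = ['pik', 'pɪk', 'puk', 'pʊk', 'pek', 'kop']  # toy data
tense, lax = {'i', 'u', 'e', 'o'}, {'ɪ', 'ʊ', 'ɛ', 'ɔ'}
print(round(prod(words, tense, lax), 3))  # pooled tense vs. lax: 0.809
print(prod(words, {'i'}, {'ɪ'}))          # i vs. ɪ alone: 1.0
```

Pooling is not equivalent to averaging the individual pairs: the environments are shared across the pooled sets, so the counts for i, u, and e all land in the same p_k environment here.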

mmcauliffe commented 9 years ago

I'm gonna close this as working now, since there are other open issues addressing the specifics of it.