PhonologicalCorpusTools / CorpusTools

Phonological CorpusTools
http://phonologicalcorpustools.github.io/CorpusTools/
GNU General Public License v3.0

Improve phonological search function #399

Closed kchall closed 9 years ago

kchall commented 9 years ago

Based on a request from Doug: Can we improve the phonological search capability to allow for longer search strings? e.g. to allow for multiple adjacent segments to the left and right of the key segment?

e.g.: allow a search for #[+voc] [m] [+voc]# (where you're searching for [m] in the #[+voc] __ [+voc]# environment)

Maybe this could be implemented by iteratively asking users if they want to add an additional segment to the left-hand / right-hand environment?

kchall commented 9 years ago

Other possible improvements (AO):

"There are a number of places where it would be useful to be able to limit searches more than seems possible - like being able to exclude strictly adjacent segments when you're trying to search for non-local stuff (I've been doing it by calculating the counts on a consonant tier and then subtracting the counts on the transcription tier, which works but is time-consuming), or being able to limit searches to, for instance, the first two vowels in a word (Lezgian has been described as having vowel harmony that applies only within the first foot)."

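The two requests quoted above (excluding strictly adjacent segments, and limiting a search to the first two vowels) can both be phrased as operations on a tier. The following is only a rough sketch: the vowel set, the function names, and the toy words are assumptions made for illustration, not PCT's actual API. It counts pairs that are neighbours on a tier but not neighbours in the full transcription, which avoids the separate subtraction step.

```python
# Assumption: a toy vowel inventory; a real corpus would use its feature system.
VOWELS = {"a", "e", "i", "o", "u"}


def is_vowel(seg):
    return seg in VOWELS


def is_consonant(seg):
    return seg not in VOWELS


def tier_adjacent_pairs(word, on_tier):
    """Return (i, j) index pairs of segments that are neighbours on the tier."""
    idx = [i for i, seg in enumerate(word) if on_tier(seg)]
    return list(zip(idx, idx[1:]))


def count_nonlocal(word, first, second, on_tier):
    """Count first...second pairs adjacent on the tier but separated
    by at least one segment in the full transcription."""
    return sum(
        1
        for i, j in tier_adjacent_pairs(word, on_tier)
        if word[i] == first and word[j] == second and j - i > 1
    )


def first_n_on_tier(word, on_tier, n):
    """Return the first n segments of the word that belong to the tier."""
    return [seg for seg in word if on_tier(seg)][:n]


word = list("sasta")  # hypothetical word: s a s t a
nonlocal_ss = count_nonlocal(word, "s", "s", is_consonant)
local_st = count_nonlocal(word, "s", "t", is_consonant)
first_two_vowels = first_n_on_tier(list("kitabu"), is_vowel, 2)
```

In this toy word the two instances of `s` are separated by a vowel, so they count as a non-local pair, while `s t` is strictly adjacent and is excluded.
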
bhallen commented 9 years ago

I recognize that the search function we use now makes good use of our left/right-side environment attributes, but given the types of searches that have been requested, I wonder if we shouldn't try for something more general.

Here's my proposal. Like Kathleen mentioned, I think it would be a good idea to let users create their search queries "iteratively". Suppose the user has just opened the phonological search window. They're presented with the segment inventory to select from, along with a way of using features to specify segments (just as we have now). But this window would also need to include the following additional options:

And then a final button, maybe bigger and/or bolded and/or made the default:

PCT would keep presenting these windows until the user clicks "End search query input". I think that with this input setup, users should be able to specify any phonological search that they want. (If anyone can think of counterexamples, please let us know!)

This series of inputs would be used to construct regular expressions that are evaluated against the words in the lexicon, a quick and simple way of detecting matches.
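
To make the regular-expression idea concrete, here is a minimal sketch of how a series of slot selections could be turned into a pattern and run over a lexicon. Everything in it is an assumption for illustration: the feature table, the function names, the toy lexicon, and the choice to join segments with `.` so that multi-character segments stay distinct. It is not how PCT actually stores or searches words.

```python
import re

# Hypothetical feature table: segment -> set of feature values.
FEATURES = {
    "a": {"+voc"}, "i": {"+voc"}, "u": {"+voc"},
    "m": {"-voc", "+nasal"}, "n": {"-voc", "+nasal"}, "t": {"-voc"},
}


def segments_with(feature):
    """All segments in the toy inventory carrying the given feature value."""
    return sorted(seg for seg, feats in FEATURES.items() if feature in feats)


def slot_to_regex(slot):
    """One query slot (a collection of allowed segments) as an alternation."""
    return "(?:" + "|".join(re.escape(s) for s in slot) + ")"


def build_query(slots, word_initial=False, word_final=False):
    """Join the slots with the segment delimiter and anchor as requested."""
    body = r"\.".join(slot_to_regex(s) for s in slots)
    prefix = r"^" if word_initial else r"(?:^|\.)"
    suffix = r"$" if word_final else r"(?:\.|$)"
    return re.compile(prefix + body + suffix)


def encode(word):
    """Encode a word (list of segments) as a delimiter-joined string."""
    return ".".join(word)


lexicon = [
    ["a", "m", "a"],
    ["t", "a", "m", "a"],
    ["a", "m", "i", "t"],
    ["i", "n", "u"],
]

vowels = segments_with("+voc")

# The search from the first comment: #[+voc] [m] [+voc]#
strict = build_query([vowels, ["m"], vowels], word_initial=True, word_final=True)
strict_matches = [w for w in lexicon if strict.search(encode(w))]

# The same sequence without word boundaries: [+voc] [m] [+voc] anywhere.
loose = build_query([vowels, ["m"], vowels])
loose_matches = [w for w in lexicon if loose.search(encode(w))]
```

Each additional left-hand or right-hand segment the user adds simply becomes one more entry in the `slots` list, so the query can grow to any length.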

Does this sound like a reasonable approach?

kchall commented 9 years ago

@mmcauliffe I really like the new environment selection box, which allows iterative selection. It would be really helpful, though, if users could "copy" or "repeat" an environment. E.g. in the phonological search box, if you want to search for all instances of [x], [y], and [z] in an environment, you currently have to select the environment by hand three separate times, putting each of x, y, and z in as the target. It would be much easier if you could copy the first instance and change one aspect. I imagine this would be useful in a number of other ways, too.
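
As a rough illustration of what "copy and change one aspect" could look like underneath, here is a sketch using a hypothetical immutable `Environment` record. The class and its field names are assumptions for this example, not PCT's actual data model. `dataclasses.replace` from the standard library makes a copy with only the named field changed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Environment:
    """Hypothetical search environment: left context, target, right context."""
    left: tuple          # one frozenset of allowed segments per left-hand slot
    target: frozenset    # allowed segments in the target position
    right: tuple         # one frozenset of allowed segments per right-hand slot


vowels = frozenset({"a", "i", "u"})  # assumption: toy vowel set

# The environment the user builds by hand once, with [x] as the target.
base = Environment(left=(vowels,), target=frozenset({"x"}), right=(vowels,))

# "Copy" the environment for [y] and [z], changing only the target.
copies = [replace(base, target=frozenset({seg})) for seg in ("y", "z")]

environments = [base] + copies
```

Because the record is frozen, the copies cannot accidentally modify the original environment.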

mmcauliffe commented 9 years ago

This is lower priority right now, since multiple segments can be specified for the center of the environment, right?

I think I'll close this for this release, and we can try to get environment duplication in for the next release under a separate issue.