PhonologicalCorpusTools / CorpusTools

Phonological CorpusTools
http://phonologicalcorpustools.github.io/CorpusTools/
GNU General Public License v3.0

Add in search capability for phonological sequences #201

kchall closed this issue 9 years ago

kchall commented 9 years ago

Allow users to search for sequences of segments and/or phonological feature sets, returning lists of matches along with their type and token frequencies. Users should be able to view the results on screen, save them to a file, and/or export them as a sub-corpus.

@mmcauliffe <- GUI person
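A minimal sketch of the requested search, assuming each position in the query is a set of segments (a feature set would first be expanded into such a set). The corpus format, the function name search_sequence, and the choice to define token frequency as the summed frequency of matching words are all hypothetical illustrations, not the CorpusTools API.

```python
# Hypothetical toy corpus: (spelling, transcription as a list of segments, token frequency).
TOY_CORPUS = [
    ("tap", ["t", "a", "p"], 10),
    ("pat", ["p", "a", "t"], 4),
    ("tapa", ["t", "a", "p", "a"], 2),
    ("pit", ["p", "i", "t"], 7),
]


def search_sequence(corpus, sequence):
    """Hypothetical helper: find words containing `sequence`.

    `sequence` is a list of sets; position k matches any segment in sequence[k].
    Returns (matches, type_frequency, token_frequency), where matches is a list
    of (spelling, number_of_hits, word_frequency) tuples.
    """
    n = len(sequence)
    matches = []
    for spelling, segments, freq in corpus:
        # Count every window of length n whose segments all fall in the slot sets.
        hits = sum(
            1
            for i in range(len(segments) - n + 1)
            if all(segments[i + k] in sequence[k] for k in range(n))
        )
        if hits:
            matches.append((spelling, hits, freq))
    type_freq = len(matches)                         # distinct matching words
    token_freq = sum(freq for _, _, freq in matches)  # summed word frequencies (assumed definition)
    return matches, type_freq, token_freq


if __name__ == "__main__":
    # /p/ followed by either /a/ or /i/
    print(search_sequence(TOY_CORPUS, [{"p"}, {"a", "i"}]))
```

With the query above, "pat", "tapa", and "pit" match (type frequency 3, token frequency 13); a token count weighted by hits per word would be an equally reasonable design choice.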

bhallen commented 9 years ago

@mmcauliffe Michael, do I recall correctly that you have something like this working on the GUI? Any code that I could borrow for the CL interface?

mmcauliffe commented 9 years ago

So I just added a new function for Corpus objects called phonological_search. It takes a list of segments, an optional list of environment specification strings (such as "[+feature]_[-feature,+feature]" or "a_"), and an optional sequence_type keyword argument that selects the tier to search on. It returns a list of tuples: the first element of each tuple is the Word object, and the second is a list of tuples, one for each segment found that matches the environments (if any are specified). Let me know if you have any questions about it!
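To make the environment strings and the return shape concrete, here is a toy re-creation under stated assumptions. It is NOT the CorpusTools implementation of phonological_search: the names toy_phonological_search, parse_environment, and matches_env are hypothetical, each side of an environment is assumed to be a single segment or the word-edge symbol #, and the inner tuples are assumed to hold (segment, environment string), since the real tuple contents are not described here.

```python
# Hypothetical sketch of environment matching; NOT the CorpusTools implementation.
def parse_environment(env):
    """Split an environment like 'a_', '#_a' or 'a_b' into (left, right).

    Each side is None (unrestricted) or a single symbol; '#' marks a word edge.
    """
    left, sep, right = env.partition("_")
    if not sep:
        raise ValueError(f"environment must contain '_': {env!r}")
    return (left or None, right or None)


def matches_env(segments, i, env):
    """Return True if segments[i] sits in the (left, right) environment."""
    left, right = env
    # Pad with '#' so word edges can be matched like ordinary symbols.
    padded = ["#"] + list(segments) + ["#"]
    j = i + 1  # position of segments[i] inside the padded list
    if left is not None and padded[j - 1] != left:
        return False
    if right is not None and padded[j + 1] != right:
        return False
    return True


def toy_phonological_search(words, targets, environments=None):
    """Return [(word, [(segment, environment), ...]), ...] for words with a hit.

    `words` is a list of (spelling, segment list) pairs. With no environments,
    every occurrence of a target segment counts as a hit; otherwise there is
    one entry per (segment, environment) match.
    """
    raw_envs = environments or ["_"]
    parsed = [(raw, parse_environment(raw)) for raw in raw_envs]
    results = []
    for word, segments in words:
        found = []
        for i, seg in enumerate(segments):
            if seg not in targets:
                continue
            for raw, env in parsed:
                if matches_env(segments, i, env):
                    found.append((seg, raw))
        if found:
            results.append((word, found))
    return results


if __name__ == "__main__":
    WORDS = [("tap", ["t", "a", "p"]), ("apt", ["a", "p", "t"]), ("pat", ["p", "a", "t"])]
    print(toy_phonological_search(WORDS, {"a"}, ["#_"]))  # word-initial /a/
```

Padding the transcription with # lets a word edge be checked exactly like any other neighbouring symbol, which matches the point in the thread that # can be passed through as-is.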

bhallen commented 9 years ago

Nice!

Even so, I found a few issues/questions:

  1. I get this error:
       File "/usr/local/lib/python3.4/dist-packages/corpustools-0.15.1-py3.4.egg/corpustools/gui/qt/psgui.py", line 142, in setResults
         self.results.append([str(w), str(getattr(w,self.tierWidget.displayValue())),', '.join(segs),
       AttributeError: 'Word' object has no attribute 'Transcription'
  2. The GUI says "features" can be the "basis for search", but that's not implemented yet, right? If I select it, I get an error, and nothing changes.
  3. For purposes of making a command line version, do we have a syntax for specifying the conjunction of a set of segments without using features? Like _(a,b,c) or something like that. Also, would it be okay to pass the word edge symbol # in an environment specification string as-is, or does the GUI already interpret the # button as some word edge indicator before the environments are passed to phonological_search?

mmcauliffe commented 9 years ago

The first two issues should be fixed with the latest commit.

I'm a little confused by your third point. Feature specifications get converted into lists of segments, so sets of segments can already be created as lists without needing features. The GUI doesn't do anything special to the # before passing it to the environment; it should just be something like #_a. One issue I just thought of is that environments are either features or segments, so it's currently not possible to specify an environment of a word boundary followed by high vowels, other than by making a set of all high vowels.
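A small sketch of what "feature specifications get converted into lists of segments" can look like, assuming a toy feature table and bracketed bundles such as "[+high,-back]". The table, the feature names, and the function expand_features are hypothetical; CorpusTools uses its own feature systems and parser.

```python
# Hypothetical toy feature table; CorpusTools ships its own feature systems.
FEATURES = {
    "i": {"high": "+", "back": "-", "voc": "+"},
    "u": {"high": "+", "back": "+", "voc": "+"},
    "a": {"high": "-", "back": "+", "voc": "+"},
    "t": {"high": "-", "back": "-", "voc": "-"},
}


def expand_features(spec, table=FEATURES):
    """Convert a bundle like '[+high,-back]' into the set of matching segments."""
    spec = spec.strip()
    if not (spec.startswith("[") and spec.endswith("]")):
        raise ValueError(f"not a feature specification: {spec!r}")
    wanted = []
    for item in spec[1:-1].split(","):
        item = item.strip()
        if len(item) < 2 or item[0] not in "+-":
            raise ValueError(f"bad feature value: {item!r}")
        wanted.append((item[1:], item[0]))  # e.g. ('high', '+')
    # Keep every segment whose values agree with all requested features.
    return {seg for seg, feats in table.items()
            if all(feats.get(name) == value for name, value in wanted)}


if __name__ == "__main__":
    print(sorted(expand_features("[+high]")))  # the high vowels in the toy table
```

Because the result is just a set of segments, writing out all the high vowels by hand gives the same set as expanding "[+high]", which is why the workaround mentioned above is possible. The word-edge symbol # has no feature values in such a table, so it cannot be folded into a feature bundle.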

(Sent by email in reply to Blake Allen's comment of Fri, Dec 12, 2014, 1:36 PM.)

bhallen commented 9 years ago

1/2. Seems to be working now. Thanks.

  3. So # works as expected, but there's no way to specify non-featural segment sets. Seems fine to me for now.

I'll get started on the CLI soon.

bhallen commented 9 years ago

The CLI is complete, and I don't see any other unresolved things here, so I'm marking this issue closed.