@mmcauliffe Michael, do I recall correctly that you have something like this working in the GUI? Any code that I could borrow for the command-line interface?
So I just added a new function for Corpus objects, called phonological_search, which takes a list of segments, an optional list of environment specification strings (like "[+feature]_[-feature,+feature]" or "a_", etc.), and an optional sequence_type keyword argument specifying which tier to search on. It returns a list of tuples: the first element of each tuple is the word object, and the second is a list of tuples, one for each segment found that matches the environments (if any are specified). Let me know if you have any questions about it!
The first two issues should be fixed with the latest commit.
I'm a little confused by your third point. Feature specifications get converted into lists of segments, so sets of segments can be created as a list without needing features anyway. The GUI doesn't do anything special to the # before passing it to the environment; it should just be something like "#_a". An issue I just thought of is that environments are either feature-based or segment-based, so it's currently not possible to specify an environment with a word boundary before the target and a high vowel after it, other than by making a set of all the high vowels.
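The conversion described above, where a feature specification becomes a set of segments and # is passed through unchanged, can be sketched as follows. This is an illustration under assumptions, not the actual corpustools code: the toy FEATURES table, the helper names side_to_segments and find_in_env, and the trick of padding each transcription with # on both edges are all hypothetical. In this sketch each side of "_" is expanded independently, so "#_[+high,+voc]" works here, whereas per the comment above the real environments could not yet mix the two kinds of specification.

```python
from typing import Dict, List, Set

# Hypothetical toy feature table; a real corpus would supply its own feature system.
FEATURES: Dict[str, Dict[str, str]] = {
    "i": {"high": "+", "back": "-", "voc": "+"},
    "u": {"high": "+", "back": "+", "voc": "+"},
    "a": {"high": "-", "back": "+", "voc": "+"},
    "t": {"high": "-", "back": "-", "voc": "-"},
    "k": {"high": "+", "back": "+", "voc": "-"},
}

BOUNDARY = "#"


def side_to_segments(side: str) -> Set[str]:
    """Expand one side of an environment into a set of allowed symbols.

    '[+high,-back]' -> every segment in FEATURES with those feature values;
    'a,t' or '#'    -> the listed symbols as-is (including the boundary '#').
    """
    side = side.strip()
    if side.startswith("[") and side.endswith("]"):
        specs = [s.strip() for s in side[1:-1].split(",") if s.strip()]
        # Each spec looks like '+high': first char is the value, the rest the feature name.
        return {
            seg
            for seg, feats in FEATURES.items()
            if all(feats.get(spec[1:]) == spec[0] for spec in specs)
        }
    return {s.strip() for s in side.split(",") if s.strip()}


def find_in_env(trans: List[str], target: str, env: str) -> List[int]:
    """Return positions (in the unpadded transcription) where target occurs in env."""
    left, _, right = env.partition("_")
    lset = side_to_segments(left) if left else None
    rset = side_to_segments(right) if right else None
    # Word edges become ordinary symbols, so '#' matches like any other segment.
    padded = [BOUNDARY] + trans + [BOUNDARY]
    hits = []
    for i in range(1, len(padded) - 1):
        if padded[i] != target:
            continue
        if lset is not None and padded[i - 1] not in lset:
            continue
        if rset is not None and padded[i + 1] not in rset:
            continue
        hits.append(i - 1)
    return hits
```

Padding with # keeps the matching loop uniform: a word-initial segment simply has # as its left neighbour, so no special-casing of edges is needed.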
On Fri, Dec 12, 2014, 1:36 PM Blake Allen notifications@github.com wrote:
Nice!
Even so, I found a couple of issues/questions:
1. File "/usr/local/lib/python3.4/dist-packages/corpustools-0.15.1-py3.4.egg/corpustools/gui/qt/psgui.py", line 142, in setResults
   self.results.append([str(w), str(getattr(w,self.tierWidget.displayValue())),', '.join(segs),
   AttributeError: 'Word' object has no attribute 'Transcription'
2. The GUI says "features" can be the "basis for search", but that's not implemented yet, right? If I select it, I get an error, and nothing changes.
3. For purposes of making a command-line version, do we have a syntax for specifying a set of segments without using features? Like _(a,b,c) or something like that. Also, would it be okay to pass the word-edge symbol # in an environment specification string as-is, or does the GUI already interpret the # button as a word-edge indicator before the environments are passed to phonological_search?
Issues 1 and 2 seem to be fixed now. Thanks.
I'll get started on the CLI soon.
The CLI is complete, and I don't see any other unresolved things here, so I'm marking this issue closed.
Allow users to search for sequences of segments and/or phonological feature sets. Return lists of matches with their type and token frequencies. Provide the ability to view results on screen, save them to a file, and/or export them as a sub-corpus.
@mmcauliffe <- GUI person
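The feature request above asks for type and token frequencies plus export of the matches. Below is a minimal sketch of one common way to compute these, assuming each corpus entry carries a token count: type frequency is the number of distinct words containing the searched sequence, and token frequency is the sum of those words' counts. The Entry record, the helper names, and the tab-separated sub-corpus layout (segments joined by ".") are hypothetical choices, not the corpustools file format.

```python
import csv
import io
from typing import List, NamedTuple, Sequence, TextIO, Tuple


class Entry(NamedTuple):
    """Hypothetical corpus entry: spelling, transcription, corpus token count."""
    spelling: str
    transcription: List[str]
    frequency: int


def contains_sequence(trans: Sequence[str], seq: Sequence[str]) -> bool:
    """True if seq occurs as a contiguous run inside trans."""
    n = len(seq)
    return any(list(trans[i:i + n]) == list(seq) for i in range(len(trans) - n + 1))


def search_with_frequencies(
    corpus: Sequence[Entry], seq: Sequence[str]
) -> Tuple[List[Entry], int, int]:
    """Return (matching entries, type frequency, token frequency)."""
    matches = [e for e in corpus if contains_sequence(e.transcription, seq)]
    type_freq = len(matches)                        # distinct words containing seq
    token_freq = sum(e.frequency for e in matches)  # weighted by corpus counts
    return matches, type_freq, token_freq


def write_subcorpus(entries: Sequence[Entry], out: TextIO) -> None:
    """Write matches as a tab-separated sub-corpus: spelling, transcription, frequency."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["spelling", "transcription", "frequency"])
    for e in entries:
        writer.writerow([e.spelling, ".".join(e.transcription), e.frequency])


if __name__ == "__main__":
    corpus = [
        Entry("pat", ["p", "a", "t"], 10),
        Entry("tap", ["t", "a", "p"], 4),
        Entry("apt", ["a", "p", "t"], 2),
    ]
    found, types, tokens = search_with_frequencies(corpus, ["a", "t"])
    print(types, tokens)  # only "pat" contains a-t: prints 1 10
    buf = io.StringIO()
    write_subcorpus(found, buf)
    print(buf.getvalue(), end="")
```

Writing to any text stream (a real file opened with newline="" or an io.StringIO, as in the demo) keeps saving results and exporting a sub-corpus as the same operation.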