gkunter / coquery

Coquery is a free corpus query tool for linguists, lexicographers, translators, and anybody who wishes to search and analyse a text corpus.
GNU General Public License v3.0

Ambiguity between POS and lemma queries #250

Open gkunter opened 7 years ago

gkunter commented 7 years ago

A query string can be ambiguous as to whether it specifies a POS query item or a lemma query item. For example, [*S|*Z] in Buckeye can refer either to all "lemmas" ending in S or Z (in quotation marks because, in Buckeye, the Lemma query type represents the canonical pronunciation) or to all POS tags ending in S or Z.

This is due to the shorthand notation of the (old-style) COCA syntax, where [n*] always refers to POS tags starting with n, and never to lemmas starting with n. In the new-style COCA syntax, the POS query would be written _n*, which resolves the ambiguity.

Coquery already offers a way to disambiguate this: [*S|*Z].[*] refers to lemmas ending in S or Z, regardless of their POS tag. However, this may not be obvious to users.

Coquery should probably adopt the new-style syntax.
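To make the three cases concrete, here is a minimal sketch of how a single query item could be classified. It is not Coquery's actual parser: `classify_item` and its return labels are hypothetical names chosen for illustration, and the only rules assumed are the ones described in this issue (a bare bracketed item is ambiguous, `[lemma].[pos]` is an explicit lemma query, and a leading underscore marks a new-style POS query).

```python
import re

# Hypothetical helper, not part of Coquery: classifies one query item
# according to the three notations discussed in this issue.
BRACKETED = r"\[[^\[\]]+\]"  # e.g. [n*] or [*S|*Z]


def classify_item(item: str) -> str:
    """Return 'pos', 'lemma', 'ambiguous', or 'word' for one query item."""
    # New-style COCA syntax: a leading underscore marks a POS query.
    if item.startswith("_"):
        return "pos"
    # Explicit form: [lemma].[pos], so the first bracket is a lemma.
    if re.fullmatch(BRACKETED + r"\." + BRACKETED, item):
        return "lemma"
    # A bare bracketed item could be a lemma or an old-style POS shorthand.
    if re.fullmatch(BRACKETED, item):
        return "ambiguous"
    # Anything else is treated as a plain word form in this sketch.
    return "word"


for example in ["[*S|*Z]", "[*S|*Z].[*]", "_n*"]:
    print(example, "->", classify_item(example))
```

With rules like these, only the bare bracketed form needs a decision from the user or a corpus-specific default; the other two notations are unambiguous by construction.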