gkunter / coquery

Coquery is a free corpus query tool for linguists, lexicographers, translators, and anybody who wishes to search and analyse a text corpus.
GNU General Public License v3.0

Column numbering not correct in N-gram tables after _NULL #292

Open gkunter opened 6 years ago

gkunter commented 6 years ago

Test case: Corpus with N-gram lookup table, query string A* _NULL B*, output feature Word.

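The generated output string joins the Lexicon on WordId3 instead of WordId2 for the last query item. Since the _NULL item is skipped, the last query item (B*) is the second item that needs a lookup, so it should be joined on the second word column of the N-gram table, as the expected output string shows. The problem resides in get_feature_joins(), which may need to be made aware of skipped query items.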

Generated output string:

SELECT COQ_WORD_1.Word AS coq_word_label_1,
       NULL AS coq_word_label_2,
       COQ_WORD_3.Word AS coq_word_label_3,
       ID1 AS coquery_invisible_corpus_id,
       FileId1 AS coquery_invisible_origin_id
FROM CorpusNgram
INNER JOIN Lexicon AS COQ_WORD_1 ON COQ_WORD_1.WordId = WordId1
INNER JOIN Lexicon AS COQ_WORD_3 ON COQ_WORD_3.WordId = WordId3
WHERE (COQ_WORD_1.Word LIKE 'a%')
  AND (COQ_WORD_3.Word LIKE 'b%')

Expected output string:

SELECT COQ_WORD_1.Word AS coq_word_label_1,
       NULL AS coq_word_label_2,
       COQ_WORD_3.Word AS coq_word_label_3,
       ID1 AS coquery_invisible_corpus_id,
       FileId1 AS coquery_invisible_origin_id
FROM CorpusNgram
INNER JOIN Lexicon AS COQ_WORD_1 ON COQ_WORD_1.WordId = WordId1
INNER JOIN Lexicon AS COQ_WORD_3 ON COQ_WORD_3.WordId = WordId2
WHERE (COQ_WORD_1.Word LIKE 'a%')
  AND (COQ_WORD_3.Word LIKE 'b%')
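
A possible direction for the fix, sketched in Python. This is not the actual code of get_feature_joins(); the function names ngram_join_columns() and build_joins() are hypothetical and only illustrate the renumbering. It assumes, based on the expected output above, that a _NULL item does not occupy a WordId column in the N-gram table, while the alias (COQ_WORD_n) keeps the original query position.

```python
def ngram_join_columns(query_items, null_token="_NULL"):
    """Pair each non-_NULL query item with its N-gram column number.

    Returns a list of (position, column) tuples. `position` is the
    1-based position in the query string (used for the alias), and
    `column` counts only the items that are not skipped.
    """
    pairs = []
    column = 0
    for position, item in enumerate(query_items, start=1):
        if item == null_token:
            # Skipped items keep their position but take no column.
            continue
        column += 1
        pairs.append((position, column))
    return pairs


def build_joins(query_items):
    """Build the Lexicon joins for the Word feature (hypothetical helper)."""
    template = ("INNER JOIN Lexicon AS COQ_WORD_{pos} "
                "ON COQ_WORD_{pos}.WordId = WordId{col}")
    return [template.format(pos=pos, col=col)
            for pos, col in ngram_join_columns(query_items)]


if __name__ == "__main__":
    for line in build_joins(["A*", "_NULL", "B*"]):
        print(line)
```

For the test case A* _NULL B*, this sketch yields the two joins from the expected output string: COQ_WORD_1 on WordId1 and COQ_WORD_3 on WordId2.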