gkunter / coquery

Coquery is a free corpus query tool for linguists, lexicographers, translators, and anybody who wishes to search and analyse a text corpus.
GNU General Public License v3.0

Retrieving sentence contexts for quantified queries may fail #200

Closed gkunter closed 7 years ago

gkunter commented 7 years ago

If a quantified query matches the same sentence more than once, retrieving contexts that are restricted to the sentence may fail.

Test case: BNC, Left context: 50, right context: 0, Single string, Sentence restriction. Query string: [keep] up *{1,7} ..

This happens because the SQL query in SQLResource.get_sentence_ids() returns only one sentence Id per distinct token Id, even if the same token Id is passed more than once. As a result, id_list and df have different lengths.
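The length mismatch can be reproduced outside of Coquery. The following is only a minimal sketch using an in-memory SQLite table: the table name `corpus`, its columns, and both helper functions are hypothetical and are not taken from Coquery's schema or code. It shows that an `IN (...)` lookup returns one row per distinct token Id, and one possible way to re-expand the result so that it lines up with the original list.

```python
import sqlite3

# Hypothetical miniature corpus table; names are illustrative only
# and do not reflect Coquery's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE corpus (TokenId INTEGER PRIMARY KEY, SentenceId INTEGER)"
)
conn.executemany(
    "INSERT INTO corpus VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 11), (4, 11)],
)


def sentence_ids_naive(id_list):
    """Return one sentence id per *distinct* token id.

    Duplicates in id_list collapse, because IN (...) only tests membership.
    """
    if not id_list:
        return []
    placeholders = ",".join("?" * len(id_list))
    rows = conn.execute(
        "SELECT SentenceId FROM corpus "
        f"WHERE TokenId IN ({placeholders}) ORDER BY TokenId",
        id_list,
    ).fetchall()
    return [sentence_id for (sentence_id,) in rows]


def sentence_ids_aligned(id_list):
    """Return one sentence id per element of id_list, in the same order."""
    if not id_list:
        return []
    unique_ids = sorted(set(id_list))
    placeholders = ",".join("?" * len(unique_ids))
    rows = conn.execute(
        "SELECT TokenId, SentenceId FROM corpus "
        f"WHERE TokenId IN ({placeholders})",
        unique_ids,
    ).fetchall()
    lookup = dict(rows)  # token id -> sentence id
    # Re-expand through the lookup so that repeated token ids are kept.
    return [lookup.get(token_id) for token_id in id_list]


token_ids = [2, 3, 2]  # token 2 is passed twice
naive = sentence_ids_naive(token_ids)
aligned = sentence_ids_aligned(token_ids)
```

Here `naive` is shorter than `token_ids`, which is the kind of mismatch described above, while `aligned` has one entry per requested token Id. Whether a lookup of this kind fits get_sentence_ids() depends on how the result is used afterwards.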

gkunter commented 7 years ago

This issue no longer seems to be properly fixed. Test case: BNC, sentence context L15, R15, query strings:

[be] friendlier * 
* friendlier [n*]

This results in the following exception, probably because there is a token that is matched by both query strings:

Type       ValueError
Message    Length mismatch: Expected axis has 28 elements, new values have 29 elements

 classes.py, line 83: run
   app.py, line 1247: <lambda>
     session.py, line 356: aggregate_data
       managers.py, line 654: process
         managers.py, line 200: mutate
           functionlist.py, line 60: lapply
             functions.py, line 1035: evaluate
               functions.py, line 975: evaluate
                 corpus.py, line 1139: get_sentence_ids
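
To check the suspicion that a token matched by both query strings is involved, the token Ids returned by each query string could be compared directly. This is a sketch only: the function `shared_token_ids` is hypothetical, and the Ids below are made up for illustration, not taken from the BNC.

```python
from collections import Counter


def shared_token_ids(*result_sets):
    """Return the token ids that occur in more than one result set."""
    counts = Counter(
        token_id
        for ids in result_sets
        for token_id in set(ids)  # count each id once per result set
    )
    return sorted(token_id for token_id, n in counts.items() if n > 1)


# Made-up token ids standing in for the matches of the two query strings.
first_query = [101, 205, 317]
second_query = [317, 420]
overlap = shared_token_ids(first_query, second_query)
```

If `overlap` is non-empty, the combined list of token Ids contains duplicates, which would be consistent with the off-by-one lengths in the exception.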

gkunter commented 7 years ago

This also occurs in ICE_NG, with the query string [e*].[v*] up *{3,5} .