KorAP / Krill

A Corpus Data Retrieval Index using Lucene for Look-Ups
BSD 2-Clause "Simplified" License

Focus bug in sequences #89

Closed: Akron closed this issue 10 months ago

Akron commented 1 year ago

It seems that focus queries are sometimes not correctly embedded in sequences. I was sure we had already fixed this issue, but it occasionally reappears. For example, in the NKJP1M-SGJP instance the following two queries return different results, although they should be equivalent:

contains(<base/s=s>, (focus({[nkjp/l="da(wa)?ć"][]{,5} do} [nkjp/p=subst]) zrozumienia))
contains(<base/s=s>, {[nkjp/l="da(wa)?ć"][]{,5} do} [nkjp/p=subst & orth=zrozumienia])
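Why the two queries should match the same spans: `focus()` reduces a match to its class-1 span, so in the first query `focus({A} [nkjp/p=subst])` returns only the span of A, and only where a `subst` token directly follows it. The sequence then requires `zrozumienia` directly after that focused span, which is the very same token. Both queries therefore demand one token after A that is both `subst` and `zrozumienia`. The following is a simplified toy model in Java, not Krill's implementation. It assumes half-open token spans, a single pre-tokenized sentence (the `contains(<base/s=s>, ...)` wrapper is ignored), that a bare `do` matches the surface form, and illustrative tags. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Simplified toy model, NOT Krill's implementation: a hit is a half-open
// token span "start-end" within one already-tokenized sentence.
public class FocusEquivalence {

    static final class Token {
        final String orth, lemma, pos;
        Token(String orth, String lemma, String pos) {
            this.orth = orth;
            this.lemma = lemma;
            this.pos = pos;
        }
    }

    static final Pattern LEMMA = Pattern.compile("da(wa)?ć");

    // End offsets of every span matching [nkjp/l="da(wa)?ć"][]{,5} do
    // that starts at token i (assumption: bare "do" matches the surface form).
    static List<Integer> classOneEnds(List<Token> s, int i) {
        List<Integer> ends = new ArrayList<>();
        if (!LEMMA.matcher(s.get(i).lemma).matches()) return ends;
        for (int gap = 0; gap <= 5; gap++) {
            int doPos = i + 1 + gap;
            if (doPos >= s.size()) break;
            if (s.get(doPos).orth.equals("do")) ends.add(doPos + 1);
        }
        return ends;
    }

    // Query 1, inner part: focus({A} [nkjp/p=subst]) keeps only the
    // class-1 span [i, j), but only where a subst token sits at j.
    static List<int[]> focusSpans(List<Token> s) {
        List<int[]> spans = new ArrayList<>();
        for (int i = 0; i < s.size(); i++)
            for (int j : classOneEnds(s, i))
                if (j < s.size() && s.get(j).pos.equals("subst"))
                    spans.add(new int[] {i, j});
        return spans;
    }

    // Query 1, outer part: "zrozumienia" must directly follow the focused span,
    // i.e. it must be the very token that satisfied [nkjp/p=subst].
    static Set<String> query1(List<Token> s) {
        Set<String> hits = new TreeSet<>();
        for (int[] sp : focusSpans(s))
            if (sp[1] < s.size() && s.get(sp[1]).orth.equals("zrozumienia"))
                hits.add(sp[0] + "-" + (sp[1] + 1));
        return hits;
    }

    // Query 2: both constraints on one token in a single sequence.
    static Set<String> query2(List<Token> s) {
        Set<String> hits = new TreeSet<>();
        for (int i = 0; i < s.size(); i++)
            for (int j : classOneEnds(s, i)) {
                if (j >= s.size()) continue;
                Token t = s.get(j);
                if (t.pos.equals("subst") && t.orth.equals("zrozumienia"))
                    hits.add(i + "-" + (j + 1));
            }
        return hits;
    }

    public static void main(String[] args) {
        // Illustrative tokens and tags, not taken from the NKJP corpus.
        List<Token> sentence = List.of(
            new Token("dać", "dać", "inf"),
            new Token("coś", "coś", "subst"),
            new Token("do", "do", "prep"),
            new Token("zrozumienia", "zrozumienie", "subst"));
        System.out.println(query1(sentence) + " " + query2(sentence)); // [0-4] [0-4]
    }
}
```

In this model both queries yield identical span sets by construction, so any difference in hit counts on a real index points to the engine rather than to the query semantics.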
margaretha commented 1 year ago

No. | Query | Number of results
1 | contains(<base/s=s>, (focus({[nkjp/l="da(wa)?ć"][]{,5} do} [nkjp/p=subst]) zrozumienia)) | 6
2 | contains(<base/s=s>, {[nkjp/l="da(wa)?ć"][]{,5} do} [nkjp/p=subst & orth=zrozumienia]) | 8

The 2 missing hits are in the following docs:

When these documents are added to Query 1 as a virtual corpus, the 2 missing hits are found. This suggests a problem with skipping documents.
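To illustrate how a skipping bug can lose hits that a narrower virtual corpus recovers, here is a hypothetical Java sketch of a leapfrog conjunction over sorted document IDs, in the style of Lucene's `advance(target)` contract ("move to the first document >= target"). It is not Krill or Lucene code, and the actual Krill bug may have a different cause; the `buggySkip` flag is an invented stand-in for a skip that overshoots its target.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Krill or Lucene code: intersecting two sorted
// document-ID lists by letting the lagging side skip ahead via advance().
public class SkipDemo {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    static class DocIterator {
        final int[] docs;
        final boolean buggySkip; // simulates a skip that overshoots its target
        int idx = -1;

        DocIterator(int[] docs, boolean buggySkip) {
            this.docs = docs;
            this.buggySkip = buggySkip;
        }

        int doc() {
            if (idx < 0) return -1;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        int nextDoc() {
            idx++;
            return doc();
        }

        // Correct contract: move to the first doc >= target (target > current doc).
        int advance(int target) {
            do { idx++; } while (idx < docs.length && docs[idx] < target);
            if (buggySkip && idx < docs.length) idx++; // bug: drops one candidate
            return doc();
        }
    }

    // Collect documents present in both iterators.
    static List<Integer> intersect(DocIterator a, DocIterator b) {
        List<Integer> hits = new ArrayList<>();
        int da = a.nextDoc(), db = b.nextDoc();
        while (da != NO_MORE_DOCS && db != NO_MORE_DOCS) {
            if (da == db) {
                hits.add(da);
                da = a.nextDoc();
                db = b.nextDoc();
            } else if (da < db) {
                da = a.advance(db);
            } else {
                db = b.advance(da);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] left = {1, 3, 5, 7, 9};
        int[] right = {2, 3, 8, 9};
        System.out.println(intersect(new DocIterator(left, false), new DocIterator(right, false))); // [3, 9]
        System.out.println(intersect(new DocIterator(left, true), new DocIterator(right, false)));  // []
        // Restricting the left side to the affected docs means it never has to skip.
        int[] restricted = {3, 9};
        System.out.println(intersect(new DocIterator(restricted, true), new DocIterator(right, false))); // [3, 9]
    }
}
```

In this toy model, narrowing the candidate documents so that the faulty iterator never needs to skip makes the lost hits reappear, which is consistent with the virtual-corpus observation above.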

margaretha commented 1 year ago

The NKJP instance uses Krill version 0.59.5. It seems that the bug has been fixed in a newer version of Krill, probably in commit f420cb341ead7e83005f53fbf81fb196e0f30aae.

Akron commented 10 months ago

You are right. Now that Java 11 is available everywhere, I was able to update the instance! Thank you!