apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

Preview issue [LUCENE-5697] #6759

Open asfimport opened 10 years ago

asfimport commented 10 years ago

In DocFetcher, which uses Lucene v3.5.0, we stumbled on a bug. The DocFetcher lead has investigated and found that the problem seems to be in Lucene. I do not know whether this bug has been fixed in a later Lucene version.

Issue: We use "proximity search": searching for multiple words in a directory with about 300 PDF files.
E.g. search for "wordA wordB wordC"~50, i.e. three words within a distance of 50 words of each other. The resulting documents are correct, but the highlighted text in the document is often missing.

If the words are in the SAME order as in the search AND on the SAME page, then the highlighting works correctly. But if the order of the words is different from the search (like "wordA wordC wordB") OR the words are not on the same page, then that text is not highlighted.
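
To make the expected behaviour concrete, here is a minimal sketch in plain Java of the order-independent check described above: all search words must occur within a given distance of each other, in any order. It is an illustration only. It does not use Lucene or DocFetcher code, the class and method names (`ProximityWindow`, `allWithinWindow`) are made up for this example, and Lucene's own slop calculation differs in detail.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene or DocFetcher.
public class ProximityWindow {

    // Returns true if every term occurs at least once inside some span of
    // tokens whose first and last positions differ by at most maxDistance.
    // The order of the terms inside the span is deliberately ignored.
    public static boolean allWithinWindow(List<String> tokens, List<String> terms, int maxDistance) {
        Set<String> wanted = new HashSet<>(terms);
        Map<String, Integer> lastSeen = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            if (!wanted.contains(token)) {
                continue;
            }
            // Remember the most recent position of this term.
            lastSeen.put(token, i);
            if (lastSeen.size() < wanted.size()) {
                continue; // not every term has been seen yet
            }
            // The tightest span ending at position i starts at the oldest
            // of the most recent positions of all terms.
            int start = i;
            for (int pos : lastSeen.values()) {
                start = Math.min(start, pos);
            }
            if (i - start <= maxDistance) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> text = Arrays.asList("wordA", "x", "wordC", "y", "wordB");
        List<String> query = Arrays.asList("wordA", "wordB", "wordC");
        System.out.println(allWithinWindow(text, query, 50)); // true
        System.out.println(allWithinWindow(text, query, 3));  // false
    }
}
```

In this sketch the text "wordA x wordC y wordB" matches the three search words with a distance of 50 even though wordC comes before wordB. That is the case in which we would expect the text to be highlighted.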

Because we often use proximity search on multiple words, this severely degrades usability.


Migrated from LUCENE-5697 by Martin Schoenmakers, updated Jun 05 2014

Environment: DocFetcher 1.1.11 on Win 7(64) pro

asfimport commented 10 years ago

Chris M. Hostetter (@hossman) (migrated from JIRA)

1) Lucene 3.5 is pretty old.

2) At first glance, it sounds like the problems you are describing could simply be due to a disconnect between how your searches are executed vs how you are using the highlighter code.

w/o specific example code or a reproducible test case, there's really no way to tell if what you are describing is a genuine bug or a misunderstanding of the API.

3) There are multiple highlighters available, and a lot of different ways to configure them, so even if there is a bug, w/o more specifics there really isn't enough info here to try and diagnose where the bug is, let alone what the bug is.

Can you please provide some code (ideally a standalone JUnit test using the lucene test-framework) demonstrating the problem?