Open asfimport opened 11 years ago
Simon Willnauer (@s1monw) (migrated from JIRA)
Can you provide a simple testcase that shows the problem?
Ryan Lauck (migrated from JIRA)
ComplexPhraseQuery rewrites complex proximity searches into SpanQuerys. FastVectorHighlighter currently just ignores SpanQuery; I'm not sure how Highlighter behaves. I use ComplexPhraseQuery in production, so I'd be happy to help trace this issue if you can provide some sample queries or test cases.
Mark Miller (@markrmiller) (migrated from JIRA)
The standard Highlighter can highlight span queries when in position-aware mode. It uses a memory index and decomposes the original query to find the matches.
Jason Nacional (migrated from JIRA)
Thanks all for the quick response. Here is a sample query.
Let's say we have the following line: Make Sure Our Emails Reach Your Inbox
The query is: "(Make Sur*) Inbox"~10
After searching, the hits are correct, but somehow "Make" is not being highlighted. Am I missing something here? Here is my code:
...
// Rewriting expands the wildcard term (Sur*) against the terms in the index
Query rewrite_result = phrase.rewrite(IndexReader.open(INDEX_DIR));
QueryScorer qs_phrases = new QueryScorer(rewrite_result);
qs_phrases.setExpandMultiTermQuery(true);
highlighter = new Highlighter(htmlFormatter, qs_phrases);
// Highlight the whole text as a single fragment, with no analysis limit
highlighter.setTextFragmenter(new NullFragmenter());
highlighter.setMaxDocCharsToAnalyze(Integer.MAX_VALUE);
// get the temp text
if (text == null) {
    text = highlighter.getBestFragment(analyzer, "", pText);
} else {
    text_temp = highlighter.getBestFragment(analyzer, "", text);
    text = text_temp;
}
...
I'll start creating a test case to provide more info.
Jason Nacional (migrated from JIRA)
One addition: I also used ComplexPhraseQueryParser as the default parser.
Jason Nacional (migrated from JIRA)
I also have a question about rewriting ComplexPhraseQuery. Do I really always need to open an IndexReader? In our system, searching and viewing the hit document happen on separate pages. So what I'm doing to highlight terms (since I use ComplexPhraseQuery and it needs to be "rewritten") is to open an IndexReader on the viewing page.
I hope you understand my concerns. And I apologize for so many questions.
Thanks.
Ryan Lauck (migrated from JIRA)
Given your example queries above, yes: the IndexReader is used during rewrite to enumerate all the possible terms in a wildcard query. If your query consisted only of basic TermQuery and PhraseQuery clauses, I think you could provide a static, empty IndexReader like PostingsHighlighter does. The docs recommend reusing a single IndexSearcher to avoid some of the overhead of opening a new IndexReader every time.
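To illustrate why the rewrite needs the index at all, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: the class PrefixExpansionSketch and the method expandPrefix are hypothetical names, and a TreeSet stands in for the index's sorted term dictionary. It only shows that a prefix such as sur* can be expanded solely by looking at the terms that were actually indexed, and that an empty dictionary yields no terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy model of prefix expansion against a sorted term dictionary (not Lucene's implementation). */
final class PrefixExpansionSketch {

    /** Returns every term in the sorted dictionary that starts with the given prefix. */
    static List<String> expandPrefix(SortedSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet seeks to the first term >= prefix; terms sharing the prefix are contiguous from there.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the last term that shares the prefix
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical indexed terms, standing in for what an IndexReader would expose.
        SortedSet<String> indexed =
                new TreeSet<>(Arrays.asList("inbox", "make", "sure", "surface", "survey"));
        System.out.println(expandPrefix(indexed, "sur"));
        // With an empty dictionary the wildcard expands to nothing.
        System.out.println(expandPrefix(new TreeSet<String>(), "sur"));
    }
}
```

Under this model, an empty reader is only safe when the query contains no wildcard or prefix clauses, since those would expand to nothing and leave nothing to highlight.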
Jason Nacional (migrated from JIRA)
I tried to generate the translated query. Here it is:
spanNear([spanOr([content:make, spanOr([content:sur, content:sure, content:surely, content:surely.â, content:surer, content:surest, content:surety, content:surf, content:surface, content:surfaced, content:surfaces, content:surge, content:surged, content:surgeon, content:surgeonâ, content:surgery, content:surges, content:surgical, content:surging, content:surlier, content:surly, content:surmise, content:surmised, content:surmises, content:surmount, content:surmounted, content:surmounting, content:surname, content:surnames, content:surovsky, content:surpass, content:surpassed, content:surpassing, content:surplice, content:surplices, content:surplus, content:surprise, content:surprised, content:surprises, content:surprising, content:surprisingly, content:surrender, content:surrendered, content:surrendering, content:surrenders, content:surreptitiously, content:surround, content:surrounded, content:surrounding, content:surroundings, content:surrounds, content:suruchi, content:survey, content:surveyed, content:surveying, content:surveys, content:survival, content:survive, content:survived, content:surviving, content:sury])]), content:inbox], 10, true)
Could the problem be in the first spanOr?
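For reference, here is a rough sketch of which token positions one would expect to be highlighted for the sample line and query. It is a simplified model in plain Java, not Lucene's span matching: SpanNearSketch and matchingPositions are hypothetical names, the first clause is hard-coded as "make OR sur*", the second as "inbox", and slop is approximated as the number of tokens between the two matches.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Simplified model of an ordered spanNear over (make OR sur*) followed by inbox. */
final class SpanNearSketch {

    /** Returns the token positions that take part in at least one match. */
    static SortedSet<Integer> matchingPositions(List<String> tokens, int slop) {
        SortedSet<Integer> hits = new TreeSet<>();
        for (int i = 0; i < tokens.size(); i++) {
            String first = tokens.get(i);
            // First clause: the spanOr of "make" and the expanded prefix "sur".
            if (!(first.equals("make") || first.startsWith("sur"))) {
                continue;
            }
            for (int j = i + 1; j < tokens.size(); j++) {
                int gap = j - i - 1; // tokens between the two matches (assumed slop measure)
                if (gap > slop) {
                    break;
                }
                if (tokens.get(j).equals("inbox")) {
                    hits.add(i);
                    hits.add(j);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> tokens =
                Arrays.asList("make", "sure", "our", "emails", "reach", "your", "inbox");
        System.out.println(matchingPositions(tokens, 10));
    }
}
```

In this model both "make" (position 0) and "sure" (position 1) pair with "inbox" (position 6) within a slop of 10, so all three terms would be candidates for highlighting. That matches the expectation that "Make" should be highlighted too.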
Ahmet Arslan (@iorixxx) (migrated from JIRA)
Maybe the highlighter works without a rewrite after https://issues.apache.org/jira/browse/LUCENE-4728?
Jason Nacional (migrated from JIRA)
Hi @iorixxx, What do you mean?
Jason Nacional (migrated from JIRA)
I decided to run my script using SurroundQuery and created a custom interpreter to convert the queries into the surround query language. But how can I enable leading wildcard searches?
Scott Stults (@sstults) (migrated from JIRA)
Looking at the query structure, this could be related to #3363 (problems highlighting nested span queries).
I'd like to ask for assistance with ComplexPhraseQuery: when it comes to highlighting, the hits are not highlighted correctly. I started using ComplexPhraseQueryParser to support complex proximity searches.
Migrated from LUCENE-4743 by Jason Nacional, updated Oct 09 2015