apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

Highlighter mergeContiguousFragments shouldn't merge 0-score fragments [LUCENE-6471] #7530

Open · asfimport opened this issue 9 years ago

asfimport commented 9 years ago

Highlighter.mergeContiguousFragments merges the adjacent fragments it is given. However, the list it receives can include fragments that contain no highlights (i.e., that have a score of 0), so it can grow a fragment needlessly. I never figured out why this old highlighter keeps such fragments around instead of eagerly discarding them when each fragment completes, which is what I think it should do. That would fix this problem and might also make things faster. I'm not sure whether any highlighter user actually wants the non-scoring fragments, though.


Migrated from LUCENE-6471 by David Smiley (@dsmiley), 1 vote, updated Jul 11 2017
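
The behavior the issue asks for (discard zero-score fragments before merging contiguous ones) can be sketched as below. This is a minimal standalone illustration, not Lucene's implementation: `Frag` is a hypothetical stand-in for `TextFragment` (a character range plus a score), fragments are sorted by start offset to simplify the contiguity check, and keeping the higher score on merge is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MergeScoredFragments {

    // Hypothetical stand-in for a highlighter fragment: a [start, end) character
    // range in the original text plus the score the fragment scorer gave it.
    record Frag(int start, int end, float score) {}

    // Drops null and zero-score fragments first, then merges fragments whose
    // ranges touch (previous end == next start). A merged fragment keeps the
    // higher of the two scores (an assumption of this sketch).
    static List<Frag> mergeScored(List<Frag> frags) {
        List<Frag> scored = new ArrayList<>();
        for (Frag f : frags) {
            if (f != null && f.score() > 0) {
                scored.add(f);
            }
        }
        // Process fragments in text order so contiguity is a simple neighbour check.
        scored.sort(Comparator.comparingInt(Frag::start));

        List<Frag> merged = new ArrayList<>();
        for (Frag f : scored) {
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).end() == f.start()) {
                Frag prev = merged.get(last);
                merged.set(last, new Frag(prev.start(), f.end(), Math.max(prev.score(), f.score())));
            } else {
                merged.add(f);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Three adjacent 100-char fragments; only the outer two contain hits.
        List<Frag> frags = List.of(
                new Frag(0, 100, 1.5f),
                new Frag(100, 200, 0f),
                new Frag(200, 300, 2.0f));
        System.out.println(mergeScored(frags));
        // prints: [Frag[start=0, end=100, score=1.5], Frag[start=200, end=300, score=2.0]]
    }
}
```

If the zero-score middle fragment were kept, all three ranges would be contiguous and would collapse into a single 0-300 fragment, which is exactly the needless growth described above.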

asfimport commented 7 years ago

Jim Richards (migrated from JIRA)

I ended up writing this to work around the issue:

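    import java.io.IOException;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.search.highlight.InvalidTokenOffsetsException;
    import org.apache.lucene.search.highlight.TextFragment;

    // Assumes this lives in a subclass of Highlighter, so getBestTextFragments(...) is inherited.
    public final String getBetterFragments(TokenStream tokenStream, String text, int maxNumFragments, String separator) throws IOException, InvalidTokenOffsetsException {

        maxNumFragments = Math.max(1, maxNumFragments); // sanity check
        // Pass mergeContiguousFragments=false so zero-score neighbours are never glued onto scoring ones.
        TextFragment[] frag = getBestTextFragments(tokenStream, text, false, maxNumFragments);

        StringBuilder result = new StringBuilder();
        boolean first = true;
        for (TextFragment f : frag) {
            // Skip empty slots and fragments without any highlighted terms.
            if (f != null && f.getScore() > 0) {
                // Separate from the previously *emitted* fragment; checking the array
                // index (i > 0) would add a leading separator whenever frag[0] is skipped.
                if (!first) {
                    result.append(separator);
                }
                result.append(f.toString());
                first = false;
            }
        }
        return result.toString();
    }
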
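
The same filter-and-join step can also be written with streams, where `Collectors.joining` only places the separator between elements that survive the filter. This is a minimal standalone sketch: `Frag` is a hypothetical stand-in for Lucene's `TextFragment` (with the real class you would use `getScore()` and `toString()` instead), and the sample texts are made up.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.stream.Collectors;

public class JoinScoredFragments {

    // Hypothetical stand-in for TextFragment: the rendered fragment text and its score.
    record Frag(String text, float score) {}

    // Keeps non-null fragments with a positive score and joins their text.
    // Skipped fragments never produce a stray or doubled separator, because
    // joining() only inserts the separator between emitted elements.
    static String joinScored(Frag[] frags, String separator) {
        return Arrays.stream(frags)
                .filter(Objects::nonNull)
                .filter(f -> f.score() > 0)
                .map(Frag::text)
                .collect(Collectors.joining(separator));
    }

    public static void main(String[] args) {
        Frag[] frags = {
                new Frag("no hits here", 0f),
                new Frag("the <B>quick</B> fox", 1.2f),
                null,
                new Frag("a <B>quick</B> reply", 0.8f)
        };
        System.out.println(joinScored(frags, " ... "));
        // prints: the <B>quick</B> fox ... a <B>quick</B> reply
    }
}
```
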
asfimport commented 7 years ago

David Smiley (@dsmiley) (migrated from JIRA)

Jim Richards, have you considered the UnifiedHighlighter?

asfimport commented 7 years ago

Jim Richards (migrated from JIRA)

@dsmiley,

I'm using hibernate-search-orm, which depends on version 5.5.4 of the Lucene libraries, whereas the UnifiedHighlighter is from 6.x. I could probably do some magic in pom.xml to exclude 5.5.4, but my basic test ran into too many incompatibilities.

Jim.