apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

OfflineSorter shouldn't always forceMerge in the end [LUCENE-7141] #8196

Open asfimport opened 8 years ago

asfimport commented 8 years ago

Today OfflineSorter always does a final merge to collapse all segments into a single segment.

But typically the caller is going to re-iterate all values anyway, to go off and build an FST or a BKD tree or something. So that final forceMerge is often unnecessary, and the merging can instead be done on the fly as the caller consumes the result.
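To illustrate what merging on the fly could look like, here is a minimal sketch that uses only the JDK, not Lucene's classes. The class name `LazyMergeSketch`, the `Cursor` helper, and the use of in-memory `Iterator<byte[]>` partitions are assumptions made for illustration; a real implementation would read sorted partitions from temporary files. The point is that no merged output is ever written: each call to `next()` pulls the smallest pending value from a priority queue of partition heads.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** Hypothetical sketch: lazily merges already-sorted partitions as the caller iterates. */
final class LazyMergeSketch {

  /** One sorted partition plus the value currently at its head. */
  private static final class Cursor {
    final Iterator<byte[]> source;
    byte[] head;

    Cursor(Iterator<byte[]> source) {
      this.source = source;
      this.head = source.next();
    }
  }

  /**
   * Returns an iterator over every value of the given partitions in unsigned byte order.
   * Each partition is assumed to be sorted in that same order already.
   */
  static Iterator<byte[]> merge(List<Iterator<byte[]>> partitions) {
    PriorityQueue<Cursor> queue =
        new PriorityQueue<>((a, b) -> Arrays.compareUnsigned(a.head, b.head));
    for (Iterator<byte[]> partition : partitions) {
      if (partition.hasNext()) {
        queue.add(new Cursor(partition)); // empty partitions are skipped
      }
    }
    return new Iterator<byte[]>() {
      @Override
      public boolean hasNext() {
        return !queue.isEmpty();
      }

      @Override
      public byte[] next() {
        Cursor top = queue.poll(); // partition holding the smallest pending value
        if (top == null) {
          throw new NoSuchElementException();
        }
        byte[] result = top.head;
        if (top.source.hasNext()) {
          top.head = top.source.next();
          queue.add(top); // re-insert with its new head
        }
        return result;
      }
    };
  }
}
```

With k partitions, each `next()` costs O(log k) comparisons, and the extra sequential pass that a final merge to a single file would need is avoided.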

This is somewhat tricky to do, so I'd like to break it into steps. The first step is changing the ByteSequencesReader API to implement BytesRefIterator instead of its own read(BytesRefBuilder) method.
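As a rough picture of the iterator-style contract, the sketch below uses JDK-only stand-ins rather than Lucene's actual classes. The `ByteSequenceIterator` interface, the `open` and `encode` helpers, and the record layout (a 2-byte length followed by the payload) are all assumptions for illustration. It mirrors the convention that `next()` returns `null` once the input is exhausted, instead of filling a caller-supplied builder and returning a boolean.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a reader exposed through an iterator-style next() method. */
final class IteratorStyleReaderSketch {

  /** Stand-in contract: next() returns the next sequence, or null once exhausted. */
  interface ByteSequenceIterator {
    byte[] next() throws IOException;
  }

  /** Reads records from an assumed layout: 2-byte unsigned length, then the payload. */
  static ByteSequenceIterator open(InputStream in) {
    DataInputStream data = new DataInputStream(in);
    return () -> {
      int length;
      try {
        length = data.readUnsignedShort();
      } catch (EOFException e) {
        return null; // no further length prefix: treat as end of input
      }
      byte[] payload = new byte[length];
      data.readFully(payload); // throws EOFException if the record is truncated
      return payload;
    };
  }

  /** Test helper: writes the given strings in the assumed record layout. */
  static byte[] encode(String... values) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    for (String value : values) {
      byte[] payload = value.getBytes(StandardCharsets.UTF_8);
      out.writeShort(payload.length);
      out.write(payload);
    }
    out.flush();
    return bytes.toByteArray();
  }
}
```

A consumer then loops with `for (byte[] v = it.next(); v != null; v = it.next())`, which is the same shape a merging iterator would expose to its own caller.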


Migrated from LUCENE-7141 by Michael McCandless (@mikemccand), updated Mar 26 2016

Attachments: LUCENE-7141.patch

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

First phase ... just a rote cutover to BytesRefIterator.

asfimport commented 8 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

+1. This is something I was going to suggest.

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit 78d5cfefe2453345c498984bf0e405d254a9d5bc in lucene-solr's branch refs/heads/master from Mike McCandless https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=78d5cfe

LUCENE-7141: switch OfflineSorter's ByteSequencesReader to BytesRefIterator

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit c46d7686643e7503304cb35dfe546bce9c6684e7 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c46d768

LUCENE-7141: switch OfflineSorter's ByteSequencesReader to BytesRefIterator