apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

FairDistributionMergePolicy [LUCENE-10679] #11714

Open asfimport opened 2 years ago

asfimport commented 2 years ago

TieredMergePolicy and LogMergePolicy can produce merge specifications that distribute the overall "work" (i.e. the number of documents to process) unevenly across threads. This is especially true when the underlying segment distribution is highly skewed.

A more optimal distribution can be achieved by performing a variation of the integer partitioning algorithm. Initial tests show a more optimal distribution on a simulated set of skewed segment distributions.
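
The issue does not say which variation of integer partitioning is meant, so the following is only a rough sketch of the general idea, not the proposed policy. It assumes the classic greedy heuristic for multiway number partitioning (largest item first, always into the currently lightest bucket), treats each segment as a plain document count, and uses hypothetical names (`FairPartitionSketch`, `Bucket`, `partition`) that are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only: none of these names exist in Lucene.
public class FairPartitionSketch {

  /** The segment sizes assigned to one merge thread, plus their running total. */
  static final class Bucket {
    final List<Long> docCounts = new ArrayList<>();
    long totalDocs = 0;

    void add(long docCount) {
      docCounts.add(docCount);
      totalDocs += docCount;
    }
  }

  /**
   * Greedy multiway partitioning: visit segments from largest to smallest and
   * give each one to the bucket that currently holds the fewest documents.
   * This is a heuristic and does not always find the best possible split.
   */
  static List<Bucket> partition(long[] segmentDocCounts, int numThreads) {
    if (numThreads < 1) {
      throw new IllegalArgumentException("numThreads must be >= 1");
    }
    long[] sorted = segmentDocCounts.clone();
    Arrays.sort(sorted); // ascending; iterated in reverse below

    List<Bucket> buckets = new ArrayList<>();
    PriorityQueue<Bucket> byLoad =
        new PriorityQueue<>(Comparator.comparingLong((Bucket b) -> b.totalDocs));
    for (int i = 0; i < numThreads; i++) {
      Bucket bucket = new Bucket();
      buckets.add(bucket);
      byLoad.add(bucket);
    }

    for (int i = sorted.length - 1; i >= 0; i--) {
      // Remove the lightest bucket before mutating it so the heap stays valid.
      Bucket lightest = byLoad.poll();
      lightest.add(sorted[i]);
      byLoad.add(lightest);
    }
    return buckets;
  }

  public static void main(String[] args) {
    // A made-up skewed distribution: one very large segment and several small ones.
    long[] skewed = {1_000_000, 50_000, 40_000, 30_000, 20_000, 10_000};
    for (Bucket bucket : partition(skewed, 3)) {
      System.out.println(bucket.totalDocs + " docs from segments " + bucket.docCounts);
    }
  }
}
```

With one dominant segment, as in the made-up input above, no partitioning can push the heaviest thread below the size of that segment; the heuristic only evens out how the remaining segments are spread over the other threads. A real merge policy would also have to account for constraints this sketch ignores, such as maximum merged segment size and deletes.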


Migrated from LUCENE-10679 by Atri Sharma (@atris)

vigyasharma commented 2 years ago

> A more optimal distribution can be achieved by performing a variation of the integer partitioning algorithm. Initial tests show a more optimal distribution on a simulated set of skewed segment distributions.

Atri - Could you share more about the merge distribution algorithm you're proposing here (and any performance numbers you've seen from early tests)? This sounds interesting, would love to know more.