apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

TopDocsCollector Should Not Depend on Priority Queue [LUCENE-8877] #9920

Open asfimport opened 5 years ago

asfimport commented 5 years ago

TopDocsCollector is tightly coupled to the notion of a priority queue. That is not necessarily a good abstraction, since the collector really just needs an interface that can hold and iterate over docIDs and scores, and possibly shard indexes.

We should rewrite this against a simpler interface, with the priority queue as the default implementation.
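
To make the proposal concrete, here is a rough, self-contained sketch of what such an interface could look like. It is not Lucene code: `Hit`, `HitBuffer`, and `HeapHitBuffer` are hypothetical names invented for illustration, and the tie-breaking rule (lower docID wins on equal scores) is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical hit holder: docID, score, and a shard index (-1 when unset). */
final class Hit {
    final int doc;
    final float score;
    final int shardIndex;

    Hit(int doc, float score, int shardIndex) {
        this.doc = doc;
        this.score = score;
        this.shardIndex = shardIndex;
    }
}

/** Hypothetical abstraction: the only operations a top-docs collector would need. */
interface HitBuffer {
    /** Offers a hit; the implementation decides whether to retain it. */
    void offer(Hit hit);

    /** Returns the retained hits, best first. */
    List<Hit> topHits();
}

/** Default implementation backed by a bounded min-heap of size k. */
final class HeapHitBuffer implements HitBuffer {
    // Orders the weakest hit first: lower score, then higher docID on ties (assumed rule).
    private static final Comparator<Hit> WEAKEST_FIRST = (a, b) -> {
        int byScore = Float.compare(a.score, b.score);
        return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
    };

    private final int k;
    private final PriorityQueue<Hit> heap;

    HeapHitBuffer(int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        this.k = k;
        this.heap = new PriorityQueue<>(k, WEAKEST_FIRST);
    }

    @Override
    public void offer(Hit hit) {
        if (heap.size() < k) {
            heap.add(hit);
        } else if (WEAKEST_FIRST.compare(hit, heap.peek()) > 0) {
            // The new hit beats the current weakest one, so it takes its place.
            heap.poll();
            heap.add(hit);
        }
    }

    @Override
    public List<Hit> topHits() {
        List<Hit> out = new ArrayList<>(heap);
        out.sort(WEAKEST_FIRST.reversed());
        return out;
    }
}

class TopHitsDemo {
    public static void main(String[] args) {
        HitBuffer buffer = new HeapHitBuffer(2);
        buffer.offer(new Hit(0, 1.0f, -1));
        buffer.offer(new Hit(1, 3.0f, -1));
        buffer.offer(new Hit(2, 2.0f, -1));
        for (Hit h : buffer.topHits()) {
            System.out.println("doc=" + h.doc + " score=" + h.score);
        }
    }
}
```

With this shape, a collector would depend only on `HitBuffer`, and another backing structure could be swapped in without touching the collector itself.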


Migrated from LUCENE-8877 by Atri Sharma (@atris), updated Jun 26 2019

asfimport commented 5 years ago

Atri Sharma (@atris) (migrated from JIRA)

Any thoughts on this? I envision eventually reaching a state where the underlying data structure is opaque to the IndexSearcher API. That should allow an abstraction with a high degree of flexibility.

asfimport commented 5 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

Abstraction increases complexity too. It feels reasonable to me that top-docs collectors are backed by a priority queue, since this is the go-to data structure for top-k selection problems. If you need more flexibility, could you extend Collector directly instead of TopDocsCollector?