apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

TopFieldCollector(s) Should Prepopulate Sentinel Objects [LUCENE-8970] #10013

Open asfimport opened 5 years ago

asfimport commented 5 years ago

We do not prepopulate the hit queue with sentinel values today, which leads to extra checks and extra code.
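
To illustrate the general idea, here is a minimal, self-contained sketch that uses `java.util.PriorityQueue` rather than Lucene's own classes. The class name `SentinelTopN`, its methods, and the choice of `Float.NEGATIVE_INFINITY` as the sentinel are all assumptions made for this example; it is not the TopFieldCollector code.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public class SentinelTopN {
    // Assumed sentinel for this sketch: lower than any real score.
    static final float SENTINEL = Float.NEGATIVE_INFINITY;

    /** Top-n without sentinels: every hit first checks whether the queue is full. */
    static float[] topNWithFullCheck(float[] scores, int n) {
        PriorityQueue<Float> pq = new PriorityQueue<>(n); // min-heap, head is the bottom
        for (float s : scores) {
            if (pq.size() < n) {          // extra branch while the queue fills up
                pq.add(s);
            } else if (s > pq.peek()) {   // competitive against the current bottom
                pq.poll();
                pq.add(s);
            }
        }
        return drain(pq);
    }

    /** Top-n with sentinels: the queue starts full, so one comparison is enough. */
    static float[] topNWithSentinels(float[] scores, int n) {
        PriorityQueue<Float> pq = new PriorityQueue<>(n);
        for (int i = 0; i < n; i++) {
            pq.add(SENTINEL);
        }
        for (float s : scores) {
            if (s > pq.peek()) {          // a sentinel bottom loses to any real score
                pq.poll();
                pq.add(s);
            }
        }
        pq.removeIf(v -> v == SENTINEL);  // fewer than n hits: drop leftover sentinels
        return drain(pq);
    }

    /** Empties the heap into an array sorted from best to worst. */
    static float[] drain(PriorityQueue<Float> pq) {
        float[] out = new float[pq.size()];
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = pq.poll();
        }
        return out;
    }

    public static void main(String[] args) {
        float[] scores = {0.3f, 1.5f, 0.9f, 2.1f, 0.1f};
        System.out.println(Arrays.toString(topNWithFullCheck(scores, 3)));
        System.out.println(Arrays.toString(topNWithSentinels(scores, 3)));
    }
}
```

Both methods return the same hits; the sentinel variant trades the `size() < n` branch for a one-time fill and a cleanup step at the end.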


Migrated from LUCENE-8970 by Atri Sharma (@atris), updated Sep 13 2019

asfimport commented 5 years ago

Atri Sharma (@atris) (migrated from JIRA)

I did a prototype of this. It is a bit hairy since, unlike TopDocsCollector, TopFieldCollector does not directly perform comparisons against the bottom but instead uses FieldComparator to do the job. The problem is that FieldComparator could maintain its own internal queue, which needs to be set with sentinel values accordingly if the hit queue is prepopulated. This works well with straightforward implementations, but it can be an issue for comparators like RelevanceComparator, which do not use the passed-in slot but instead depend on the presence of the scorer instance to generate the doc to be placed.

I wonder if it is worth exposing a prePopulate API in FieldComparator which does what it advertises – allows prepopulating the internal structure used for maintaining docID mappings.
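
A rough sketch of what such a hook could look like, using a toy comparator that keeps its own per-slot values. `IntSlotComparator`, `prePopulate`, `beatsBottom`, and `weakestSlot` are hypothetical names invented for this example and are not part of Lucene's FieldComparator API. The sketch also assumes no real document carries the value `Integer.MIN_VALUE`, and it does not cover the scorer-dependent case described above.

```java
import java.util.Arrays;
import java.util.Comparator;

public class PrePopulateSketch {

    /** Toy slot-based comparator; a stand-in, not Lucene's FieldComparator. */
    static class IntSlotComparator {
        private final int[] slotValues; // comparator-owned value for each queue slot
        private int bottom;             // cached value of the weakest slot

        IntSlotComparator(int numSlots) {
            slotValues = new int[numSlots];
        }

        /** Hypothetical hook: seed every slot with a value that any real doc beats. */
        void prePopulate() {
            Arrays.fill(slotValues, Integer.MIN_VALUE);
        }

        void setBottom(int slot) {
            bottom = slotValues[slot];
        }

        boolean beatsBottom(int docValue) {
            return docValue > bottom;
        }

        void copy(int slot, int docValue) {
            slotValues[slot] = docValue;
        }

        /** Linear scan for the slot holding the smallest value. */
        int weakestSlot() {
            int weakest = 0;
            for (int i = 1; i < slotValues.length; i++) {
                if (slotValues[i] < slotValues[weakest]) {
                    weakest = i;
                }
            }
            return weakest;
        }
    }

    /** Collects the n largest values; no "is the queue full yet" branch is needed. */
    static int[] topN(int[] docValues, int n) {
        IntSlotComparator cmp = new IntSlotComparator(n);
        cmp.prePopulate();
        int bottomSlot = cmp.weakestSlot();
        cmp.setBottom(bottomSlot);
        for (int v : docValues) {
            if (cmp.beatsBottom(v)) {
                cmp.copy(bottomSlot, v);          // overwrite the weakest slot
                bottomSlot = cmp.weakestSlot();
                cmp.setBottom(bottomSlot);
            }
        }
        // Leftover sentinels mean fewer than n docs were collected; drop them.
        return Arrays.stream(cmp.slotValues)
                .filter(v -> v != Integer.MIN_VALUE)
                .boxed()
                .sorted(Comparator.reverseOrder())
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(topN(new int[] {5, 42, 7, 19, 3}, 3)));
    }
}
```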

asfimport commented 5 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

Maybe we can try a quick hack to see how much it could bring, but my intuition is that it wouldn't help with performance, given that we are looking at a condition that is easily predictable?