Open ate47 opened 3 weeks ago
I'll have to take a look at the code to understand how things are handled at the moment.
Any news on what to do?
public QueryResponse search(SolrQuery query) throws SolrServerException, IOException {
int nDocs;
if (maxDocs > 0) {
nDocs = maxDocs;
} else {
long docCount = client.query(query.setRows(0)).getResults().getNumFound();
nDocs = Math.max((int) Math.min(docCount, Integer.MAX_VALUE), 1);
}
return client.query(query.setRows(nDocs));
}
For this code we can probably allow the number of rows to be set on the query before this method is called. Something like if(query.getRows() != null)....
to check if it is already set.
Looks like this here is where we extract variables from the SPARQL query: https://github.com/eclipse-rdf4j/rdf4j/blob/755a631efbc95df52e4e590984ab9b0049d4fef5/core/sail/lucene-api/src/main/java/org/eclipse/rdf4j/sail/lucene/QuerySpecBuilder.java#L450
So that can be extended to add ROWS
. And then follow the code up to see where else we need to extend it so that the field is set correctly on the SolrQuery object.
We can't do the same with public synchronized TopDocs search(Query query) throws IOException {}
since the query object doesn't have any info about the number of rows or equivalent. So there we would probably have to make a new method public synchronized TopDocs search(Query query, int nDocs) throws IOException {}
.
Same for public SearchHits search(SearchRequestBuilder request, QueryBuilder query) {}
.
@hmottestad do you think you can get a look at how I have implemented it in the PR #5163?
Thanks for the reminder :)
Problem description
Inside the LuceneSail, an option exists to tell what is the amount of documents to retrieve. It is set with the property
maxDocuments
during the sail init and can't be set at runtime. We would need an option to allow this.Preferred solution
An option would be to add a virtual predicate during query time like
search:maxDocuments
with a property during init to enable or disable it, maybeallowQueryMaxDocument
.My main concern is to implement it. The maxDocuments values is used inside each index implementation.
So the different methods will need to have a max documents param. I would like an opinion on it before implementing it.
Are you interested in contributing a solution yourself?
Yes
Alternatives you've considered
No response
Anything else?
https://github.com/ec-doris/kohesio-backend/issues/268