Status: Open. benwtrent opened this issue 3 months ago.
Pinging @elastic/ml-core (Team:ML)
Pinging @elastic/es-analytical-engine (Team:Analytics)
@dgieselaar is the original finder of this bug.
@dgieselaar it doesn't seem like the request has random_sampler in it at all? Is this all about categorize_text with a top_hits sub aggregation?
@benwtrent No, my head is a mess today; I copied the wrong request. Here is the right one:
GET logs-foo-default*/_search?error_trace=true
{
  "track_total_hits": false,
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "gte": "now-24h"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "sampler": {
      "random_sampler": {
        "probability": 1
      },
      "aggs": {
        "samples": {
          "top_hits": {
            "size": 3,
            "_source": [
              "message"
            ]
          }
        }
      }
    }
  }
}
Thank you @dgieselaar! I was able to replicate it; it reproduces with very simple text data.
POST test_text/_search
{
  "aggs": {
    "random_sampler": {
      "random_sampler": {
        "probability": 0.5
      },
      "aggs": {
        "samples": {
          "top_hits": {
            "size": 1,
            "_source": [
              "text"
            ]
          }
        }
      }
    }
  }
}
Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.lucene.search.Scorable.score()" because "this.scorer" is null
at org.apache.lucene.core@9.10.0/org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1.collect(TopScoreDocCollector.java:72)
at org.elasticsearch.server@8.13.2/org.elasticsearch.search.aggregations.metrics.TopHitsAggregator$1.collect(TopHitsAggregator.java:159)
at org.elasticsearch.server@8.13.2/org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectExistingBucket(BucketsAggregator.java:97)
at org.elasticsearch.server@8.13.2/org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectBucket(BucketsAggregator.java:81)
at org.elasticsearch.server@8.13.2/org.elasticsearch.search.aggregations.bucket.sampler.random.RandomSamplerAggregator.getLeafCollector(RandomSamplerAggregator.java:120)
at org.elasticsearch.server@8.13.2/org.elasticsearch.search.aggregations.AggregatorBase.getLeafCollector(AggregatorBase.java:222)
at org.elasticsearch.server@8.13.2/org.elasticsearch.search.aggregations.MultiBucketCollector$1.getLeafCollector(MultiBucketCollector.java:92)
at org.elasticsearch.server@8.13.2/org.elasticsearch.search.aggregations.AggregatorCollector.getLeafCollector(AggregatorCollector.java:35)
at org.elasticsearch.server@8.13.2/org.elasticsearch.search.query.QueryPhaseCollector.getLeafCollector(QueryPhaseCollector.java:165)
at org.elasticsearch.server@8.13.2/org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:415)
at org.elasticsearch.server@8.13.2/org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:360)
at org.elasticsearch.server@8.13.2/org.elasticsearch.search.internal.ContextIndexSearcher.lambda$search$4(ContextIndexSearcher.java:345)
at org.apache.lucene.core@9.10.0/org.apache.lucene.search.TaskExecutor$TaskGroup.lambda$createTask$0(TaskExecutor.java:117)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
There are two causes here:
First, when the probability is 1.0, we return a collector that doesn't accept setting the scorer:
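if (probability >= 1.0) {
    grow(1);
    // Every doc is collected into bucket 0. There is no setScorer override here,
    // so the scorer is never handed to `sub` (e.g. a top_hits child sorted by _score).
    return new LeafBucketCollector() {
        @Override
        public void collect(int doc, long owningBucketOrd) throws IOException {
            collectExistingBucket(sub, doc, 0);
        }
    };
}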
So that collector needs to accept the Scorable and pass it on to the sub-collector.
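A minimal sketch of that first fix, using simplified stand-in types rather than the real Lucene or Elasticsearch classes: `SketchScorable`, `SketchLeafCollector`, `ScoreReadingChild` and `ForwardScorerSketch` are all hypothetical names. A wrapper that only delegates `collect` leaves a score-reading child with a null scorer, while one that also overrides `setScorer` and passes the scorer down works. The real Lucene methods also declare `IOException`, which the sketch omits for brevity.

```java
// Hypothetical, simplified stand-ins for Lucene's Scorable and LeafCollector
// (the real methods also declare IOException); not the actual Lucene types.
interface SketchScorable {
    float score();
}

interface SketchLeafCollector {
    default void setScorer(SketchScorable scorer) {
        // Default no-op: a collector that never reads scores can ignore it.
    }

    void collect(int doc);
}

// Stands in for a top_hits child sorted by _score: it must read the score.
final class ScoreReadingChild implements SketchLeafCollector {
    private SketchScorable scorer;
    float lastScore = Float.NaN;

    @Override
    public void setScorer(SketchScorable scorer) {
        this.scorer = scorer;
    }

    @Override
    public void collect(int doc) {
        lastScore = scorer.score(); // NullPointerException if no scorer was set
    }
}

public class ForwardScorerSketch {
    // Like the probability >= 1.0 branch: collect() delegates, but the
    // inherited no-op setScorer means the child never receives a scorer.
    static SketchLeafCollector brokenWrapper(SketchLeafCollector sub) {
        return new SketchLeafCollector() {
            @Override
            public void collect(int doc) {
                sub.collect(doc);
            }
        };
    }

    // Same wrapper, but it also passes the scorer down to the child.
    static SketchLeafCollector fixedWrapper(SketchLeafCollector sub) {
        return new SketchLeafCollector() {
            @Override
            public void setScorer(SketchScorable scorer) {
                sub.setScorer(scorer);
            }

            @Override
            public void collect(int doc) {
                sub.collect(doc);
            }
        };
    }

    // Drives a wrapper the way a searcher does: setScorer first, then collect.
    static float collectOneDoc(boolean useFixedWrapper) {
        ScoreReadingChild child = new ScoreReadingChild();
        SketchLeafCollector leaf = useFixedWrapper ? fixedWrapper(child) : brokenWrapper(child);
        leaf.setScorer(() -> 1.5f);
        leaf.collect(0);
        return child.lastScore;
    }

    public static void main(String[] args) {
        System.out.println(collectOneDoc(true)); // prints 1.5
        try {
            collectOneDoc(false);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException: scorer was never forwarded");
        }
    }
}
```

In the real code the equivalent change would presumably be a `setScorer(Scorable)` override on the returned `LeafBucketCollector` that calls `sub.setScorer(scorer)`; that is an assumption about the shape of the fix, not a confirmed patch.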
Second, collectExistingBucket needs the Scorable as well, and I am not sure that is possible.
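One reading of the second problem, sketched with hypothetical stand-in types (`SampleScorable`, `SampleLeafCollector`, `ScoreNeedingChild`, `EagerCollectionSketch` are illustrative names, not the actual `RandomSamplerAggregator` code): the stack traces show `TopHitsAggregator`'s `collect` being called from inside `RandomSamplerAggregator.getLeafCollector`. As I understand Lucene's search loop, the searcher only calls `setScorer` on a leaf collector after `getLeafCollector` has returned, so a parent that collects its sampled documents eagerly at that point hands them to the child before any scorer exists, and forwarding the scorer afterwards comes too late.

```java
// Hypothetical simplified stand-ins (not Lucene's real Scorable/LeafCollector).
interface SampleScorable {
    float score();
}

interface SampleLeafCollector {
    void setScorer(SampleScorable scorer);

    void collect(int doc);
}

// A child that, like top_hits sorted by _score, reads the score for every doc.
final class ScoreNeedingChild implements SampleLeafCollector {
    private SampleScorable scorer;
    int collected = 0;

    @Override
    public void setScorer(SampleScorable scorer) {
        this.scorer = scorer;
    }

    @Override
    public void collect(int doc) {
        scorer.score(); // NullPointerException if setScorer has not been called yet
        collected++;
    }
}

public class EagerCollectionSketch {
    // Mimics a parent that collects its sampled docs inside getLeafCollector
    // (as the stack traces suggest) and hands back a collector with nothing left to do.
    static SampleLeafCollector eagerGetLeafCollector(SampleLeafCollector sub, int[] sampledDocs) {
        for (int doc : sampledDocs) {
            sub.collect(doc); // runs before the searcher has supplied any scorer
        }
        return new SampleLeafCollector() {
            @Override
            public void setScorer(SampleScorable scorer) {
                sub.setScorer(scorer); // forwarding here is already too late
            }

            @Override
            public void collect(int doc) {
                // nothing to do: all sampled docs were collected eagerly
            }
        };
    }

    // Searcher order: getLeafCollector first, then setScorer.
    static String simulateSearch() {
        ScoreNeedingChild child = new ScoreNeedingChild();
        try {
            SampleLeafCollector leaf = eagerGetLeafCollector(child, new int[] {3, 7});
            leaf.setScorer(() -> 1.0f);
            return "collected " + child.collected;
        } catch (NullPointerException e) {
            return "NPE before setScorer";
        }
    }

    public static void main(String[] args) {
        System.out.println(simulateSearch()); // prints "NPE before setScorer"
    }
}
```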
java.lang.NullPointerException: Cannot invoke "org.apache.lucene.search.Scorable.score()" because "this.scorer" is null
at org.apache.lucene.core@9.11.0/org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1.collect(TopScoreDocCollector.java:72)
at org.elasticsearch.server@8.15.0-SNAPSHOT/org.elasticsearch.search.aggregations.metrics.TopHitsAggregator$1.collect(TopHitsAggregator.java:158)
at org.elasticsearch.server@8.15.0-SNAPSHOT/org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectExistingBucket(BucketsAggregator.java:98)
at org.elasticsearch.server@8.15.0-SNAPSHOT/org.elasticsearch.search.aggregations.bucket.sampler.random.RandomSamplerAggregator.getLeafCollector(RandomSamplerAggregator.java:132)
at org.elasticsearch.server@8.15.0-SNAPSHOT/org.elasticsearch.search.aggregations.AggregatorBase.getLeafCollector(AggregatorBase.java:222)
at org.elasticsearch.server@8.15.0-SNAPSHOT/org.elasticsearch.search.aggregations.MultiBucketCollector$1.getLeafCollector(MultiBucketCollector.java:92)
at org.elasticsearch.server@8.15.0-SNAPSHOT/org.elasticsearch.search.aggregations.AggregatorCollector.getLeafCollector(AggregatorCollector.java:35)
at org.elasticsearch.server@8.15.0-SNAPSHOT/org.elasticsearch.search.query.QueryPhaseCollector.getLeafCollector(QueryPhaseCollector.java:165)
at org.elasticsearch.server@8.15.0-SNAPSHOT/org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:420)
at org.elasticsearch.server@8.15.0-SNAPSHOT/org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:365)
at org.elasticsearch.server@8.15.0-SNAPSHOT/org.elasticsearch.search.internal.ContextIndexSearcher.lambda$search$3(ContextIndexSearcher.java:350)
at org.apache.lucene.core@9.11.0/org.apache.lucene.search.TaskExecutor$TaskGroup.lambda$createTask$0(TaskExecutor.java:117)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
at org.elasticsearch.server@8.15.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
at org.elasticsearch.server@8.15.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
at org.elasticsearch.server@8.15.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1570)
Elasticsearch Version
any
Installed Plugins
No response
Java Version
bundled
OS Version
any
Problem Description
When I use the top_hits aggregation as a child of a random_sampler aggregation, it triggers a shard request failure with an NPE. The NPE only occurs when no sort is defined; if I add one (e.g. { "sort": { "@timestamp": "desc" } }), the request returns results.
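A minimal sketch of why adding a sort avoids the NPE, assuming the child only reads scores when it ranks by relevance: top_hits sorts by _score by default (the traces show Lucene's `TopScoreDocCollector`, the relevance-sorted collector), while a field sort such as `@timestamp` can rank hits without ever calling `score()`. `TopHitsSketch`, `ScoreSource`, `Hit` and `runWithoutScorer` are hypothetical names, and the timestamps are made-up data; the sketch feeds documents without ever calling `setScorer`, to mimic what the child sees under random_sampler in this bug.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortSketch {
    // Hypothetical stand-in for Lucene's Scorable; not the real type.
    interface ScoreSource {
        float score();
    }

    // One collected hit and the value it is ranked by.
    static final class Hit {
        final int doc;
        final double sortKey;

        Hit(int doc, double sortKey) {
            this.doc = doc;
            this.sortKey = sortKey;
        }
    }

    // Hypothetical top_hits-like collector: it only asks for a score
    // when no field sort is configured.
    static final class TopHitsSketch {
        private final long[] timestamps; // fake per-doc "@timestamp" values
        private final boolean sortByTimestamp;
        private final List<Hit> hits = new ArrayList<>();
        private ScoreSource scorer; // stays null unless setScorer is called

        TopHitsSketch(long[] timestamps, boolean sortByTimestamp) {
            this.timestamps = timestamps;
            this.sortByTimestamp = sortByTimestamp;
        }

        void setScorer(ScoreSource scorer) {
            this.scorer = scorer;
        }

        void collect(int doc) {
            // Only the chosen branch is evaluated, so a field sort never touches the scorer.
            double key = sortByTimestamp ? (double) timestamps[doc] : scorer.score();
            hits.add(new Hit(doc, key));
        }

        // Doc ids of the `size` hits with the largest sort key, in descending order.
        List<Integer> top(int size) {
            List<Integer> out = new ArrayList<>();
            hits.stream()
                .sorted(Comparator.comparingDouble((Hit h) -> h.sortKey).reversed())
                .limit(size)
                .forEach(h -> out.add(h.doc));
            return out;
        }
    }

    // Feeds every doc to the child without ever calling setScorer.
    static List<Integer> runWithoutScorer(long[] timestamps, boolean sortByTimestamp, int size) {
        TopHitsSketch topHits = new TopHitsSketch(timestamps, sortByTimestamp);
        for (int doc = 0; doc < timestamps.length; doc++) {
            topHits.collect(doc);
        }
        return topHits.top(size);
    }

    public static void main(String[] args) {
        long[] timestamps = {100L, 300L, 200L};
        System.out.println(runWithoutScorer(timestamps, true, 2)); // prints [1, 2]
        try {
            runWithoutScorer(timestamps, false, 2);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException: sorting by _score needs a scorer");
        }
    }
}
```

This matches the reported behavior: the sorted variant ranks documents by timestamp alone, and only the default relevance sort dereferences the missing scorer.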
Steps to Reproduce
Logs (if relevant)
```
{
  "error": {
    "root_cause": [
      {
        "type": "null_pointer_exception",
        "reason": """Cannot invoke "org.apache.lucene.search.Scorable.score()" because "this.scorer" is null""",
        "stack_trace": """org.elasticsearch.ElasticsearchException$1: Cannot invoke "org.apache.lucene.search.Scorable.score()" because "this.scorer" is null
at org.elasticsearch.server@8.15.0-SNAPSHOT/org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:704)
at org.elasticsearch.server@8.15.0-SNAPSHOT/org.elasticsearch.action.search.SearchPhaseExecutionException.guessRootCauses(SearchPhaseExecutionException.java:160)
at org.elasticsearch.server@8.15.0-SNAPSHOT/org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:686)
at org.elasticsearch.server@8.15.0-SNAPSHOT/org.elasticsearch.ElasticsearchException.generateFailureXContent(ElasticsearchException.java:632)
at org.elasticsearch.server@8.15.0-SNAPSHOT/org.elasticsearch.rest.RestResponse.build(RestResponse.java:186)
at org.elasticsearch.server@8.15.0-SNAPSHOT/org.elasticsearch.rest.RestResponse.