elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

ES|QL: ST_CENTROID_AGG fails when no doc-values are available #112505

Closed craigtaverner closed 2 days ago

craigtaverner commented 1 week ago

When a point field is used only for aggregation in ST_CENTROID_AGG, field extraction is switched to read doc-values for higher performance:

FROM indexed | WHERE ST_INTERSECTS(location, TO_GEOSHAPE("POLYGON ((-10 -10, -10 10, 10 10, 10 -10, -10 -10))")) | STATS COUNT(*), ST_CENTROID_AGG(location)

However, if the field is mapped without doc values:

{
  "properties" : {
    "location": { "type" : "geo_point",  "index" : false, "doc_values" : false }
  }
}

then this query will fail:

WARNING: Uncaught exception in thread: Thread[elasticsearch[node_s5][esql_worker][T#2],5,TGRP-SpatialPushDownGeoPointIT]
java.lang.AssertionError: BYTES_REF NOT IN (NULL, LONG)
    at __randomizedtesting.SeedInfo.seed([E092AB3A4E3619A6]:0)
    at org.elasticsearch.compute.lucene.ValuesSourceReaderOperator.process(ValuesSourceReaderOperator.java:164)
    at org.elasticsearch.compute.operator.AbstractPageMappingOperator.getOutput(AbstractPageMappingOperator.java:76)
    at org.elasticsearch.compute.operator.Driver.runSingleLoopIteration(Driver.java:258)
    at org.elasticsearch.compute.operator.Driver.run(Driver.java:189)
    at org.elasticsearch.compute.operator.Driver$1.doRun(Driver.java:378)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
    at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:991)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)

The message BYTES_REF NOT IN (NULL, LONG) means that the value actually read from the store was a BYTES_REF (as expected for a geo_point read from source), but the reader expected a LONG based on the field type (as expected for a geo_point read from doc-values).
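To make the shape of that assertion concrete, here is a minimal, self-contained sketch of a check that produces the same kind of message. The `ElementType` enum and the `check` helper are hypothetical stand-ins written for illustration; the real assertion in `ValuesSourceReaderOperator` may well be structured differently.

```java
import java.util.EnumSet;
import java.util.Set;

public class ElementTypeCheckSketch {
    // Hypothetical stand-in for the compute engine's element types.
    enum ElementType { NULL, LONG, BYTES_REF }

    // Returns a failure message when the type that was read is neither NULL
    // nor the type the reader expected; returns null when the read is fine.
    static String check(ElementType actual, ElementType expected) {
        Set<ElementType> allowed = EnumSet.of(ElementType.NULL, expected);
        if (allowed.contains(actual)) {
            return null;
        }
        return actual + " NOT IN (NULL, " + expected + ")";
    }

    public static void main(String[] args) {
        // geo_point loaded from source arrives as BYTES_REF, but the plan
        // asked for doc-values, which would be LONG.
        System.out.println(check(ElementType.BYTES_REF, ElementType.LONG));
        // A matching read passes the check.
        System.out.println(check(ElementType.LONG, ElementType.LONG));
    }
}
```

In this sketch the first call reports the mismatch and the second returns null, which mirrors the situation in the stack trace: the plan promised doc-values, but the field could only be loaded from source.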

elasticsearchmachine commented 1 week ago

Pinging @elastic/es-analytical-engine (Team:Analytics)

craigtaverner commented 1 week ago

The problem is that the physical plan optimizer does not consider whether a field has doc-values when planning a doc-values extraction for spatial fields used in ST_CENTROID_AGG. Adding this check to the allowedForDocValues method is trivial, so this should be very easy to fix.
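As a rough illustration of the kind of check being described, the sketch below contrasts a decision that ignores the mapping with one that also requires doc-values. Only the method name `allowedForDocValues` comes from this issue; the `FieldInfo` class, the parameters, and the `Before` variant are hypothetical simplifications, not the actual optimizer code.

```java
public class DocValuesPlanningSketch {
    // Hypothetical, simplified view of a mapped field; not an Elasticsearch class.
    static final class FieldInfo {
        final String type;
        final boolean hasDocValues;

        FieldInfo(String type, boolean hasDocValues) {
            this.type = type;
            this.hasDocValues = hasDocValues;
        }
    }

    static boolean isSpatialPoint(FieldInfo field) {
        return "geo_point".equals(field.type);
    }

    // Assumed shape of the buggy decision: only the field type and its usage
    // are considered, so a field without doc-values is still selected.
    static boolean allowedForDocValuesBefore(FieldInfo field, boolean usedOnlyInCentroidAgg) {
        return usedOnlyInCentroidAgg && isSpatialPoint(field);
    }

    // Assumed shape of the fix: additionally require that the mapping
    // actually provides doc-values for the field.
    static boolean allowedForDocValues(FieldInfo field, boolean usedOnlyInCentroidAgg) {
        return usedOnlyInCentroidAgg && isSpatialPoint(field) && field.hasDocValues;
    }

    public static void main(String[] args) {
        // Mirrors the mapping in this issue: "doc_values": false.
        FieldInfo location = new FieldInfo("geo_point", false);
        System.out.println(allowedForDocValuesBefore(location, true));
        System.out.println(allowedForDocValues(location, true));
    }
}
```

With the extra condition, a field mapped with `"doc_values": false` falls back to source extraction, so the reader's expected type matches what is actually loaded.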