elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.1k stars 24.83k forks source link

ES|QL: Enrich with wrong field type throws NumberFormatException #107357

Open luigidellaquila opened 7 months ago

luigidellaquila commented 7 months ago

Reproducible on the tests dataset with

row a = "foo" | enrich ages_policy on a

The match field for ages_policy is age_range, that is an integer_range

java.lang.NumberFormatException: For input string: "foo"
         at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2054)
         at java.base/jdk.internal.math.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
         at java.base/java.lang.Double.parseDouble(Double.java:792)
         at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.index.mapper.NumberFieldMapper$NumberType.objectToDouble(NumberFieldMapper.java:1442)
         at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.index.mapper.NumberFieldMapper$NumberType$6.parse(NumberFieldMapper.java:1001)
         at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.index.mapper.NumberFieldMapper$NumberType$6.parse(NumberFieldMapper.java:998)
         at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.index.mapper.RangeType.parseValue(RangeType.java:827)
         at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.index.mapper.RangeType.rangeQuery(RangeType.java:874)
         at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.index.mapper.RangeFieldMapper$RangeFieldType.rangeQuery(RangeFieldMapper.java:313)
         at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.index.mapper.RangeFieldMapper$RangeFieldType.termQuery(RangeFieldMapper.java:295)
         at org.elasticsearch.xpack.esql.enrich.QueryList$TermQueryList.getQuery(QueryList.java:109)
         at org.elasticsearch.xpack.esql.enrich.EnrichQuerySourceOperator.getOutput(EnrichQuerySourceOperator.java:71)
         at org.elasticsearch.compute.operator.Driver.runSingleLoopIteration(Driver.java:247)
         at org.elasticsearch.compute.operator.Driver.run(Driver.java:178)
         at org.elasticsearch.compute.operator.Driver$1.doRun(Driver.java:368)
         at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
         at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
         at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
         at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
         at java.base/java.lang.Thread.run(Thread.java:1583)

We should probably have a more strict validation for these cases and return a more descriptive error message

elasticsearchmachine commented 7 months ago

Pinging @elastic/es-analytical-engine (Team:Analytics)

dnhatn commented 7 months ago

@craigtaverner Can you take this one?

craigtaverner commented 3 weeks ago

I'm investigating this, but it turns out that if we do a type check up-front, for example say that int_range cannot be matched to a keyword field, then we rule out some queries that currently work. If the keyword field actually contains an integer, such that Integer.parseInt does not throw an exception, then the query will work. Since that is a per-row decision, and a type check is at the query level, we need to decide between two options: