elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.25k stars 24.85k forks source link

Error when running "multi_match" on double field with scientific notation and analyzer #21665

Closed cbuescher closed 7 years ago

cbuescher commented 8 years ago

This strange combination of three parameters just came in via CI: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+5.x+multijob-unix-compatibility/os=debian/252/console

The query that trips this looks like this (simplified):

"multi_match" : {
    "query" : 6.075210893508043E-4,
    "fields" : [
      "mapped_double^1.0"
    ],    
    "analyzer" : "simple"
  }

So although this is a double field, the value gets analyzed in lucenes QueryBuilder#analyzeTerm which leaves only the String "e" which in turn cannot be parsed to a double. This later leads to a NumberFormatException:

java.lang.NumberFormatException: For input string: "e"
    at __randomizedtesting.SeedInfo.seed([8BA49CFAC7249C2F:7C5F9EC4B6A759C5]:0)
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
    at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
    at java.lang.Double.parseDouble(Double.java:538)
    at org.elasticsearch.index.mapper.NumberFieldMapper$NumberType$3.parse(NumberFieldMapper.java:350)
    at org.elasticsearch.index.mapper.NumberFieldMapper$NumberType$3.termQuery(NumberFieldMapper.java:360)
    at org.elasticsearch.index.mapper.NumberFieldMapper$NumberFieldType.termQuery(NumberFieldMapper.java:801)
    at org.elasticsearch.index.search.MatchQuery.termQuery(MatchQuery.java:273)
    at org.elasticsearch.index.search.MatchQuery.blendTermQuery(MatchQuery.java:385)
    at org.elasticsearch.index.search.MultiMatchQuery.blendTermQuery(MultiMatchQuery.java:316)
    at org.elasticsearch.index.search.MatchQuery$MatchQueryBuilder.newTermQuery(MatchQuery.java:303)
    at org.apache.lucene.util.QueryBuilder.analyzeTerm(QueryBuilder.java:277)
    at org.apache.lucene.util.QueryBuilder.createFieldQuery(QueryBuilder.java:241)
    at org.apache.lucene.util.QueryBuilder.createBooleanQuery(QueryBuilder.java:88)
    at org.elasticsearch.index.search.MatchQuery.parse(MatchQuery.java:249)
    at org.elasticsearch.index.search.MultiMatchQuery.parseAndApply(MultiMatchQuery.java:63)
    at org.elasticsearch.index.search.MultiMatchQuery.parse(MultiMatchQuery.java:81)
    at org.elasticsearch.index.query.MultiMatchQueryBuilder.doToQuery(MultiMatchQueryBuilder.java:748)
    at org.elasticsearch.index.query.AbstractQueryBuilder.toQuery(AbstractQueryBuilder.java:97)
    at org.elasticsearch.index.query.MultiMatchQueryBuilderTests.testAnalyzerOnDoubleField(MultiMatchQueryBuilderTests.java:313)

I wonder where we should catch this. Should we always throw an error when specifying an analyzer together with a numeric field in the query builder already? I think that would make sense.

dakrone commented 8 years ago

Related to #21489 (another scientific notation issue)

cbuescher commented 8 years ago

@dakrone yes, but only partially related. Scientific notation is okay as long as we don't analyze it here.

cbuescher commented 8 years ago

The "match" query seems to have the similar problem.

clintongormley commented 8 years ago

Should we always throw an error when specifying an analyzer together with a numeric field in the query builder already?

Multi-match can have multiple fields, so perhaps we only use the analyzer for text and keyword fields. (I'm in two minds about keyword fields, but I could imagine lower casing the search term).

For match, where only one field can be specified, we should probably throw an exception on fields that are neither text nor keyword.

s1monw commented 7 years ago

here is the REPRODUCE line: REPRODUCE WITH: gradle :core:test -Dtests.seed=8BA49CFAC7249C2F -Dtests.class=org.elasticsearch.index.query.MultiMatchQueryBuilderTests -Dtests.method="testToQuery" -Dtests.security.manager=true -Dtests.locale=lv-LV -Dtests.timezone=Asia/Kathmandu

cbuescher commented 7 years ago

Closing this after discussion with @jpountz in https://github.com/elastic/elasticsearch/pull/23684. We agreed that the error we currently throw in this edge case is okay and we should change the test to avoid running into this in our randomization.