elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

The hits.total differs from actual number of documents in `ids` query #25603

Closed JagathJayasinghe closed 4 years ago

JagathJayasinghe commented 7 years ago

Elasticsearch version: 5.3.1
Elasticsearch setup: 9-node cluster
Plugins installed: repository-s3, x-pack
JVM version: openjdk version "1.8.0_131"
OS version: Ubuntu Server 16.04
Machine specifications: 8x CPU, 8GB RAM, 150GB SSD, 10Gbps network

Description of the problem including expected versus actual behavior: Elasticsearch returns an incorrect hits.total value even though the returned results appear correct.

After stripping the query back considerably, we found that the hits.total inconsistency occurs with the ids query below, no matter how many IDs or which IDs we use.
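For an ids query against a single index, each requested ID can match at most one document, so hits.total should never exceed the number of distinct IDs requested. A minimal sketch of that sanity check, assuming the search response has already been parsed into a Python dict and that the index has one mapping type and uses default routing (with multiple types or custom routing, the same _id can legitimately appear more than once). `total_hits` and `ids_total_is_consistent` are hypothetical helper names, not part of any Elasticsearch client:

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping
from typing import Any


def total_hits(response: Mapping[str, Any]) -> int:
    """Return hits.total as an int.

    5.x/6.x responses report hits.total as a plain integer, while 7.x reports
    an object such as {"value": 48, "relation": "eq"}.
    """
    total = response["hits"]["total"]
    if isinstance(total, Mapping):
        return int(total["value"])
    return int(total)


def ids_total_is_consistent(response: Mapping[str, Any], requested_ids: Iterable[str]) -> bool:
    """Hypothetical sanity check for an ids query.

    Assumes a single index with one mapping type and default routing, so each
    requested _id can match at most one document.
    """
    return total_hits(response) <= len(set(requested_ids))


# Made-up response in the 5.x/6.x shape: one requested ID, but hits.total is 3.
suspicious = {"hits": {"total": 3, "max_score": 1.0, "hits": [{"_id": "a"}]}}
print(ids_total_is_consistent(suspicious, ["a"]))  # False
```

Running a check like this on every ids query response is one way to log when the inconsistency appears, which may help narrow down what triggers it.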

Steps to reproduce: So far we haven't been able to work out what triggers this bug.

The example query and its results are shown in the attached image below.

(Image: issue_1, showing the example ids query and its results)

A rolling restart of the cluster seems to resolve the issue. I'm not sure whether this is due to an in-process bug or a corrupt index.

We would like to find out whether this is a potential bug in Elasticsearch, or whether anyone else has seen this happen, as we have had no luck finding the cause.

dannymurphygfk commented 5 years ago

Unfortunately, disabling those scripts wasn't really an option, as it would have meant disabling them for an indefinite period of time, which would have rendered the cluster unusable for our product. The cluster is used by our Beta/QA systems.

RayRenteria commented 5 years ago

I'm having the same issue. Fortunately, it runs in an offline job queue, so I can just fail the job when this behavior occurs. The job gets resubmitted for another try and eventually works, usually on the second try.

Here's the hits object captured from my debugger:

{
  "total": 48,
  "max_score": 507.43854,
  "hits": []
}

Note: Usually the query returns a correctly populated hits object, but the same query with the same values occasionally exhibits the anomalous behavior.
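The symptom in the captured hits object can be detected mechanically: size is at least 1 and hits.total is positive, yet the hits array is empty. A minimal sketch of the fail-and-retry workaround described above, assuming a hypothetical `run_search` callable that sends the query and returns the parsed response, and assuming the default "from": 0. The retry count is an arbitrary choice, and this only works around the symptom rather than fixing its cause:

```python
from __future__ import annotations

from collections.abc import Callable, Mapping
from typing import Any


class EmptyHitsAnomaly(Exception):
    """Raised when hits.total is positive but no hits are returned."""


def hits_missing(response: Mapping[str, Any], size: int) -> bool:
    """True for the reported symptom: size >= 1 and hits.total > 0, yet hits is empty.

    Assumes the request used the default "from": 0; with a larger offset an
    empty page can be legitimate.
    """
    hits = response["hits"]
    total = hits["total"]
    # 7.x wraps the count in {"value": ..., "relation": ...}; 5.x/6.x use a plain int.
    count = total["value"] if isinstance(total, Mapping) else total
    return size >= 1 and count > 0 and len(hits["hits"]) == 0


def search_with_retry(
    run_search: Callable[[], Mapping[str, Any]], size: int, attempts: int = 3
) -> Mapping[str, Any]:
    """Re-run a search (run_search is a hypothetical callable) until the symptom disappears."""
    for _ in range(attempts):
        response = run_search()
        if not hits_missing(response, size):
            return response
    raise EmptyHitsAnomaly(f"hits.total > 0 but no hits after {attempts} attempts")


# Simulated responses: the first mirrors the debugger capture, the second is healthy.
responses = iter([
    {"hits": {"total": 48, "max_score": 507.43854, "hits": []}},
    {"hits": {"total": 48, "max_score": 507.43854, "hits": [{"_id": "LID1101576711766_1"}]}},
])
result = search_with_retry(lambda: next(responses), size=1)
print(len(result["hits"]["hits"]))  # 1
```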

I've been having the issue since 5.4. I can't reproduce it on demand. When it breaks in my debugger, I copy and paste the generated JSON into the Kibana Console and run it there, and it returns the expected results.
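When an anomalous response shows up, logging the exact request in Kibana Console syntax (a "METHOD /path" line followed by the JSON body) makes the copy-and-paste replay step repeatable. A minimal sketch; `to_console_request` is a hypothetical helper, and the match_all body below is only a placeholder for the real query:

```python
import json
from typing import Any, Mapping


def to_console_request(method: str, path: str, body: Mapping[str, Any]) -> str:
    """Render a request in Kibana Console syntax: a 'METHOD /path' line, then the JSON body."""
    return f"{method} {path}\n{json.dumps(body, indent=2)}"


# Placeholder body; in practice this would be the generated query that misbehaved.
snippet = to_console_request("GET", "/address/segment/_search", {"size": 1, "query": {"match_all": {}}})
print(snippet.splitlines()[0])  # GET /address/segment/_search
```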

I migrated the index when I upgraded to ES 6.6 (I did not rebuild it), and I see the same intermittent behavior there too.

Here's some context:

Here's the query (old _type style):

GET /address/segment/_search
{
  "size": 1,
  "_source": [
    "segmentmid.coordinates",
    "ziptype",
    "street",
    "placename",
    "usps_prefname",
    "placename",
    "admin_name",
    "altplacename",
    "localadmin",
    "borough",
    "locality",
    "neighbourhood",
    "altneighbourhood",
    "countryiso2",
    "county",
    "verifiedlocations.housenumber",
    "state",
    "region",
    "altregion",
    "street_pref",
    "street",
    "verifiedlocations.postcode",
    "postcode"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "common": {
                  "street_pref": {
                    "query": "PIONEER",
                    "_name": "street",
                    "boost": 5
                  }
                }
              },
              {
                "common": {
                  "street": {
                    "query": "PIONEER",
                    "_name": "street",
                    "boost": 1
                  }
                }
              },
              {
                "common": {
                  "street_pref": {
                    "query": "PIONEER BLVD",
                    "_name": "street",
                    "boost": 5
                  }
                }
              },
              {
                "common": {
                  "street": {
                    "query": "PIONEER BLVD",
                    "_name": "street",
                    "boost": 1
                  }
                }
              }
            ]
          }
        },
        {
          "bool": {
            "should": [
              {
                "multi_match": {
                  "query": "ARTESIA",
                  "fields": [
                    "usps_prefname",
                    "placename",
                    "admin_name",
                    "altplacename",
                    "localadmin",
                    "borough",
                    "locality",
                    "neighbourhood",
                    "altneighbourhood"
                  ],
                  "operator": "and",
                  "fuzziness": 0,
                  "_name": "city",
                  "boost": 12
                }
              }
            ]
          }
        },
        {
          "bool": {
            "should": [
              {
                "multi_match": {
                  "query": "CA",
                  "fields": [
                    "state",
                    "region",
                    "altregion"
                  ],
                  "operator": "and",
                  "fuzziness": 0,
                  "_name": "state",
                  "boost": 7.5
                }
              }
            ]
          }
        },
        {
          "bool": {
            "should": [
              {
                "multi_match": {
                  "query": "90701",
                  "fields": [
                    "verifiedlocations.postcode",
                    "postcode"
                  ],
                  "operator": "and",
                  "fuzziness": 0,
                  "_name": "zip",
                  "boost": 16.6
                }
              }
            ]
          }
        },
        {
          "bool": {
            "should": [
              {
                "multi_match": {
                  "query": "US",
                  "fields": [
                    "countryiso2"
                  ],
                  "operator": "and",
                  "fuzziness": 0,
                  "_name": "countryiso2",
                  "boost": 2
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Here's the correct result, as returned by the Kibana Console:

{
  "took" : 96,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 48,
    "max_score" : 507.43854,
    "hits" : [
      {
        "_index" : "address",
        "_type" : "segment",
        "_id" : "LID1101576711766_1",
        "_score" : 507.43854,
        "_source" : {
          "countryiso2" : "US",
          "street_pref" : "Pioneer Blvd",
          "street" : [
            "Pioneer",
            "Pioneer Blvd",
            "Pioneer Boulevard"
          ],
          "postcode" : "90701",
          "usps_prefname" : "ARTESIA",
          "state" : "CA",
          "admin_name" : "Artesia",
          "placename" : [
            "Artesia",
            "ARTESIA",
            "CERRITOS"
          ],
          "segmentmid" : {
            "coordinates" : [
              -118.08214449992698,
              33.862094500000296
            ]
          },
          "ziptype" : "Default"
        },
        "matched_queries" : [
          "zip",
          "state",
          "countryiso2",
          "city",
          "street"
        ]
      }
    ]
  }
}

--Ray

yc1024 commented 5 years ago

I've been having the issue since 6.1.3.

aleiakkim commented 4 years ago

Still having this issue in 2020, using version 6.8.

Any solutions?

jimczi commented 4 years ago

I think we're mixing lots of different issues here, so I am going to close this issue. The original issue was about a terms query on the _id field that should match a single document but returns a hits.total greater than 1. We couldn't reproduce this specific issue in 6.x and 7.x, so please reopen if you have a reproducible case.

> Still having this issue in 2020, using version 6.8.

Please make sure that the issue is the same as the original one, and if not, don't hesitate to open a new issue that describes the problem and how to reproduce it.