elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

The hits.total differs from actual number of documents in `ids` query #25603

Closed JagathJayasinghe closed 4 years ago

JagathJayasinghe commented 7 years ago

Elasticsearch version: 5.3.1

Elasticsearch setup: 9 node cluster

Plugins installed: repository-s3, x-pack

JVM version: openjdk version "1.8.0_131"

OS version: Ubuntu Server 16.04

Machine specifications: 8x CPU, 8GB RAM, 150GB SSD, 10Gbps Network

Description of the problem including expected versus actual behavior: Elasticsearch returns an incorrect hits.total value even though the returned hits seem correct.

After stripping the query back considerably we see that the hits.total inconsistency happens when using the ids query below, no matter how many or which ids we use.

Steps to reproduce: We so far haven't been able to work out what triggers this particular bug.

Example query and results as shown in the attached image below.

issue_1

A rolling restart of the cluster seems to resolve the issue. I'm not sure whether this is due to an in-process bug or a corrupt index.

We would like to know whether this is a potential bug in Elasticsearch, or whether anyone else has seen it happen, as we have had no luck finding the cause.
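
The symptom described above can be checked mechanically. The following is a minimal Python sketch, not part of the original report: the helper name `total_is_plausible` is hypothetical, it assumes the Elasticsearch 5.x response shape where `hits.total` is a plain integer, and it assumes each requested id matches at most one document in the searched types.

```python
def total_is_plausible(response, requested_ids):
    """Return True if hits.total does not exceed the number of distinct ids requested.

    Assumption: every id matches at most one document in the searched types.
    """
    total = response["hits"]["total"]  # a plain integer in Elasticsearch 5.x responses
    return total <= len(set(requested_ids))


# Trimmed-down, hand-written responses for a single requested id.
bad = {"hits": {"total": 928, "hits": [{"_id": "599c3d28f9f00fdc447e763c"}]}}
good = {"hits": {"total": 1, "hits": [{"_id": "599c3d28f9f00fdc447e763c"}]}}

print(total_is_plausible(bad, ["599c3d28f9f00fdc447e763c"]))   # False
print(total_is_plausible(good, ["599c3d28f9f00fdc447e763c"]))  # True
```

A check like this could run next to the application's own ids queries to log when the cluster enters the problematic state.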

cbuescher commented 6 years ago

I can query for both Ids

This is very interesting indeed. Thanks for trying, I'm really sorry I haven't had the bandwidth to dig into this further so far, but I have a few more vague suspicions I want to run past somebody else more knowledgeable in those areas first. I hope to be back with something else to try to narrow this down soon.

cbuescher commented 6 years ago

Hi @dannymurphygfk,

after reading your comment from two days ago I realized we haven't yet looked into potential interference from caches, most prominently the request cache. There is an option to disable it per request that I'd like you to try with the problematic ids queries, to see if the result changes with and without the cache.
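
A minimal Python sketch of that comparison, under stated assumptions: the base URL `http://localhost:9200` and the index name `my_index` are placeholders, the helper names are hypothetical, and `hits.total` is read as a plain integer as in Elasticsearch 5.x. `request_cache` is the per-request URL parameter of the search API.

```python
import json
import urllib.request


def build_search_url(base_url, index, request_cache):
    """Return the _search URL with the shard request cache explicitly on or off."""
    flag = "true" if request_cache else "false"
    return f"{base_url}/{index}/_search?request_cache={flag}"


def post_search(url, body):
    """POST a search body and return hits.total (needs a live cluster)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["hits"]["total"]


# Only the URLs are built here; post_search is not called without a cluster.
print(build_search_url("http://localhost:9200", "my_index", request_cache=False))
print(build_search_url("http://localhost:9200", "my_index", request_cache=True))
```

Running the same ids query body through both URLs and comparing the two totals would show whether the request cache plays any role.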

If that doesn't turn up anything new, maybe clearing all caches is something you can try next. This might have a small effect when using it on your production cluster, so I'd try to do this on the dev cluster first when it is in a "problematic" state. However, ES 5.x shouldn't rely too much on caching if your cluster isn't under heavy load, so it should also be relatively safe to try on the production cluster if you are not running near full capacity.
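
For reference, a minimal Python sketch of the clear-cache call, assuming a placeholder base URL of `http://localhost:9200`; the helper name `clear_cache_request` is hypothetical. It only builds the request object, so nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import urllib.request


def clear_cache_request(base_url, index=None):
    """Build a POST request for the clear-cache API.

    With index=None the request targets the caches of all indices.
    """
    path = f"/{index}/_cache/clear" if index else "/_cache/clear"
    return urllib.request.Request(base_url + path, method="POST")


req = clear_cache_request("http://localhost:9200")
print(req.get_method(), req.full_url)
```

Passing an index name limits the clearing to that index, which may be the gentler option on a production cluster.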

Two other things I wanted to ask or re-check:

Hope this brings us a step nearer to narrowing this strange bug down.

dannymurphygfk commented 6 years ago

Hi @cbuescher ,

Thanks I have just tried what you suggested but unfortunately to no avail.

Disabling the request cache doesn't affect the result... returns same result with or without cache disabled.

Likewise, clearing of all caches didn't solve the issue.

Does the inclusion of the size parameter give any clues? Because I can query an id that returns bogus totals with size:0, and in this case it correctly returns 1... but with anything else it's incorrect?

Incidentally the ids I was using 2 days ago both fail for me today :(
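
The size observation above can be made repeatable with a small comparison. This is a minimal Python sketch with hypothetical helper names; the two responses are hand-written stand-ins shaped like Elasticsearch 5.x output, not real cluster results.

```python
def ids_body(ids, size=None):
    """Build an ids query body, optionally with an explicit size."""
    body = {"query": {"ids": {"values": list(ids)}}}
    if size is not None:
        body["size"] = size
    return body


def compare_totals(resp_size0, resp_default):
    """Return (total with size 0, total with default size, whether they agree)."""
    a = resp_size0["hits"]["total"]
    b = resp_default["hits"]["total"]
    return a, b, a == b


# Stand-in responses: size:0 reports 1, the default size reports a bogus total.
resp_size0 = {"hits": {"total": 1, "hits": []}}
resp_default = {"hits": {"total": 931, "hits": [{"_id": "225238857"}]}}

print(ids_body([225238857], size=0))
print(compare_totals(resp_size0, resp_default))  # (1, 931, False)
```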

cbuescher commented 6 years ago

Thanks I have just tried what you suggested but unfortunately to no avail.

:-( I was hoping to get closer to the source of all this, but it's good to check nevertheless. Thanks for the other answers which confirm my current assumptions.

Does the inclusion of the size parameter give any clues? Because I can query an id that returns bogus totals with size:0, and in this case it correctly returns 1... but with anything else it's incorrect?

I will need to do some digging and thinking about this, might be the best clue at the moment.

Incidentally the ids I was using 2 days ago both fail for me today

Can you specify this a bit more? Do you mean that now

POST my_index/my_type,my_other_type/_search?preference=2904
{
    "query": {
        "ids": {
            "values": [239563172, 225238857]
        }
    }
}

Doesn't correctly return 2 any more? Was this before or after the clearing of the caches? Any Nodes restarted in between?

dannymurphygfk commented 6 years ago

Yes, both queries that were returning correct totals yesterday are now returning bogus totals, both before and after the clearing of the cache.

POST my_index/my_type,my_other_type/_search?preference=2904
{
    "query": {
        "ids": {
            "values": [239563172, 225238857]
        }
    }
}

POST my_index/my_type,my_other_type/_search?preference=2904
{
    "query": {
        "ids": {
            "values": [225238857]
        }
    }
}

Looks like 1 node went down last night (node 6 - host issue) and was restarted.

btw neither of these ids worked for a colleague yesterday who would have been using a different preference.
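
Since results seem to depend on the preference value, it may help to run the same query under several preference strings and record which ones return a wrong total. A minimal Python sketch follows; the helper names, the base URL and the sample totals are hypothetical, and the types are joined with a comma as the Elasticsearch 5.x search URL expects.

```python
from urllib.parse import quote, urlencode


def search_url(base_url, index, types, preference):
    """Build a _search URL for several types with a custom preference string."""
    type_path = ",".join(quote(t, safe="") for t in types)
    query = urlencode({"preference": preference})
    return f"{base_url}/{index}/{type_path}/_search?{query}"


def differing_preferences(totals_by_preference, expected_total):
    """Return the preference values whose hits.total differs from the expected one."""
    return sorted(p for p, t in totals_by_preference.items() if t != expected_total)


print(search_url("http://localhost:9200", "my_index", ["my_type", "my_other_type"], "2904"))

# Made-up totals collected per preference value for a two-id query.
observed = {"2904": 2, "colleague": 931}
print(differing_preferences(observed, expected_total=2))  # ['colleague']
```

A custom preference string pins a request to the same shard copies, so a preference that consistently yields a bad total would point at specific copies or nodes.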

serj-p commented 6 years ago

We are also experiencing this issue at nimble.com. Any ids query like {ids: {values: [id1]}} returns a hits.total of ~1000, which confuses users who operate on one or a few (<30) docs and then see the operation progress report about 1k docs.

We are running ES 5.4.2; we have never seen this on 2.4. We use external versioning and child/nested docs, and sometimes we index docs with version_type force.

We are also encountering far too many errors like 'search_phase_execution_exception', u'No search context found for id', which we didn't have on 2.4.

Stono commented 6 years ago

Hey, just to chime in: I also have this problem. Creating a brand new index on my cluster and indexing 10 documents, I get 50 hits! That seems suspiciously correlated with the fact that I also have 5 nodes (10x5=50).

{
  "name" : "elasticsearch-master-1",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "RxDSRv09Rt2zhQVNUeC_xQ",
  "version" : {
    "number" : "6.1.1",
    "build_hash" : "bd92e7f",
    "build_date" : "2017-12-17T20:23:25.338Z",
    "build_snapshot" : false,
    "lucene_version" : "7.1.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
{
"took": 120,
"timed_out": false,
"_shards": {
"total": 10,
"successful": 10,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 50,
"max_score": 1,
"hits": [ { "_index": "metrics20-2018.01.10","_type": "test","_id": "14","_score": 1,"_source": { "dotted-name-by-zone": "part-exchange-api.not-set.by-zone.not-implemented-yet.jvm.threads.terminated.count","dotted-name-by-server": "part-exchange-api.not-set.by-server.mcr-al33030.jvm.threads.terminated.count","logstash-shipping-production": "false","event-type": "gauge","server-domain-name": "MCR-AL33030","server-identifier": "mcr-al33030","logstash-shipping-deployment-revision": "1029","server-ip": "not-implemented-yet","type": "metrics20-event","version": "not-set","logstash-shipping-version": "5.6.1","tags": [ "metrics20-event"],"logstash-shipping-hostname": "nonprodlogstashshipping8.node.dc2.consul","environment": "not-set","@timestamp": "2018-01-11T10:42:47.000Z","application": "part-exchange-api","@version": "1","host": "172.28.224.135","name": "jvm.threads.terminated.count","dotted-name-by-all": "part-exchange-api.not-set.by-all.jvm.threads.terminated.count","value": 0}},{ "_index": "metrics20-2018.01.10","_type": "test","_id": "19","_score": 1,"_source": { "dotted-name-by-zone": "part-exchange-api.not-set.by-zone.not-implemented-yet.jvm.threads.terminated.count","dotted-name-by-server": "part-exchange-api.not-set.by-server.mcr-al33030.jvm.threads.terminated.count","logstash-shipping-production": "false","event-type": "gauge","server-domain-name": "MCR-AL33030","server-identifier": "mcr-al33030","logstash-shipping-deployment-revision": "1029","server-ip": "not-implemented-yet","type": "metrics20-event","version": "not-set","logstash-shipping-version": "5.6.1","tags": [ "metrics20-event"],"logstash-shipping-hostname": "nonprodlogstashshipping8.node.dc2.consul","environment": "not-set","@timestamp": "2018-01-11T10:42:47.000Z","application": "part-exchange-api","@version": "1","host": "172.28.224.135","name": "jvm.threads.terminated.count","dotted-name-by-all": "part-exchange-api.not-set.by-all.jvm.threads.terminated.count","value": 0}},{ "_index": 
"metrics20-2018.01.10","_type": "test","_id": "22","_score": 1,"_source": { "dotted-name-by-zone": "part-exchange-api.not-set.by-zone.not-implemented-yet.jvm.threads.terminated.count","dotted-name-by-server": "part-exchange-api.not-set.by-server.mcr-al33030.jvm.threads.terminated.count","logstash-shipping-production": "false","event-type": "gauge","server-domain-name": "MCR-AL33030","server-identifier": "mcr-al33030","logstash-shipping-deployment-revision": "1029","server-ip": "not-implemented-yet","type": "metrics20-event","version": "not-set","logstash-shipping-version": "5.6.1","tags": [ "metrics20-event"],"logstash-shipping-hostname": "nonprodlogstashshipping8.node.dc2.consul","environment": "not-set","@timestamp": "2018-01-11T10:42:47.000Z","application": "part-exchange-api","@version": "1","host": "172.28.224.135","name": "jvm.threads.terminated.count","dotted-name-by-all": "part-exchange-api.not-set.by-all.jvm.threads.terminated.count","value": 0}},{ "_index": "metrics20-2018.01.10","_type": "test","_id": "26","_score": 1,"_source": { "dotted-name-by-zone": "part-exchange-api.not-set.by-zone.not-implemented-yet.jvm.threads.terminated.count","dotted-name-by-server": "part-exchange-api.not-set.by-server.mcr-al33030.jvm.threads.terminated.count","logstash-shipping-production": "false","event-type": "gauge","server-domain-name": "MCR-AL33030","server-identifier": "mcr-al33030","logstash-shipping-deployment-revision": "1029","server-ip": "not-implemented-yet","type": "metrics20-event","version": "not-set","logstash-shipping-version": "5.6.1","tags": [ "metrics20-event"],"logstash-shipping-hostname": "nonprodlogstashshipping8.node.dc2.consul","environment": "not-set","@timestamp": "2018-01-11T10:42:47.000Z","application": "part-exchange-api","@version": "1","host": "172.28.224.135","name": "jvm.threads.terminated.count","dotted-name-by-all": "part-exchange-api.not-set.by-all.jvm.threads.terminated.count","value": 0}},{ "_index": "metrics20-2018.01.10","_type": 
"test","_id": "5","_score": 1,"_source": { "dotted-name-by-zone": "part-exchange-api.not-set.by-zone.not-implemented-yet.jvm.threads.terminated.count","dotted-name-by-server": "part-exchange-api.not-set.by-server.mcr-al33030.jvm.threads.terminated.count","logstash-shipping-production": "false","event-type": "gauge","server-domain-name": "MCR-AL33030","server-identifier": "mcr-al33030","logstash-shipping-deployment-revision": "1029","server-ip": "not-implemented-yet","type": "metrics20-event","version": "not-set","logstash-shipping-version": "5.6.1","tags": [ "metrics20-event"],"logstash-shipping-hostname": "nonprodlogstashshipping8.node.dc2.consul","environment": "not-set","@timestamp": "2018-01-11T10:42:47.000Z","application": "part-exchange-api","@version": "1","host": "172.28.224.135","name": "jvm.threads.terminated.count","dotted-name-by-all": "part-exchange-api.not-set.by-all.jvm.threads.terminated.count","value": 0}},{ "_index": "metrics20-2018.01.10","_type": "test","_id": "10","_score": 1,"_source": { "dotted-name-by-zone": "part-exchange-api.not-set.by-zone.not-implemented-yet.jvm.threads.terminated.count","dotted-name-by-server": "part-exchange-api.not-set.by-server.mcr-al33030.jvm.threads.terminated.count","logstash-shipping-production": "false","event-type": "gauge","server-domain-name": "MCR-AL33030","server-identifier": "mcr-al33030","logstash-shipping-deployment-revision": "1029","server-ip": "not-implemented-yet","type": "metrics20-event","version": "not-set","logstash-shipping-version": "5.6.1","tags": [ "metrics20-event"],"logstash-shipping-hostname": "nonprodlogstashshipping8.node.dc2.consul","environment": "not-set","@timestamp": "2018-01-11T10:42:47.000Z","application": "part-exchange-api","@version": "1","host": "172.28.224.135","name": "jvm.threads.terminated.count","dotted-name-by-all": "part-exchange-api.not-set.by-all.jvm.threads.terminated.count","value": 0}},{ "_index": "metrics20-2018.01.10","_type": "test","_id": "21","_score": 
1,"_source": { "dotted-name-by-zone": "part-exchange-api.not-set.by-zone.not-implemented-yet.jvm.threads.terminated.count","dotted-name-by-server": "part-exchange-api.not-set.by-server.mcr-al33030.jvm.threads.terminated.count","logstash-shipping-production": "false","event-type": "gauge","server-domain-name": "MCR-AL33030","server-identifier": "mcr-al33030","logstash-shipping-deployment-revision": "1029","server-ip": "not-implemented-yet","type": "metrics20-event","version": "not-set","logstash-shipping-version": "5.6.1","tags": [ "metrics20-event"],"logstash-shipping-hostname": "nonprodlogstashshipping8.node.dc2.consul","environment": "not-set","@timestamp": "2018-01-11T10:42:47.000Z","application": "part-exchange-api","@version": "1","host": "172.28.224.135","name": "jvm.threads.terminated.count","dotted-name-by-all": "part-exchange-api.not-set.by-all.jvm.threads.terminated.count","value": 0}},{ "_index": "metrics20-2018.01.10","_type": "test","_id": "32","_score": 1,"_source": { "dotted-name-by-zone": "part-exchange-api.not-set.by-zone.not-implemented-yet.jvm.threads.terminated.count","dotted-name-by-server": "part-exchange-api.not-set.by-server.mcr-al33030.jvm.threads.terminated.count","logstash-shipping-production": "false","event-type": "gauge","server-domain-name": "MCR-AL33030","server-identifier": "mcr-al33030","logstash-shipping-deployment-revision": "1029","server-ip": "not-implemented-yet","type": "metrics20-event","version": "not-set","logstash-shipping-version": "5.6.1","tags": [ "metrics20-event"],"logstash-shipping-hostname": "nonprodlogstashshipping8.node.dc2.consul","environment": "not-set","@timestamp": "2018-01-11T10:42:47.000Z","application": "part-exchange-api","@version": "1","host": "172.28.224.135","name": "jvm.threads.terminated.count","dotted-name-by-all": "part-exchange-api.not-set.by-all.jvm.threads.terminated.count","value": 0}},{ "_index": "metrics20-2018.01.10","_type": "test","_id": "33","_score": 1,"_source": { 
"dotted-name-by-zone": "part-exchange-api.not-set.by-zone.not-implemented-yet.jvm.threads.terminated.count","dotted-name-by-server": "part-exchange-api.not-set.by-server.mcr-al33030.jvm.threads.terminated.count","logstash-shipping-production": "false","event-type": "gauge","server-domain-name": "MCR-AL33030","server-identifier": "mcr-al33030","logstash-shipping-deployment-revision": "1029","server-ip": "not-implemented-yet","type": "metrics20-event","version": "not-set","logstash-shipping-version": "5.6.1","tags": [ "metrics20-event"],"logstash-shipping-hostname": "nonprodlogstashshipping8.node.dc2.consul","environment": "not-set","@timestamp": "2018-01-11T10:42:47.000Z","application": "part-exchange-api","@version": "1","host": "172.28.224.135","name": "jvm.threads.terminated.count","dotted-name-by-all": "part-exchange-api.not-set.by-all.jvm.threads.terminated.count","value": 0}},{ "_index": "metrics20-2018.01.10","_type": "test","_id": "20","_score": 1,"_source": { "dotted-name-by-zone": "part-exchange-api.not-set.by-zone.not-implemented-yet.jvm.threads.terminated.count","dotted-name-by-server": "part-exchange-api.not-set.by-server.mcr-al33030.jvm.threads.terminated.count","logstash-shipping-production": "false","event-type": "gauge","server-domain-name": "MCR-AL33030","server-identifier": "mcr-al33030","logstash-shipping-deployment-revision": "1029","server-ip": "not-implemented-yet","type": "metrics20-event","version": "not-set","logstash-shipping-version": "5.6.1","tags": [ "metrics20-event"],"logstash-shipping-hostname": "nonprodlogstashshipping8.node.dc2.consul","environment": "not-set","@timestamp": "2018-01-11T10:42:47.000Z","application": "part-exchange-api","@version": "1","host": "172.28.224.135","name": "jvm.threads.terminated.count","dotted-name-by-all": "part-exchange-api.not-set.by-all.jvm.threads.terminated.count","value": 0}}]
}
}

Stono commented 6 years ago

This is the same on a different version of Elasticsearch too:

{
  "name" : "ip-10-155-0-205",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "OyESK4dhQKOXvR6V8W-Dtg",
  "version" : {
    "number" : "5.2.2",
    "build_hash" : "f9d9b74",
    "build_date" : "2017-02-24T17:26:45.835Z",
    "build_snapshot" : false,
    "lucene_version" : "6.4.1"
  },
  "tagline" : "You Know, for Search"
}
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 50,
"max_score": 1,
"hits": [ { "_index": "karl20-2018.01.10","_type": "test","_id": "14","_score": 1,"_source": { "dotted-name-by-zone": "part-exchange-api.not-set.by-zone.not-implemented-yet.jvm.threads.terminated.count","dotted-name-by-server": "part-exchange-api.not-set.by-server.mcr-al33030.jvm.threads.terminated.count","logstash-shipping-production": "false","event-type": "gauge","server-domain-name": "MCR-AL33030","server-identifier": "mcr-al33030","logstash-shipping-deployment-revision": "1029","server-ip": "not-implemented-yet","type": "metrics20-event","version": "not-set","logstash-shipping-version": "5.6.1","tags": [ "metrics20-event"],"logstash-shipping-hostname": "nonprodlogstashshipping8.node.dc2.consul","environment": "not-set","@timestamp": "2018-01-11T10:42:47.000Z","application": "part-exchange-api","@version": "1","host": "172.28.224.135","name": "jvm.threads.terminated.count","dotted-name-by-all": "part-exchange-api.not-set.by-all.jvm.threads.terminated.count","value": 0}},{ "_index": "karl20-2018.01.10","_type": "test","_id": "19","_score": 1,"_source": { "dotted-name-by-zone": "part-exchange-api.not-set.by-zone.not-implemented-yet.jvm.threads.terminated.count","dotted-name-by-server": "part-exchange-api.not-set.by-server.mcr-al33030.jvm.threads.terminated.count","logstash-shipping-production": "false","event-type": "gauge","server-domain-name": "MCR-AL33030","server-identifier": "mcr-al33030","logstash-shipping-deployment-revision": "1029","server-ip": "not-implemented-yet","type": "metrics20-event","version": "not-set","logstash-shipping-version": "5.6.1","tags": [ "metrics20-event"],"logstash-shipping-hostname": "nonprodlogstashshipping8.node.dc2.consul","environment": "not-set","@timestamp": "2018-01-11T10:42:47.000Z","application": "part-exchange-api","@version": "1","host": "172.28.224.135","name": "jvm.threads.terminated.count","dotted-name-by-all": "part-exchange-api.not-set.by-all.jvm.threads.terminated.count","value": 0}},{ "_index": 
"karl20-2018.01.10","_type": "test","_id": "22","_score": 1,"_source": { "dotted-name-by-zone": "part-exchange-api.not-set.by-zone.not-implemented-yet.jvm.threads.terminated.count","dotted-name-by-server": "part-exchange-api.not-set.by-server.mcr-al33030.jvm.threads.terminated.count","logstash-shipping-production": "false","event-type": "gauge","server-domain-name": "MCR-AL33030","server-identifier": "mcr-al33030","logstash-shipping-deployment-revision": "1029","server-ip": "not-implemented-yet","type": "metrics20-event","version": "not-set","logstash-shipping-version": "5.6.1","tags": [ "metrics20-event"],"logstash-shipping-hostname": "nonprodlogstashshipping8.node.dc2.consul","environment": "not-set","@timestamp": "2018-01-11T10:42:47.000Z","application": "part-exchange-api","@version": "1","host": "172.28.224.135","name": "jvm.threads.terminated.count","dotted-name-by-all": "part-exchange-api.not-set.by-all.jvm.threads.terminated.count","value": 0}},{ "_index": "karl20-2018.01.10","_type": "test","_id": "24","_score": 1,"_source": { "dotted-name-by-zone": "part-exchange-api.not-set.by-zone.not-implemented-yet.jvm.threads.terminated.count","dotted-name-by-server": "part-exchange-api.not-set.by-server.mcr-al33030.jvm.threads.terminated.count","logstash-shipping-production": "false","event-type": "gauge","server-domain-name": "MCR-AL33030","server-identifier": "mcr-al33030","logstash-shipping-deployment-revision": "1029","server-ip": "not-implemented-yet","type": "metrics20-event","version": "not-set","logstash-shipping-version": "5.6.1","tags": [ "metrics20-event"],"logstash-shipping-hostname": "nonprodlogstashshipping8.node.dc2.consul","environment": "not-set","@timestamp": "2018-01-11T10:42:47.000Z","application": "part-exchange-api","@version": "1","host": "172.28.224.135","name": "jvm.threads.terminated.count","dotted-name-by-all": "part-exchange-api.not-set.by-all.jvm.threads.terminated.count","value": 0}},{ "_index": "karl20-2018.01.10","_type": 
"test","_id": "25","_score": 1,"_source": { "dotted-name-by-zone": "part-exchange-api.not-set.by-zone.not-implemented-yet.jvm.threads.terminated.count","dotted-name-by-server": "part-exchange-api.not-set.by-server.mcr-al33030.jvm.threads.terminated.count","logstash-shipping-production": "false","event-type": "gauge","server-domain-name": "MCR-AL33030","server-identifier": "mcr-al33030","logstash-shipping-deployment-revision": "1029","server-ip": "not-implemented-yet","type": "metrics20-event","version": "not-set","logstash-shipping-version": "5.6.1","tags": [ "metrics20-event"],"logstash-shipping-hostname": "nonprodlogstashshipping8.node.dc2.consul","environment": "not-set","@timestamp": "2018-01-11T10:42:47.000Z","application": "part-exchange-api","@version": "1","host": "172.28.224.135","name": "jvm.threads.terminated.count","dotted-name-by-all": "part-exchange-api.not-set.by-all.jvm.threads.terminated.count","value": 0}},{ "_index": "karl20-2018.01.10","_type": "test","_id": "26","_score": 1,"_source": { "dotted-name-by-zone": "part-exchange-api.not-set.by-zone.not-implemented-yet.jvm.threads.terminated.count","dotted-name-by-server": "part-exchange-api.not-set.by-server.mcr-al33030.jvm.threads.terminated.count","logstash-shipping-production": "false","event-type": "gauge","server-domain-name": "MCR-AL33030","server-identifier": "mcr-al33030","logstash-shipping-deployment-revision": "1029","server-ip": "not-implemented-yet","type": "metrics20-event","version": "not-set","logstash-shipping-version": "5.6.1","tags": [ "metrics20-event"],"logstash-shipping-hostname": "nonprodlogstashshipping8.node.dc2.consul","environment": "not-set","@timestamp": "2018-01-11T10:42:47.000Z","application": "part-exchange-api","@version": "1","host": "172.28.224.135","name": "jvm.threads.terminated.count","dotted-name-by-all": "part-exchange-api.not-set.by-all.jvm.threads.terminated.count","value": 0}},{ "_index": "karl20-2018.01.10","_type": "test","_id": "29","_score": 1,"_source": 
{ "dotted-name-by-zone": "part-exchange-api.not-set.by-zone.not-implemented-yet.jvm.threads.terminated.count","dotted-name-by-server": "part-exchange-api.not-set.by-server.mcr-al33030.jvm.threads.terminated.count","logstash-shipping-production": "false","event-type": "gauge","server-domain-name": "MCR-AL33030","server-identifier": "mcr-al33030","logstash-shipping-deployment-revision": "1029","server-ip": "not-implemented-yet","type": "metrics20-event","version": "not-set","logstash-shipping-version": "5.6.1","tags": [ "metrics20-event"],"logstash-shipping-hostname": "nonprodlogstashshipping8.node.dc2.consul","environment": "not-set","@timestamp": "2018-01-11T10:42:47.000Z","application": "part-exchange-api","@version": "1","host": "172.28.224.135","name": "jvm.threads.terminated.count","dotted-name-by-all": "part-exchange-api.not-set.by-all.jvm.threads.terminated.count","value": 0}},{ "_index": "karl20-2018.01.10","_type": "test","_id": "40","_score": 1,"_source": { "dotted-name-by-zone": "part-exchange-api.not-set.by-zone.not-implemented-yet.jvm.threads.terminated.count","dotted-name-by-server": "part-exchange-api.not-set.by-server.mcr-al33030.jvm.threads.terminated.count","logstash-shipping-production": "false","event-type": "gauge","server-domain-name": "MCR-AL33030","server-identifier": "mcr-al33030","logstash-shipping-deployment-revision": "1029","server-ip": "not-implemented-yet","type": "metrics20-event","version": "not-set","logstash-shipping-version": "5.6.1","tags": [ "metrics20-event"],"logstash-shipping-hostname": "nonprodlogstashshipping8.node.dc2.consul","environment": "not-set","@timestamp": "2018-01-11T10:42:47.000Z","application": "part-exchange-api","@version": "1","host": "172.28.224.135","name": "jvm.threads.terminated.count","dotted-name-by-all": "part-exchange-api.not-set.by-all.jvm.threads.terminated.count","value": 0}},{ "_index": "karl20-2018.01.10","_type": "test","_id": "41","_score": 1,"_source": { "dotted-name-by-zone": 
"part-exchange-api.not-set.by-zone.not-implemented-yet.jvm.threads.terminated.count","dotted-name-by-server": "part-exchange-api.not-set.by-server.mcr-al33030.jvm.threads.terminated.count","logstash-shipping-production": "false","event-type": "gauge","server-domain-name": "MCR-AL33030","server-identifier": "mcr-al33030","logstash-shipping-deployment-revision": "1029","server-ip": "not-implemented-yet","type": "metrics20-event","version": "not-set","logstash-shipping-version": "5.6.1","tags": [ "metrics20-event"],"logstash-shipping-hostname": "nonprodlogstashshipping8.node.dc2.consul","environment": "not-set","@timestamp": "2018-01-11T10:42:47.000Z","application": "part-exchange-api","@version": "1","host": "172.28.224.135","name": "jvm.threads.terminated.count","dotted-name-by-all": "part-exchange-api.not-set.by-all.jvm.threads.terminated.count","value": 0}},{ "_index": "karl20-2018.01.10","_type": "test","_id": "44","_score": 1,"_source": { "dotted-name-by-zone": "part-exchange-api.not-set.by-zone.not-implemented-yet.jvm.threads.terminated.count","dotted-name-by-server": "part-exchange-api.not-set.by-server.mcr-al33030.jvm.threads.terminated.count","logstash-shipping-production": "false","event-type": "gauge","server-domain-name": "MCR-AL33030","server-identifier": "mcr-al33030","logstash-shipping-deployment-revision": "1029","server-ip": "not-implemented-yet","type": "metrics20-event","version": "not-set","logstash-shipping-version": "5.6.1","tags": [ "metrics20-event"],"logstash-shipping-hostname": "nonprodlogstashshipping8.node.dc2.consul","environment": "not-set","@timestamp": "2018-01-11T10:42:47.000Z","application": "part-exchange-api","@version": "1","host": "172.28.224.135","name": "jvm.threads.terminated.count","dotted-name-by-all": "part-exchange-api.not-set.by-all.jvm.threads.terminated.count","value": 0}}]
}
}

DaveCTurner commented 6 years ago

@Stono could you include the queries you're using as well as the results?

cbuescher commented 6 years ago

@Stono @DaveCTurner I don't suspect this to be the same issue, I think you just didn't set the "size" parameter. Your result contains 10 hits, which is the default fetch size; hits.total should give you an idea about all the documents that matched. I don't think we should discuss this in this issue. If you have questions, would you please open a question in our Discourse forum? I'd prefer to keep this issue concentrated on the original issue, which is complicated enough as it stands. Thanks.
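
To illustrate the distinction being drawn here: the length of the `hits.hits` array is capped by `size` (10 by default), while `hits.total` counts every matching document. A minimal Python sketch with a hypothetical helper name and a hand-built response shaped like Elasticsearch 5.x output:

```python
def returned_and_total(response):
    """Return (number of hits in this page, total number of matching documents)."""
    hits = response["hits"]
    return len(hits["hits"]), hits["total"]


# Hand-built response: one default-sized page of 10 hits out of 50 matches.
response = {"hits": {"total": 50, "hits": [{"_id": str(i)} for i in range(10)]}}
print(returned_and_total(response))  # (10, 50)
```

Seeing 10 returned hits next to a larger total is therefore expected for a match-all query on an index with more than 10 documents; it is only suspicious when the index is known to hold fewer documents than the total claims.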

cbuescher commented 6 years ago

@serj-p thanks for reporting this, can you check if the following things also apply to make sure this is a similar issue:

If this is the case I'd like to collect similarities/differences with the issue reported by @JagathJayasinghe to try and pin the cause down a bit more. It would also be great if you'd go over some of the findings we made above and try them and see if this helps in your case (e.g. individual node restarts etc...)

Stono commented 6 years ago

@cbuescher: the index only contains 10 documents, as that's all I indexed as test data. @DaveCTurner the query was the same in both: metricstest/_search?pretty=true&q=*:*

cbuescher commented 6 years ago

@stono yes, but that's a match-all query, and your hits all have different ids. That's different from the issue discussed here. If you want to discuss this, please open a forum thread or, if you suspect a bug, another issue with the data to reproduce it.

Stono commented 6 years ago

@cbuescher ahhh I see, thank you :-)

dannymurphygfk commented 6 years ago

Hi @cbuescher We did notice at one point that once the cluster was in this invalid state we were able to create a new index on our cluster and even before indexing anything we were getting bogus totals back. https://github.com/elastic/elasticsearch/issues/25603#issuecomment-332237106

So to me the issue @Stono is seeing does look somewhat related. He is saying he only has 10 documents in the index, so even a match-all query should only return a hits.total of 10, yet it's returning 50?

cbuescher commented 6 years ago

@dannymurphygfk I remember that issue with the empty cluster. However, from the brief example @Stono described I suspect there is something else going on at indexing time; that's why I suggested discussing it in a different issue/thread, so as not to make this issue longer than it already is. If it turns out to be related, I would link back to this issue.

serj-p commented 6 years ago

@cbuescher

[ids] query gives different hits.total than if you filter for the id with e.g. a terms query

We have the id field indexed separately. The query {"query":{"terms":{"id":["599c3d28f9f00fdc447e763c"]}}} returns 1 hit, but hits.total is 908, 928, 903... or some other random number around 900. The ids query behaves in the same random way. It seems counts are broken not only for the ids query, but that is where it's most obvious. The query {"query":{"match":{"name":"Marketing"}}} gives a total anywhere from 7 to 50, and I am sure this value can't change that frequently.

does this happen for you always or just after a certain time of running the nodes in the cluster?

I can't say, as I cannot restart nodes in production to see when it appears again. I don't see this issue in the test environment with 2 search server nodes (32 shards).

after this happened for the first time, does it reproduce for certain fixed ids?

All requests I've checked give close to random total value.

UPDATE

When I specify the index name and type in the request, the query {"query":{"match":{"name":"Marketing"}}} gives a consistent result which seems to be right. Under the same conditions, the ids query and the terms query on ids give random results.
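
One way to put numbers on "random" totals is to repeat each query and summarise the spread. A minimal Python sketch: the helper names are hypothetical, the field name `id` follows the separately indexed field mentioned above, and the sample totals are the three values quoted in this comment.

```python
def terms_on_id_field(doc_id):
    """Terms query against the separately indexed `id` field."""
    return {"query": {"terms": {"id": [doc_id]}}}


def ids_query(doc_id):
    """ids query against the document's _id."""
    return {"query": {"ids": {"values": [doc_id]}}}


def total_spread(totals):
    """Return (min, max, number of distinct values) for repeated hits.total readings."""
    return min(totals), max(totals), len(set(totals))


print(terms_on_id_field("599c3d28f9f00fdc447e763c"))
print(ids_query("599c3d28f9f00fdc447e763c"))
print(total_spread([908, 928, 903]))  # (903, 928, 3)
```

A healthy single-id query would give a spread of (1, 1, 1) no matter how often it is repeated.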

cbuescher commented 6 years ago

query {"query":{"terms":{"id":["599c3d28f9f00fdc447e763c"]}}}

Can you post the complete search request, including the endpoint URL (with index, types etc...) you are using please, together with a "good" and a "bad" result. Are you querying across multiple indices or types? If so how many do you have, which ones do you expect the request to hit?

Can you do the same for a good and bad "ids" query result please? The above terms query isn't really

Does specifying "size: 0" in the requests change anything about the erroneous hits.total like it does for @dannymurphygfk in https://github.com/elastic/elasticsearch/issues/25603#issuecomment-333148138?

serj-p commented 6 years ago

@cbuescher

http://search.prod.com:9200/contacts_v3.3/contact/_search -d '{"query":{"terms":{"id": ["599c3d28f9f00fdc447e763c"]}}}'

results in

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 32,
    "successful": 32,
    "failed": 0
  },
  "hits": {
    "total": 903,
    "max_score": 13.536535,
    "hits": [
      {
        "_index": "contacts_v3.3",
        "_type": "contact",
        "_id": "599c3d28f9f00fdc447e763c",
        "_score": 13.536535,
        "_source": {
          "updated": "2017-08-22T14:52:57+0000",
          "name": "Marketing Manager",
          "created": "2017-08-22T14:18:16+0000",
          "company_id": "50a8822bb544310ba33694ee",
          "is_account": true,
          "type": "contact",
          "id": "599c3d28f9f00fdc447e763c",
          "owner_id": "576670db2007d05cd450e067"
        }
      }
    ]
  }
}

http://search.prod.com:9200/contacts_v3.3/contact/_search -d '{"query":{"ids":{"values":["599c3d28f9f00fdc447e763c"]}}}'

results in

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 32,
    "successful": 32,
    "failed": 0
  },
  "hits": {
    "total": 928,
    "max_score": 7.933596,
    "hits": [
      {
        "_index": "contacts_v3.3",
        "_type": "contact",
        "_id": "599c3d28f9f00fdc447e763c",
        "_score": 1.0,
        "_source": {
          "updated": "2017-08-22T14:52:57+0000",
          "name": "Marketing Manager",
          "created": "2017-08-22T14:18:16+0000",
          "company_id": "50a8822bb544310ba33694ee",
          "is_account": true,
          "type": "contact",
          "id": "599c3d28f9f00fdc447e763c",
          "owner_id": "576670db2007d05cd450e067"
        }
      }
    ]
  }
}

It seems it doesn't matter whether I provide index/type or not:

http://search.prod.com:9200/contacts_v3.3/contact/_search -d '
{
  "query": {
    "term": {
      "last name.lc": "safaa"
    }
  },
  "stored_fields": [],
  "size": "100"
}'
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 32,
    "successful": 32,
    "failed": 0
  },
  "hits": {
    "total": 814,
    "max_score": 13.097227,
    "hits": [
      {
        "_index": "contacts_v3.3",
        "_type": "contact",
        "_id": "52d71f0178d200148c2c4a53",
        "_score": 13.097227
      },
      {
        "_index": "contacts_v3.3",
        "_type": "contact",
        "_id": "59f16f7a950fa207340bc193",
        "_score": 13.096043
      },
      {
        "_index": "contacts_v3.3",
        "_type": "contact",
        "_id": "50c357d40f4bd72b9b003dbe",
        "_score": 13.089661
      },
      {
        "_index": "contacts_v3.3",
        "_type": "contact",
        "_id": "56418b0b000efa7efa9961cd",
        "_score": 13.087736
      },
      {
        "_index": "contacts_v3.3",
        "_type": "contact",
        "_id": "52a9b02578d2001e40fa86ef",
        "_score": 13.086037
      }
    ]
  }
}
http://search.prod.com:9200/_search -d '
{
  "query": {
    "term": {
      "last name.lc": "safaa"
    }
  },
  "stored_fields": [],
  "size": "100"
}'
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 32,
    "successful": 32,
    "failed": 0
  },
  "hits": {
    "total": 814,
    "max_score": 13.097229,
    "hits": [
      {
        "_index": "contacts_v3.3",
        "_type": "contact",
        "_id": "52d71f0178d200148c2c4a53",
        "_score": 13.097229
      },
      {
        "_index": "contacts_v3.3",
        "_type": "contact",
        "_id": "59f16f7a950fa207340bc193",
        "_score": 13.096043
      },
      {
        "_index": "contacts_v3.3",
        "_type": "contact",
        "_id": "50c357d40f4bd72b9b003dbe",
        "_score": 13.089661
      },
      {
        "_index": "contacts_v3.3",
        "_type": "contact",
        "_id": "56418b0b000efa7efa9961cd",
        "_score": 13.087733
      },
      {
        "_index": "contacts_v3.3",
        "_type": "contact",
        "_id": "52a9b02578d2001e40fa86ef",
        "_score": 13.086031
      }
    ]
  }
}

reducing size to 0 does fix the issue:

{
  "query": {
    "term": {
      "last name.lc": "safaa"
    }
  },
  "stored_fields": [ ],
  "size": "0"
}
{"took":2,"timed_out":false,"_shards":{"total":32,"successful":32,"failed":0},"hits":{"total":5,"max_score":0.0,"hits":[]}}

same for ids query

{"query":{"ids":{"values":["599c3d28f9f00fdc447e763c"]}},"size": 0}
{"took":7,"timed_out":false,"_shards":{"total":32,"successful":32,"failed":0},"hits":{"total":1,"max_score":0.0,"hits":[]}}
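The discrepancy in the responses above can be checked mechanically. The following is a minimal sketch, assuming the 5.x response shape where hits.total is a plain integer; the helper name `total_mismatch` is hypothetical, and the two responses are trimmed copies of the ids query results posted in this comment (928 with hits returned, 1 with size 0).

```python
import json


def total_mismatch(with_hits: dict, size_zero: dict) -> int:
    """Return hits.total (normal request) minus hits.total (size 0 request).

    A result of 0 means both requests agree on the total.
    """
    return with_hits["hits"]["total"] - size_zero["hits"]["total"]


# Trimmed copies of the two ids query responses from this comment.
with_hits = json.loads(
    '{"hits": {"total": 928, "hits": [{"_id": "599c3d28f9f00fdc447e763c"}]}}'
)
size_zero = json.loads('{"hits": {"total": 1, "max_score": 0.0, "hits": []}}')

diff = total_mismatch(with_hits, size_zero)
print(diff)  # 927
```

For an ids query, the size 0 total can also be compared against the number of ids requested (1 here), which gives an independent expected value.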
dannymurphygfk commented 6 years ago

Any progress on this? We don't have any new information, but we still have this issue of incorrect totals unless we do a rolling restart every time.

cbuescher commented 6 years ago

@dannymurphygfk sorry, no progress on this, but the issue is definitely still on the radar. Adding @elastic/es-search-aggs for a wider audience.

colings86 commented 6 years ago

@dannymurphygfk I've taken a look through this issue, and one thing that stands out to me that we need to eliminate as a cause is your statement "We use custom Native scripts for filtering and sorting", especially since the cluster that does not use the plugin containing your native scripts does not show this problem.

  1. Does your Native Script plugin only install scripts that are called as part of your searches, or does it also customise other elements of Elasticsearch? (I'd be interested in anything else it does, but especially if it adds a custom fetch phase)
  2. You have mentioned a few times that you are simplifying the queries you are posting. Do the queries you are running when you are reproducing the problem use any native scripts or anything from this plugin?
  3. How do you use the native scripts? Can you provide an example request that includes all the uses of your scripts?
  4. Since you are able to reproduce this problem on your Dev cluster, could you please disable the plugin and see if the problem reproduces?
dannymurphygfk commented 6 years ago

@colings86

  1. The plugin just installs scripts that we use as part of searches... so derive from AbstractDoubleSearchScript or AbstractSearchScript.
  2. We can reproduce the problem with just the simple Ids query using no scripts.
  3. How we use the scripts

We have 5 scripts in total. 2 are derived from AbstractDoubleSearchScript, which we use for sorting and aggregations. 3 are derived from AbstractSearchScript and are used for filtering.

Our documents contain a summary of each nested doc's data as an array of CSVs

NestedMeta:["nestedDoc1,,,,,,,,", "nestedDoc2,,,,,,,,,", ",,,,,,,,,", ",,,,,,,,,", ",,,,,,,,,", ",,,,,,,,,", ",,,,,,,,,", ",,,,,,,,,"]

The sorting scripts and 2 of the filtering scripts use this data to generate a value depending on the various parameters we pass it.

"sort": [{
        "_script": {
            "type": "number",
            "script": {
                "inline": "sortScript",
                "params": {
                    "param1": 99999999,
                    "param2": false,
                    "param3": "110",
                    "param4": {
                        "param4_1": 636575194711828737
                    }
                    ...
                },
                "lang": "native"
            },
            "order": "asc"
        }
    }
],
"query": {
    "bool": {
        "must": [
        ...
        ],
        "filter": [{
                "script": {
                    "script": {
                        "inline": "filterScript",
                        "lang": "native",
                        "params": {
                            "param1": 99999999,
                            "param2": false,
                            "param3": "110",
                            "param4": {
                                "param4_1": 636575194711828737
                            }
                            ...
                        }
                    }
                }
            }
        ]
    }
},
"aggs": {
    "scriptGeneratedAggs": {
        "stats": {
            "script": {
                "inline": "sortScript",
                "params": {
                    "param1": 99999999,
                    "param2": false,
                    "param3": "110",
                    "param4": {
                        "param4_1": 636575194711828737
                    }
                    ...
                },
                "lang": "native"
            }
        }
    }
}

The last filtering script operates on the nested documents and uses a similar approach, with a field on each nested document with summary data on its sibling nested documents.

  4. Unfortunately we no longer have the dev cluster that reproduced the issue. If we notice the issue on our new dev cluster I will try disabling the native scripts then.
aterreno commented 6 years ago

Hi, just wanted to add a note, this issue is definitely still happening, my report is:

No idea what causes it (and no time right now to properly figure this out), but the smell is the deletions and the 'complex' docs. A delete doesn't seem to propagate properly.

Bear in mind, I've also tested getting all the docs out, and deleted docs are coming out (I was pulling just the _ids)

It's pretty crazy and to be quite frank, I am glad ES is not our primary store because something like this should never happen on a production-ready database.

Update 06/06/18: next time I won't jump to conclusions; it was our misuse of ES.

jpountz commented 6 years ago

@aterreno Thanks for the additional information. Could you also share the version you are using and whether you have nested fields in your mappings and what plugins are installed?

the counts are wrong on REST/node.js API but right on Kibana

This one is intriguing since Kibana is supposed to consume the REST API. Also when you say REST, do you mean that you are using curl to query Elasticsearch and check total hit counts? When you look at the query that Kibana sends to Elasticsearch, is there any difference with the ones that you are sending via curl?

the counts are ok when you search specifying a field equals to something

So what is the query that reproduces the issue? Is it an ids query like @JagathJayasinghe's, or can you also reproduce with other queries?

Bear in mind, I've also tested getting all the docs out, and deleted docs are coming out

How did you pull the docs out? Using a match_all query? With scroll?

It's pretty crazy and to be quite frank, I am glad ES is not our primary store because something like this should never happen on a production-ready database.

Agreed. It is a very embarrassing bug.

s1monw commented 6 years ago

@aterreno can you provide the way you query elasticsearch:

it would be very very helpful if you could provide a sample request and the response.

aterreno commented 6 years ago

Thanks for your prompt replies @s1monw & @jpountz, I've written a bash script to reproduce:

ES_CLUSTER=
INDEX=

curl -s -k "$ES_CLUSTER/$INDEX/_stats" | jq . | grep -A3 $INDEX | grep count 
#1597 (same as node api)

curl -s -k "$ES_CLUSTER/_cat/count/$INDEX" | awk '{$1=$2=""; print $0}'
#1597 (same as node api)

curl -s -k "$ES_CLUSTER/$INDEX/_count?q=email:*&preference=_primary&ignore_unavailable" | jq .count
#1587 (one more than kibana)

curl -s -k "$ES_CLUSTER/$INDEX/_count?q=email:*" | jq .count
#1587 (one more than kibana)

curl -s -k "$ES_CLUSTER/$INDEX/_search?q=email:*" | jq .hits.total 
#1587 (one more than kibana)

curl -k -s "$ES_CLUSTER/_msearch" -H 'Content-Type: application/json' -d'
{"index":["'"$INDEX"'"]}
{"version":true,"size":2000,"_source":{"excludes":[]},"query":{"bool":{"must":[{"match_all":{}}],"filter":[],"should":[],"must_not":[]}}}
' | jq .responses | jq .[0].hits.total
#1597 (same as node api)

curl -k -s "$ES_CLUSTER/_msearch" -H 'Content-Type: application/json' -d'
{"index":["'"$INDEX"'"],"ignore_unavailable":true,"preference":1528210810366}
{"version":true,"size":500,"sort":[{"updatedAt":{"order":"desc","unmapped_type":"boolean"}}],"_source":{"excludes":[]},"aggs":{"2":{"date_histogram":{"field":"updatedAt","interval":"1M","time_zone":"Europe/London","min_doc_count":1}}},"stored_fields":["*"],"script_fields":{},"docvalue_fields":["cads.created","cads.mfgTime","createdAt","emails.createdAt","notes.createdAt","postDate","switches.createdAt","switches.updatedAt","updatedAt","userIssues.createdAt","userIssues.resolvedAt"],"query":{"bool":{"must":[{"match_all":{}},{"range":{"updatedAt":{"gte":1370444411553,"lte":1528210811554,"format":"epoch_millis"}}}],"filter":[],"should":[],"must_not":[]}},"highlight":{"pre_tags":["@kibana-highlighted-field@"],"post_tags":["@/kibana-highlighted-field@"],"fields":{"*":{}},"fragment_size":2147483647}}
' | jq .responses | jq .[0].hits.total
#1564

I am not an ES expert so I might be missing something, but I just don't understand how these 'similar' queries return different counts on the same index.

To reply to @s1monw specifically:

aterreno commented 6 years ago

On Tue, Jun 5, 2018 at 6:21 PM Adrien Grand notifications@github.com wrote:

@aterreno https://github.com/aterreno Thanks for the additional information. Could you also share the version you are using and whether you have nested fields in your mappings and what plugins are installed?

the counts are wrong on REST/node.js API but right on Kibana

This one is intriguing since Kibana is supposed to consume the REST API. Also when you say REST, do you mean that you are using curl to query Elasticsearch and check total hit counts? When you look at the query that Kibana sends to Elasticsearch, is there any difference with the ones that you are sending via curl?

Yes, I hope the bash script helps explain the issue.

the counts are ok when you search specifying a field equals to something

So what is the query that reproduces the issue? Is it an ids query like @JagathJayasinghe's, or can you also reproduce with other queries?

Bear in mind, I've also tested getting all the docs out, and deleted docs are coming out

How did you pull the docs out? Using a match_all query? With scroll?

A simple search was also returning deleted docs. To add more info: we deleted the index, worked on it for a day or so and deleted about 10 docs, and a plain match_all was returning them.

It's pretty crazy and to be quite frank, I am glad ES is not our primary store because something like this should never happen on a production-ready database.

Agreed. It is a very embarrassing bug.


aterreno commented 6 years ago

Just to add some more details... I am pasting the JSON schema here... I hope it helps to understand the problem or reproduce it. It's indeed a fairly complex (perhaps overcomplicated, something we need to work on) document structure.

{
  "$id": "http://example.com/example.json", 
  "type": "object", 
  "definitions": {}, 
  "$schema": "http://json-schema.org/draft-07/schema#", 
  "properties": {
    "_index": {
      "$id": "/properties/_index", 
      "type": "string"
    }, 
    "_type": {
      "$id": "/properties/_type", 
      "type": "string"
    }, 
    "_id": {
      "$id": "/properties/_id", 
      "type": "string"
    }, 
    "_score": {
      "$id": "/properties/_score", 
      "type": "integer"
    }, 
    "_source": {
      "$id": "/properties/_source", 
      "type": "object", 
      "properties": {
        "email": {
          "$id": "/properties/_source/properties/email", 
          "type": "string"
        }, 
        "channel": {
          "$id": "/properties/_source/properties/channel", 
          "type": "string"
        }, 
        "createdAt": {
          "$id": "/properties/_source/properties/createdAt", 
          "type": "string"
        }, 
        "info": {
          "$id": "/properties/_source/properties/info", 
          "type": "object", 
          "properties": {
            "addressesHistory": {
              "$id": "/properties/_source/properties/info/properties/addressesHistory", 
              "type": "array", 
              "items": {
                "$id": "/properties/_source/properties/info/properties/addressesHistory/items", 
                "type": "object", 
                "properties": {
                  "nMonths": {
                    "$id": "/properties/_source/properties/info/properties/addressesHistory/items/properties/nMonths", 
                    "type": "integer"
                  }, 
                  "address": {
                    "$id": "/properties/_source/properties/info/properties/addressesHistory/items/properties/address", 
                    "type": "string"
                  }, 
                  "postcode": {
                    "$id": "/properties/_source/properties/info/properties/addressesHistory/items/properties/postcode", 
                    "type": "string"
                  }
                }
              }
            }, 
            "bankDetails": {
              "$id": "/properties/_source/properties/info/properties/bankDetails", 
              "type": "object"
            }, 
            "contact": {
              "$id": "/properties/_source/properties/info/properties/contact", 
              "type": "object", 
              "properties": {
                "title": {
                  "$id": "/properties/_source/properties/info/properties/contact/properties/title", 
                  "type": "string"
                }, 
                "firstname": {
                  "$id": "/properties/_source/properties/info/properties/contact/properties/firstname", 
                  "type": "string"
                }, 
                "lastname": {
                  "$id": "/properties/_source/properties/info/properties/contact/properties/lastname", 
                  "type": "string"
                }, 
                "email": {
                  "$id": "/properties/_source/properties/info/properties/contact/properties/email", 
                  "type": "string"
                }, 
                "phone": {
                  "$id": "/properties/_source/properties/info/properties/contact/properties/phone", 
                  "type": "string"
                }, 
                "homeAddress": {
                  "$id": "/properties/_source/properties/info/properties/contact/properties/homeAddress", 
                  "type": "object", 
                  "properties": {
                    "address": {
                      "$id": "/properties/_source/properties/info/properties/contact/properties/homeAddress/properties/address", 
                      "type": "string"
                    }, 
                    "postcode": {
                      "$id": "/properties/_source/properties/info/properties/contact/properties/homeAddress/properties/postcode", 
                      "type": "string"
                    }, 
                    "ehlAddressId": {
                      "$id": "/properties/_source/properties/info/properties/contact/properties/homeAddress/properties/ehlAddressId", 
                      "type": "string"
                    }
                  }
                }, 
                "manualAddress": {
                  "$id": "/properties/_source/properties/info/properties/contact/properties/manualAddress", 
                  "type": "object"
                }, 
                "billingAddress": {
                  "$id": "/properties/_source/properties/info/properties/contact/properties/billingAddress", 
                  "type": "object"
                }
              }
            }, 
            "loginLink": {
              "$id": "/properties/_source/properties/info/properties/loginLink", 
              "type": "string"
            }, 
            "personal": {
              "$id": "/properties/_source/properties/info/properties/personal", 
              "type": "object", 
              "properties": {
                "broadband": {
                  "$id": "/properties/_source/properties/info/properties/personal/properties/broadband", 
                  "type": "boolean"
                }, 
                "ownership": {
                  "$id": "/properties/_source/properties/info/properties/personal/properties/ownership", 
                  "type": "string"
                }, 
                "passwordHint": {
                  "$id": "/properties/_source/properties/info/properties/personal/properties/passwordHint", 
                  "type": "string"
                }, 
                "dob": {
                  "$id": "/properties/_source/properties/info/properties/personal/properties/dob", 
                  "type": "string"
                }, 
                "employmentStatus": {
                  "$id": "/properties/_source/properties/info/properties/personal/properties/employmentStatus", 
                  "type": "string"
                }, 
                "specialNeeds": {
                  "$id": "/properties/_source/properties/info/properties/personal/properties/specialNeeds", 
                  "type": "string"
                }
              }
            }
          }
        }, 
        "meters": {
          "$id": "/properties/_source/properties/meters", 
          "type": "object", 
          "properties": {
            "userMeter": {
              "$id": "/properties/_source/properties/meters/properties/userMeter", 
              "type": "object", 
              "properties": {
                "smartMeter": {
                  "$id": "/properties/_source/properties/meters/properties/userMeter/properties/smartMeter", 
                  "type": "string"
                }, 
                "installingSupplier": {
                  "$id": "/properties/_source/properties/meters/properties/userMeter/properties/installingSupplier", 
                  "type": "string"
                }, 
                "sameSupplier": {
                  "$id": "/properties/_source/properties/meters/properties/userMeter/properties/sameSupplier", 
                  "type": "string"
                }
              }
            }, 
            "serials": {
              "$id": "/properties/_source/properties/meters/properties/serials", 
              "type": "object"
            }, 
            "gbgInfo": {
              "$id": "/properties/_source/properties/meters/properties/gbgInfo", 
              "type": "object"
            }
          }
        }, 
        "preferences": {
          "$id": "/properties/_source/properties/preferences", 
          "type": "object", 
          "properties": {
            "creditCheck": {
              "$id": "/properties/_source/properties/preferences/properties/creditCheck", 
              "type": "string"
            }, 
            "customerService": {
              "$id": "/properties/_source/properties/preferences/properties/customerService", 
              "type": "string"
            }, 
            "fixedContract": {
              "$id": "/properties/_source/properties/preferences/properties/fixedContract", 
              "type": "string"
            }, 
            "fuelSource": {
              "$id": "/properties/_source/properties/preferences/properties/fuelSource", 
              "type": "string"
            }, 
            "fuelType": {
              "$id": "/properties/_source/properties/preferences/properties/fuelType", 
              "type": "string"
            }, 
            "paymentMethod": {
              "$id": "/properties/_source/properties/preferences/properties/paymentMethod", 
              "type": "string"
            }, 
            "smart": {
              "$id": "/properties/_source/properties/preferences/properties/smart", 
              "type": "string"
            }, 
            "supplier": {
              "$id": "/properties/_source/properties/preferences/properties/supplier", 
              "type": "string"
            }, 
            "switch": {
              "$id": "/properties/_source/properties/preferences/properties/switch", 
              "type": "string"
            }, 
            "cancellationFee": {
              "$id": "/properties/_source/properties/preferences/properties/cancellationFee", 
              "type": "string"
            }
          }
        }, 
        "status": {
          "$id": "/properties/_source/properties/status", 
          "type": "string"
        }, 
        "tariff": {
          "$id": "/properties/_source/properties/tariff", 
          "type": "object", 
          "properties": {
            "postcode": {
              "$id": "/properties/_source/properties/tariff/properties/postcode", 
              "type": "string"
            }, 
            "dnoRegion": {
              "$id": "/properties/_source/properties/tariff/properties/dnoRegion", 
              "type": "string"
            }, 
            "currentTariff": {
              "$id": "/properties/_source/properties/tariff/properties/currentTariff", 
              "type": "object", 
              "properties": {
                "elecPaymentMethod": {
                  "$id": "/properties/_source/properties/tariff/properties/currentTariff/properties/elecPaymentMethod", 
                  "type": "string"
                }, 
                "elecPaymentMethodLabel": {
                  "$id": "/properties/_source/properties/tariff/properties/currentTariff/properties/elecPaymentMethodLabel", 
                  "type": "string"
                }, 
                "elecSupplier": {
                  "$id": "/properties/_source/properties/tariff/properties/currentTariff/properties/elecSupplier", 
                  "type": "string"
                }, 
                "elecSupplierId": {
                  "$id": "/properties/_source/properties/tariff/properties/currentTariff/properties/elecSupplierId", 
                  "type": "string"
                }, 
                "elecTariff": {
                  "$id": "/properties/_source/properties/tariff/properties/currentTariff/properties/elecTariff", 
                  "type": "string"
                }, 
                "elecTariffId": {
                  "$id": "/properties/_source/properties/tariff/properties/currentTariff/properties/elecTariffId", 
                  "type": "string"
                }, 
                "gasPaymentMethod": {
                  "$id": "/properties/_source/properties/tariff/properties/currentTariff/properties/gasPaymentMethod", 
                  "type": "string"
                }, 
                "gasPaymentMethodLabel": {
                  "$id": "/properties/_source/properties/tariff/properties/currentTariff/properties/gasPaymentMethodLabel", 
                  "type": "string"
                }, 
                "gasSupplierId": {
                  "$id": "/properties/_source/properties/tariff/properties/currentTariff/properties/gasSupplierId", 
                  "type": "string"
                }, 
                "gasSupplier": {
                  "$id": "/properties/_source/properties/tariff/properties/currentTariff/properties/gasSupplier", 
                  "type": "string"
                }, 
                "gasTariff": {
                  "$id": "/properties/_source/properties/tariff/properties/currentTariff/properties/gasTariff", 
                  "type": "string"
                }, 
                "gasTariffId": {
                  "$id": "/properties/_source/properties/tariff/properties/currentTariff/properties/gasTariffId", 
                  "type": "string"
                }
              }
            }
          }
        }, 
        "updatedAt": {
          "$id": "/properties/_source/properties/updatedAt", 
          "type": "string"
        }, 
        "usage": {
          "$id": "/properties/_source/properties/usage", 
          "type": "object", 
          "properties": {
            "simpleEstimate": {
              "$id": "/properties/_source/properties/usage/properties/simpleEstimate", 
              "type": "object", 
              "properties": {
                "elec": {
                  "$id": "/properties/_source/properties/usage/properties/simpleEstimate/properties/elec", 
                  "type": "string"
                }, 
                "gas": {
                  "$id": "/properties/_source/properties/usage/properties/simpleEstimate/properties/gas", 
                  "type": "string"
                }
              }
            }
          }
        }, 
        "cads": {
          "$id": "/properties/_source/properties/cads", 
          "type": "object"
        }, 
        "emails": {
          "$id": "/properties/_source/properties/emails", 
          "type": "array", 
          "items": {
            "$id": "/properties/_source/properties/emails/items", 
            "type": "object", 
            "properties": {
              "email": {
                "$id": "/properties/_source/properties/emails/items/properties/email", 
                "type": "string"
              }, 
              "createdAt": {
                "$id": "/properties/_source/properties/emails/items/properties/createdAt", 
                "type": "string"
              }, 
              "type": {
                "$id": "/properties/_source/properties/emails/items/properties/type", 
                "type": "string"
              }
            }
          }
        }, 
        "notes": {
          "$id": "/properties/_source/properties/notes", 
          "type": "array"
        }, 
        "switches": {
          "$id": "/properties/_source/properties/switches", 
          "type": "array"
        }, 
        "userIssues": {
          "$id": "/properties/_source/properties/userIssues", 
          "type": "array", 
          "items": {
            "$id": "/properties/_source/properties/userIssues/items", 
            "type": "object", 
            "properties": {
              "category": {
                "$id": "/properties/_source/properties/userIssues/items/properties/category", 
                "type": "string"
              }, 
              "createdAt": {
                "$id": "/properties/_source/properties/userIssues/items/properties/createdAt", 
                "type": "string"
              }, 
              "issue": {
                "$id": "/properties/_source/properties/userIssues/items/properties/issue", 
                "type": "string"
              }, 
              "userIssueId": {
                "$id": "/properties/_source/properties/userIssues/items/properties/userIssueId", 
                "type": "string"
              }, 
              "email": {
                "$id": "/properties/_source/properties/userIssues/items/properties/email", 
                "type": "string"
              }, 
              "subcategory": {
                "$id": "/properties/_source/properties/userIssues/items/properties/subcategory", 
                "type": "string"
              }, 
              "filters": {
                "$id": "/properties/_source/properties/userIssues/items/properties/filters", 
                "type": "object", 
                "properties": {
                  "status": {
                    "$id": "/properties/_source/properties/userIssues/items/properties/filters/properties/status", 
                    "type": "string"
                  }, 
                  "category": {
                    "$id": "/properties/_source/properties/userIssues/items/properties/filters/properties/category", 
                    "type": "string"
                  }, 
                  "subcategory": {
                    "$id": "/properties/_source/properties/userIssues/items/properties/filters/properties/subcategory", 
                    "type": "string"
                  }
                }
              }
            }
          }
        }, 
        "statuses": {
          "$id": "/properties/_source/properties/statuses", 
          "type": "object", 
          "properties": {
            "account": {
              "$id": "/properties/_source/properties/statuses/properties/account", 
              "type": "string"
            }, 
            "meter": {
              "$id": "/properties/_source/properties/statuses/properties/meter", 
              "type": "string"
            }, 
            "smartMeter": {
              "$id": "/properties/_source/properties/statuses/properties/smartMeter", 
              "type": "string"
            }, 
            "issues": {
              "$id": "/properties/_source/properties/statuses/properties/issues", 
              "type": "boolean"
            }, 
            "switches": {
              "$id": "/properties/_source/properties/statuses/properties/switches", 
              "type": "boolean"
            }
          }
        }
      }
    }, 
    "fields": {
      "$id": "/properties/fields", 
      "type": "object", 
      "properties": {
        "createdAt": {
          "$id": "/properties/fields/properties/createdAt", 
          "type": "array", 
          "items": {
            "$id": "/properties/fields/properties/createdAt/items", 
            "type": "string"
          }
        }, 
        "userIssues.createdAt": {
          "$id": "/properties/fields/properties/userIssues.createdAt", 
          "type": "array", 
          "items": {
            "$id": "/properties/fields/properties/userIssues.createdAt/items", 
            "type": "string"
          }
        }, 
        "emails.createdAt": {
          "$id": "/properties/fields/properties/emails.createdAt", 
          "type": "array", 
          "items": {
            "$id": "/properties/fields/properties/emails.createdAt/items", 
            "type": "string"
          }
        }, 
        "updatedAt": {
          "$id": "/properties/fields/properties/updatedAt", 
          "type": "array", 
          "items": {
            "$id": "/properties/fields/properties/updatedAt/items", 
            "type": "string"
          }
        }
      }
    }
  }
}
jpountz commented 6 years ago

Thanks for the details, I suspect things are fine in your case actually.

I just don't understand how these 'similar' queries return different counts on the same index

They don't all do the same thing. For instance, the query email:* will only match email addresses that have at least one token for the analyzer on this field. If the value of this field is an empty string or a string that only contains characters that the analyzer splits on, then the document won't match.

The last query has a required range query on the updatedAt which probably excludes some documents as well.
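To make the first point concrete, here is a minimal Python sketch. It is not Elasticsearch's actual analysis chain: `toy_analyze` is a hypothetical helper that assumes an analyzer which splits on anything other than letters and digits, and `matches_field_wildcard` models `email:*` as "the analyzed field produced at least one token".

```python
import re


def toy_analyze(text):
    """Hypothetical stand-in for an analyzer: split on non-alphanumerics, lowercase."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]


def matches_field_wildcard(value):
    """Simplified model of `email:*`: match only if analysis yields a token."""
    return len(toy_analyze(value)) > 0


# Made-up sample values, keyed by document id.
docs = {
    "1": "alice@example.com",
    "2": "",     # empty string: no tokens
    "3": "@.-",  # only characters the toy analyzer splits on
}

matching = [doc_id for doc_id, email in docs.items() if matches_field_wildcard(email)]
print(matching)
```

Under these assumptions only document 1 matches, so a count based on `email:*` would come out lower than a `match_all` count on the same index.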

aterreno commented 6 years ago

But email is the _id, a document won't be inserted without email and I've exported all docs, mapped only the email, and no dupes or empty... That was very weird.

The updatedAt is more interesting, as it's not a mandatory field; the way we fixed our 'node api' was to filter on createdAt exists, as that field is definitely always there.


s1monw commented 6 years ago

Indeed this index looks healthy, I can't see any issues with these numbers. The email question looks indeed like a tokenization issue.

A simple search was also returning deleted docs. To add more info, we deleted the index, worked on it for a day or so and deleted about 10 docs; a plain match all was returning them.

Was this index refreshed after deleting the docs? I mean, ES doesn't refresh its point-in-time view once the delete returns. It might, depending on your refresh interval. I am not sure what you are seeing is in any way related to this issue. Also, these are all basic use cases and there wasn't a single report like this in the past. Nevertheless, let's find the root cause of your issues...
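As a rough illustration of the refresh point, the sketch below builds (but does not send) two requests with Python's standard library: an explicit `_refresh` call, and a delete that uses the `refresh=wait_for` query parameter so the call returns only once the change is visible to search. The cluster URL, index name, type name and document id are all assumptions made up for the example.

```python
from urllib.parse import quote
from urllib.request import Request

ES_CLUSTER = "http://localhost:9200"  # assumption: a local cluster
INDEX = "users"                       # hypothetical index name


def refresh_request(cluster, index):
    """POST <index>/_refresh: make recent changes visible to search."""
    return Request(f"{cluster}/{index}/_refresh", method="POST")


def delete_request(cluster, index, doc_type, doc_id, refresh=None):
    """DELETE a document; optionally pass refresh=true or refresh=wait_for."""
    url = f"{cluster}/{index}/{doc_type}/{quote(doc_id, safe='')}"
    if refresh is not None:
        url += f"?refresh={refresh}"
    return Request(url, method="DELETE")


req = delete_request(ES_CLUSTER, INDEX, "user", "alice@example.com", refresh="wait_for")
print(req.get_method(), req.full_url)
```

Passing either request to `urllib.request.urlopen` would actually send it. Without a refresh (explicit or via the refresh interval), a search issued right after the delete can still see the old document.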

aterreno commented 6 years ago

Thanks @s1monw, I don't want to waste anyone's time; it could be we are using ES the wrong way. I forgot to mention that I've also tried to refresh/flush/clear cache before running those curls

ES_CLUSTER=
INDEX=

curl -XPOST -k "$ES_CLUSTER/$INDEX/_cache/clear"
curl -XPOST -k "$ES_CLUSTER/$INDEX/_flush"
curl -XPOST -k "$ES_CLUSTER/$INDEX/_refresh"

Would that be sufficient?

I can try to go into further details with the sequence of events that happen on that index, but that will take me some time...

jpountz commented 6 years ago

You should be able to look at the problematic documents by running a query like that:

GET $INDEX/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "must_not": { "query_string": { "query": "email:*" } }
    }
  }
}

Their email field will likely be only made of characters that your analyzer splits on.

aterreno commented 6 years ago

Bingo, that returns the deleted users, plus one document which has a wrong schema.

ES_CLUSTER=
INDEX=

curl -s -k $ES_CLUSTER/$INDEX/_search -H 'Content-Type: application/json'  -d'
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "must_not": { "query_string": { "query": "email:*" } }
    }
  }
}' | jq .hits.hits |  jq 'map(._id)'

I don't understand why, though. The doc with the wrong schema is 'on us' and needs to go, but the other ones really should look the same as the rest of the documents.

Now, if I try to delete those again, from curl I get:

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Rejecting mapping update to [$INDEX] as the final mapping would have more than 1 type: [_doc, user]"}],"type":"illegal_argument_exception","reason":"Rejecting mapping update to [$INDEX] as the final mapping would have more than 1 type: [_doc, user]"},"status":400}

What does it mean? And how can I then clean up this index by properly removing this 'dirty data' from it?

Thank you so much!

s1monw commented 6 years ago

Now, if I try to delete those again, from curl I get:

can you share the curl command?

s1monw commented 6 years ago

It's pretty crazy and to be quite frank, I am glad ES is not our primary store because something like this should never happen on a production-ready database.

@aterreno I do wonder if you, given that we figured out what's going on and it's not nearly as terrible as we thought it was, want to revisit this statement?

aterreno commented 6 years ago

@s1monw the curl is the default, as per docs: curl -X DELETE "localhost:9200/twitter/_doc/1"

anything wrong with that?

Answering the other question, I might have been harsh, but I am still quite happy that it's not my primary data store TBH, I've used tons of different SQL and noSQL solutions in my career and I don't understand why ES is not as straightforward, maybe something missing in the docs?

If I have a document in a database, no matter what the token analyser is, shouldn't I be able to just delete it? And if deleted it should be gone for good.

If the curl above is right, I can finally see the root cause of the issue, assuming that http request is what is behind the scenes from the ES node.js client, the lambdas are failing at deleting those docs.

Sorry, I might need to read an ES Bible before pontificating here but it's still not a behaviour that I'd expect from a data store.

Or we are using it completely wrong and we shouldn't tokenise the _id, is that the biggest mistake we could possibly do with ES?

jpountz commented 6 years ago

Given the error message, you probably need to run curl -X DELETE "localhost:9200/twitter/user/1" instead.

s1monw commented 6 years ago

Sorry, I might need to read an ES Bible before pontificating here but it's still not a behaviour that I'd expect from a data store.

Please keep in mind that Elasticsearch is a search engine and you are indexing documents that get tokenized. Everything worked as expected and all the data is available. I do understand that coming from a datastore that is not a search engine is difficult and needs some training and reading. I want to understand why using a different technology that you don't have enough experience with, and that is causing you trouble, warrants such a statement. We take issues like this very seriously; we jumped on it and resolved it in literally no time. All issues we found were on your end. I would have really appreciated you correcting your statement. If you are happier using a different technology because you are more familiar with it, I am more than fine with that. I think you should live up to your comment and revert it; there are many people working very hard to build a good product, and they don't deserve statements like this.

Or we are using it completely wrong and we shouldn't tokenise the _id, is that the biggest mistake we could possibly do with ES?

yes if you send ES text it will tokenize it by default as I would expect from a search engine. If your email field is the ID you should make it a keyword field.
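A minimal sketch of what "make it a keyword field" could look like as an index-creation body, built here as a Python dict. The type name `user` is taken from the error message earlier in the thread; the sub-field name `analyzed` is hypothetical, and the typed `mappings` layout assumes a 6.x-style index, so treat the exact shape as an assumption rather than the mapping used in this cluster.

```python
import json


def email_mapping(doc_type="user"):
    """Build a create-index body where `email` is an untokenized keyword.

    The optional `analyzed` sub-field (hypothetical name) keeps a tokenized
    copy for full-text search, while `email` itself stays an exact value.
    """
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    "email": {
                        "type": "keyword",
                        "fields": {
                            "analyzed": {"type": "text"},
                        },
                    }
                }
            }
        }
    }


print(json.dumps(email_mapping(), indent=2))
```

With a mapping along these lines, exact lookups and existence checks would target `email`, and analyzed searches would target `email.analyzed`.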

aterreno commented 6 years ago

Thanks @jpountz, that fixed the docs that were still 'hanging'. I'll check the node code (which I didn't write) that does the delete to understand why they didn't get deleted in the first place. I'll also keep an eye on any docs that don't get deleted; as long as the numbers are low I can diff the docs and figure out why the models are 'upsetting' ES.
We might protect ourselves further by using some sort of schema validation before inserting into ES to avoid this sort of issue. I appreciate the support a lot, guys, thanks again.

aterreno commented 6 years ago

@s1monw I've updated my comment, and I apologise for it. I've appreciated the support a lot.

Given that there's no BUG on the ES side, I'll do the necessary reading to figure out how to both use the uax_url_email tokenizer (as we are doing) and use email as a keyword for _id.

It's still not 100% clear why those docs didn't get deleted in the first place like the others (but I'll figure it out myself, I don't want to waste any more of your or your team's time).

s1monw commented 6 years ago

thanks I really appreciate that @aterreno

aterreno commented 6 years ago

no worries @s1monw keep up the good work :-) I can prove I (still) love ES (https://medium.com/@javame/visualise-all-things-82adc32bcf64 posted just last night ;)

murfee25 commented 5 years ago

I don't suppose there was any progress on this issue? It's rearing its head for us again.

aterreno commented 5 years ago

Technically it's not an issue, and it should probably be closed. Simon explained the 'odds' of ES very well: this is the way it's supposed to work. As he said (and I'll never forget that lesson), it's a search engine, not a database ;)


murfee25 commented 5 years ago

I think your issue was different to the original issue... We are querying the search engine for a specific id and it finds it for us... but the totals it says for the query are wildly out. e.g. instead of totals :1 we get totals: 319 ?

cbuescher commented 5 years ago

It's rearing its head for us again.

Sorry, I cannot find your user handle on this issue before. Do you mean you are with one of the parties involved with this issue so far, or are you seeing some similar behaviour? In the latter case it would probably be best to open a new issue and link this one, so we can first check if and how they are related before mixing this already quite long thread with another case.

If you're referring to the same systems on which @dannymurphygfk and @JagathJayasinghe were facing this issue, could you start by recapping whether you were having trouble in the meantime, and which ES versions you are running now, or did everything stay the same?

dannymurphygfk commented 5 years ago

@cbuescher, Hi sorry yes, didn't realise I was logged in with a different handle... it is rearing its head for us again.

Still Running 5.6.4 and everything else still the same.

cbuescher commented 5 years ago

@dannymurphygfk okay, sorry to hear that you are still having trouble with this. There is indeed very little for us to go on, since we are still unable to reproduce it. Did you manage to rule out the possibility of your custom plugin interfering, as @colings86 suggested in https://github.com/elastic/elasticsearch/issues/25603#issuecomment-375631978, by disabling the scripts on your new dev cluster?