elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Data mismatch in terms_stats facet #6046

Closed: rbnacharya closed this issue 10 years ago

rbnacharya commented 10 years ago

My test data and mapping are listed here:

https://gist.github.com/rbnacharya/7acc70b99f758da2162e

After applying the mapping and indexing the data, I ran a simple terms_stats facet with size 20:

POST http://localhost:9200/testindex/Medical/_search
{
   "size": 0,
   "facets": {
      "totalPaidAmount:top20": {
         "terms_stats": {
            "key_field": "udf21Id",
            "value_field": "paidAmount",
            "size": 20,
            "order": "total"
         }
      }
   }
}

And the same facet with size 500:

POST http://localhost:9200/testindex/Medical/_search
{
   "size": 0,
   "facets": {
      "totalPaidAmount:top500": {
         "terms_stats": {
            "key_field": "udf21Id",
            "value_field": "paidAmount",
            "size": 500,
            "order": "total"
         }
      }
   }
}

The two responses do not match for some keys: the count and total values differ, as you can verify yourself.

For those keys, the top20 facet reports lower document counts than the top500 facet.

Am I doing something wrong, or is this an Elasticsearch bug?

jpountz commented 10 years ago

This is indeed a known limitation of the terms and terms_stats facets; see https://github.com/elasticsearch/elasticsearch/issues/1305 for more information. You can improve accuracy by increasing the value of the shard_size parameter, at the cost of more memory usage and network traffic.
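
To illustrate why a small size can under-report count and total, here is a minimal Python sketch of the effect. It is a simplified model, not Elasticsearch's actual code: it assumes each shard returns only its own top shard_size keys ranked by total, and that the coordinating node merges just those partial results. The function names (shard_top, merged_top) and the two-shard sample data are hypothetical.

```python
from collections import defaultdict


def shard_top(docs, shard_size):
    """Aggregate (key, value) docs on one shard; return its top `shard_size` keys by total."""
    stats = defaultdict(lambda: [0, 0.0])  # key -> [count, total]
    for key, value in docs:
        stats[key][0] += 1
        stats[key][1] += value
    ranked = sorted(stats.items(), key=lambda kv: kv[1][1], reverse=True)
    return ranked[:shard_size]


def merged_top(shards, size, shard_size):
    """Merge the per-shard top lists and return the overall top `size` keys by total."""
    merged = defaultdict(lambda: [0, 0.0])
    for docs in shards:
        # Only keys that made this shard's local cut contribute to the merge.
        for key, (count, total) in shard_top(docs, shard_size):
            merged[key][0] += count
            merged[key][1] += total
    ranked = sorted(merged.items(), key=lambda kv: kv[1][1], reverse=True)
    return {key: tuple(stat) for key, stat in ranked[:size]}


# Hypothetical data: key "a" is dominant on shard 0 but ranks last on shard 1.
shards = [
    [("a", 100.0), ("a", 100.0), ("b", 50.0), ("c", 10.0)],
    [("b", 90.0), ("c", 80.0), ("a", 5.0)],
]

small = merged_top(shards, size=1, shard_size=1)  # shard 1 drops its "a" document
large = merged_top(shards, size=1, shard_size=3)  # every shard reports every key

print(small)  # "a" is missing the document held on shard 1
print(large)  # "a" has its full count and total
```

In this model the same key comes back with a lower count and total when the per-shard cut is small, which matches the top20 versus top500 difference described above. Raising shard_size narrows the gap because fewer keys are dropped before the merge.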