elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Filtered aggs with terms agg and min_doc_count of 0 #99707

Open sachin-frayne opened 9 months ago

sachin-frayne commented 9 months ago

Description

I am trying to return every bucket of a terms aggregation that has 0 hits for the main query but still belongs to the category selected by my aggs filter, so that the facet is still displayed, with a count of 0.

For example

PUT product3/_doc/1
{
  "title": "canon printer ink",
  "brand": "canon",
  "color": "black",
  "category": "printers"
}

PUT product3/_doc/2
{
  "title": "hp printer ink",
  "brand": "hp",
  "color": "blue",
  "category": "printers"
}

PUT product3/_doc/3
{
  "title": "brother printer",
  "brand": "brother",
  "color": "white",
  "category": "printers"
}

PUT product3/_doc/4
{
  "title": "apple macbook air",
  "brand": "apple",
  "color": "grey",
  "category": "computers"
}

GET product3/_search
{
  "query": {
    "match": {
      "title": "ink"
    }
  },
  "aggs": {
    "category_brands": {
      "filter": {
        "term": {
          "category": "printers"
        }
      },
      "aggs": {
        "category_brands": {
          "terms": {
            "field": "brand.keyword",
            "min_doc_count": 0
          }
        }
      }
    }
  }
}

This gives the following results:

...
          {
            "key": "canon",
            "doc_count": 1
          },
          {
            "key": "hp",
            "doc_count": 1
          },
          {
            "key": "apple",
            "doc_count": 0
          },
          {
            "key": "brother",
            "doc_count": 0
          }
...

Instead, I expected:

...
          {
            "key": "canon",
            "doc_count": 1
          },
          {
            "key": "hp",
            "doc_count": 1
          },
          {
            "key": "brother",
            "doc_count": 0
          }
...

This is similar to the request here in discuss: https://discuss.elastic.co/t/how-to-get-aggregations-with-buckets-of-zero-count-only-for-the-filter-provided/238861

My workaround is:

GET product3/_search
{
  "query": {
    "match": {
      "title": "ink"
    }
  },
  "aggs": {
    "category_brands": {
      "global": {},
      "aggs": {
        "category_brands": {
          "filter": {
            "term": {
              "category": "printers"
            }
          },
          "aggs": {
            "category_brands": {
              "terms": {
                "field": "brand.keyword"
              }
            }
          }
        }
      }
    },
    "all_brands": {
      "terms": {
        "field": "brand.keyword",
        "min_doc_count": 0
      }
    }
  }
}

In code, I can then compare the two lists and remove from all_brands every bucket whose key does not appear in category_brands.
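A minimal Python sketch of that client-side comparison, assuming the search response has already been parsed into a dict. The function name `facet_buckets` and the `sample_response` value are hypothetical: the sample is hand-written to mirror the shape the workaround query should return for the four documents above, not captured from a real cluster. Note that both terms aggregations return at most `size` buckets (10 by default), so a larger catalogue would need a larger `size`.

```python
def facet_buckets(response):
    """Keep only the all_brands buckets whose key also occurs in the
    category-wide (global + filter) brand list."""
    aggs = response["aggregations"]
    # Path follows the nesting of the workaround: global -> filter -> terms.
    category_buckets = (
        aggs["category_brands"]["category_brands"]["category_brands"]["buckets"]
    )
    allowed = {bucket["key"] for bucket in category_buckets}
    # Preserve the order and counts reported by all_brands (query-scoped).
    return [b for b in aggs["all_brands"]["buckets"] if b["key"] in allowed]


# Hand-written stand-in for the response (assumed shape, illustrative values).
sample_response = {
    "aggregations": {
        "category_brands": {
            "doc_count": 4,
            "category_brands": {
                "doc_count": 3,
                "category_brands": {
                    "buckets": [
                        {"key": "brother", "doc_count": 1},
                        {"key": "canon", "doc_count": 1},
                        {"key": "hp", "doc_count": 1},
                    ]
                },
            },
        },
        "all_brands": {
            "buckets": [
                {"key": "canon", "doc_count": 1},
                {"key": "hp", "doc_count": 1},
                {"key": "apple", "doc_count": 0},
                {"key": "brother", "doc_count": 0},
            ]
        },
    }
}

facets = facet_buckets(sample_response)
```

With the sample above, `facets` keeps canon and hp with a count of 1 and brother with a count of 0, and drops apple, which matches the expected output described earlier.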

I also attempted to see if there was a way to solve this with a pipeline aggregation, but none seems to work with arrays in the way I want.

elasticsearchmachine commented 9 months ago

Pinging @elastic/es-analytics-geo (Team:Analytics)