elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

buckets_path cannot route through nested aggregation? #29287

Open webbnh opened 6 years ago

webbnh commented 6 years ago

Elasticsearch version (bin/elasticsearch --version):

Version: 6.2.1, Build: 7299dc3/2018-02-07T19:34:26.990113Z, JVM: 1.8.0_25

Plugins installed: [] None?

JVM version (java -version):

java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

OS version (uname -a if on a Unix-like system):

Darwin mynode.local 14.5.0 Darwin Kernel Version 14.5.0: Sun Jun  4 21:40:08 PDT 2017; root:xnu-2782.70.3~1/RELEASE_X86_64 x86_64

Description of the problem including expected versus actual behavior:

The idea is to pick out a bunch of documents from the index which have interesting data in a few of their fields (I've removed some of the fields from the query below for simplicity), organize those documents by the contents of the sourceId field, and then discard buckets which are empty or otherwise drawn from data which doesn't match.

I had a query similar to the one below which worked. I then modified the document structure such that most of the interesting data moved to a nested mapping. Attempting to modify the query to match results in an error:

{
    "error": {
        "root_cause": [],
        "type": "search_phase_execution_exception",
        "reason": "",
        "phase": "fetch",
        "grouped": true,
        "failed_shards": [],
        "caused_by": {
            "type": "class_cast_exception",
            "reason": "org.elasticsearch.search.aggregations.bucket.nested.InternalNested cannot be cast to org.elasticsearch.search.aggregations.InternalMultiBucketAggregation"
        }
    },
    "status": 503
}

I tried several variations on the theme, and the commonality seems to be that a buckets_path cannot route through a nested aggregation.

New query:

{
    "size":  0,
    "query": {
        "nested": {
            "path": "eventData.6_2",
            "query": {
                "dis_max": {
                    "queries": [
                        { "term":  { "eventData.6_2.6_2_1_2": "093" } },
                        { "exists": { "field": "eventData.6_2.6_2_3" } },
                        { "range": { "eventData.6_2.6_2_3": { "lt": "1000" } } }
                        ]
                    }
                }
            }
        },
    "aggs": {
        "flights": {
            "terms": {
                "size":  100000,
                "field": "sourceId"
                },
            "aggs": {
                "subA": {
                    "nested": { "path": "eventData.6_2" },
                    "aggs": {
                        "TargetCount": {
                            "cardinality": {
                            "field": "eventData.6_2.6_2_1_2",
                                "precision_threshold": 10
                                }
                            },
                        "MaxCC":  { "max": { "field": "eventData.6_2.6_2_3" } },
                        "FindIt":           {
                            "bucket_selector": {
                                "buckets_path": { "foundRecs": "TargetCount" },
                                "script":       "params.foundRecs > 0"
                                }
                            }
                        }
                    }
                }
            },
        "CC":  { "max_bucket": { "buckets_path": "flights>subA>MaxCC" } }
        }
    }

Steps to reproduce:

I'm willing to go scrape this together, but first I'd like confirmation that (a) it's not a fault in my query and (b) it's not just an implementation restriction.

Thanks!

elasticmachine commented 6 years ago

Pinging @elastic/es-search-aggs

colings86 commented 6 years ago

@webbnh There is a bug here, but the bug is that we should be catching the problem at parsing time, rather than when we try to run the pipeline aggregation, and outputting a much better error.

The problem is that the request is trying to run the bucket_selector aggregation on the nested aggregation, which is a single-bucket aggregation, and the bucket_selector agg only works on multi-bucket aggregations. I think what you intend to do is remove the entire terms bucket if the TargetCount of subA is 0? If so, you need to move the bucket selector up one level so that it is a direct sub-agg of the terms aggregation, and then modify the buckets_path. Something like the following:

{
  "size":0,
  "query":{
    "nested":{
      "path":"eventData.6_2",
      "query":{
        "dis_max":{
          "queries":[
            {
              "term":{
                "eventData.6_2.6_2_1_2":"093"
              }
            },
            {
              "exists":{
                "field":"eventData.6_2.6_2_3"
              }
            },
            {
              "range":{
                "eventData.6_2.6_2_3":{
                  "lt":"1000"
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs":{
    "flights":{
      "terms":{
        "size":100000,
        "field":"sourceId"
      },
      "aggs":{
        "subA":{
          "nested":{
            "path":"eventData.6_2"
          },
          "aggs":{
            "TargetCount":{
              "cardinality":{
                "field":"eventData.6_2.6_2_1_2",
                "precision_threshold":10
              }
            },
            "MaxCC":{
              "max":{
                "field":"eventData.6_2.6_2_3"
              }
            }
          }
        }
      },
      "aggs": {
        "FindIt":{
          "bucket_selector":{
            "buckets_path":{
              "foundRecs":"subA>TargetCount"
            },
            "script":"params.foundRecs > 0"
          }
        }
      }
    },
    "CC":{
      "max_bucket":{
        "buckets_path":"flights>subA>MaxCC"
      }
    }
  }
}

One unrelated thing to note is that your max_bucket aggregation will also not work. Pipeline aggregations need to be inside multi-bucket aggregations and cannot live at the top level. There is a separate issue for this: https://github.com/elastic/elasticsearch/issues/14600. For now you will need to calculate the max bucket on the client side.
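A minimal sketch of what calculating the max bucket on the client side could look like, in Python. It assumes the search response has already been parsed into a dict and that each flights bucket carries subA and MaxCC in the usual terms/nested/max response layout; the function name, the sample response, and its values are made up for illustration.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def max_bucket(
    response: Dict[str, Any],
    terms_agg: str = "flights",
    path: Sequence[str] = ("subA", "MaxCC"),
) -> Tuple[Optional[float], List[Any]]:
    """Return (max value, keys of the buckets holding it) for a metric
    nested under each bucket of a terms aggregation."""
    best_value: Optional[float] = None
    best_keys: List[Any] = []
    for bucket in response["aggregations"][terms_agg]["buckets"]:
        node = bucket
        for name in path:  # walk flights > subA > MaxCC
            node = node[name]
        value = node.get("value")
        if value is None:  # a max over zero documents comes back as null
            continue
        if best_value is None or value > best_value:
            best_value, best_keys = value, [bucket["key"]]
        elif value == best_value:
            best_keys.append(bucket["key"])
    return best_value, best_keys


# Hypothetical response fragment, hand-written for this example.
sample_response = {
    "aggregations": {
        "flights": {
            "buckets": [
                {"key": "F1", "doc_count": 3,
                 "subA": {"doc_count": 5, "MaxCC": {"value": 420.0}}},
                {"key": "F2", "doc_count": 2,
                 "subA": {"doc_count": 4, "MaxCC": {"value": 870.0}}},
                {"key": "F3", "doc_count": 1,
                 "subA": {"doc_count": 0, "MaxCC": {"value": None}}},
            ]
        }
    }
}

value, keys = max_bucket(sample_response)
print(value, keys)
```

Returning both the value and the list of keys mirrors what a max_bucket aggregation reports, so the caller can swap one for the other later.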

webbnh commented 6 years ago

@colings86, thanks for the quick reply!

Your suggestion has a duplicate aggs key under flights, but when I remove that and place FindIt in the aggs with subA, then it seems to work! Thanks!!
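For anyone following along, here is a rough sketch of the merged aggs section, written as a Python dict so it can be serialized and sanity-checked before sending. The field names and paths come from the queries in this thread; the query part is omitted for brevity, and the reject_duplicate_keys helper is purely illustrative, not part of any Elasticsearch client.

```python
import json

# "subA" and "FindIt" are siblings inside a single "aggs" object under "flights".
aggs = {
    "flights": {
        "terms": {"size": 100000, "field": "sourceId"},
        "aggs": {
            "subA": {
                "nested": {"path": "eventData.6_2"},
                "aggs": {
                    "TargetCount": {
                        "cardinality": {
                            "field": "eventData.6_2.6_2_1_2",
                            "precision_threshold": 10,
                        }
                    },
                    "MaxCC": {"max": {"field": "eventData.6_2.6_2_3"}},
                },
            },
            # The selector now runs on the terms buckets and reaches the
            # cardinality through the single-bucket nested aggregation.
            "FindIt": {
                "bucket_selector": {
                    "buckets_path": {"foundRecs": "subA>TargetCount"},
                    "script": "params.foundRecs > 0",
                }
            },
        },
    },
    "CC": {"max_bucket": {"buckets_path": "flights>subA>MaxCC"}},
}


def reject_duplicate_keys(pairs):
    """object_pairs_hook for json.loads that fails on repeated keys."""
    seen = set()
    for key, _ in pairs:
        if key in seen:
            raise ValueError(f"duplicate key in JSON object: {key!r}")
        seen.add(key)
    return dict(pairs)


body = json.dumps({"size": 0, "aggs": aggs}, indent=2)
# Round-trip through the strict hook to confirm no object repeats a key.
json.loads(body, object_pairs_hook=reject_duplicate_keys)
print(body)
```

Running hand-written JSON through the same hook is a cheap way to spot a repeated aggs key, since Python's json module otherwise silently keeps only the last one.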

I ran across #14600 looking for other reports of the problem I was encountering, but with your suggested change I'm not hitting the problem reported there. (I can't tell yet whether the query is actually working properly, as I don't have enough data in the new format yet, but my corrected query is producing values and no errors...so that seems positive! ;-) )

Thanks again for your help!

colings86 commented 6 years ago

@webbnh OK, glad it's working for you. I'll leave this issue open to fix the validation problem so that a clearer error is returned at parsing time.

biji-padhy commented 5 years ago

Hi Team, I am also facing a similar issue. Pasting my code here; it would be a great help if someone could help me out. Thanks in advance.

"aggs": {
  "business": {
    "composite": {
      "sources": [
        { "competency_name": { "terms": { "field": "busn_competency_name.keyword" } } },
        { "component_name": { "terms": { "field": "busn_component_name.keyword" } } },
        { "busn_srvc_name": { "terms": { "field": "busn_srvc_name.keyword" } } }
      ]
    },
    "aggs": {
      "comp": {
        "filter": { "term": { "automata_status.keyword": "Completed" } },
        "aggs": {
          "sum1": { "sum": { "field": "p_manual_exe_time" } },
          "sum2": { "sum": { "field": "a_actual_exe_time" } },
          "effort_saved": {
            "bucket_selector": {
              "buckets_path": { "var1": "sum1", "var2": "sum2" },
              "script": "params.var1 - params.var2"
            }
          }
        }
      }
    }
  }
}

The error I am receiving is:

{
    "error": {
        "root_cause": [],
        "type": "search_phase_execution_exception",
        "reason": "",
        "phase": "fetch",
        "grouped": true,
        "failed_shards": [],
        "caused_by": {
            "type": "class_cast_exception",
            "reason": "org.elasticsearch.search.aggregations.bucket.filter.InternalFilter cannot be cast to org.elasticsearch.search.aggregations.InternalMultiBucketAggregation"
        }
    },
    "status": 503
}

polyfractal commented 5 years ago

@biji-padhy Known limitation, unfortunately. See: https://github.com/elastic/elasticsearch/issues/14600

Usually you can get around this by using a filters agg instead of filter. Irritating but it's a quirk of how the framework works at the moment :(
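As a rough illustration of that workaround, the sketch below rebuilds the aggs section from the comment above as a Python dict, with the single-bucket filter swapped for a filters aggregation holding one named filter. The bucket name "completed" is made up. The sketch also uses bucket_script rather than bucket_selector, on the assumption that effort_saved is meant to be a computed value: a bucket_selector script is expected to return a boolean, while this script returns a difference. Treat that substitution as a guess about the intent, not a confirmed fix.

```python
import json

business_aggs = {
    "business": {
        "composite": {
            "sources": [
                {"competency_name": {"terms": {"field": "busn_competency_name.keyword"}}},
                {"component_name": {"terms": {"field": "busn_component_name.keyword"}}},
                {"busn_srvc_name": {"terms": {"field": "busn_srvc_name.keyword"}}},
            ]
        },
        "aggs": {
            "comp": {
                # "filters" is a multi-bucket aggregation even with a single
                # named filter, so a parent pipeline agg can sit inside it.
                "filters": {
                    "filters": {
                        "completed": {  # hypothetical bucket name
                            "term": {"automata_status.keyword": "Completed"}
                        }
                    }
                },
                "aggs": {
                    "sum1": {"sum": {"field": "p_manual_exe_time"}},
                    "sum2": {"sum": {"field": "a_actual_exe_time"}},
                    # Assumption: the goal is a per-bucket number, hence
                    # bucket_script instead of the original bucket_selector.
                    "effort_saved": {
                        "bucket_script": {
                            "buckets_path": {"var1": "sum1", "var2": "sum2"},
                            "script": "params.var1 - params.var2",
                        }
                    },
                },
            }
        },
    }
}

print(json.dumps({"size": 0, "aggs": business_aggs}, indent=2))
```

With this shape, the result for each composite bucket would be expected under comp, then buckets, then completed, instead of directly under comp.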

martijnvg commented 2 years ago

The plan is still to address the problem described in this issue's description by catching it at parse time and returning a meaningful error (instead of letting the class cast error happen at execution time), just as Colin described in his comment.

> I am also facing a similar issue. Pasting my code here; it would be a great help if someone could help me out. Thanks in advance.

@biji-padhy This is a different issue from the one described in this issue's description, though I agree it is similar. It also relates to #90076, and once this issue has been addressed, that should fix the class cast exception that you've reported.