Elasticsearch Nested Aggregations

shivam1892 commented 6 years ago

Issue Summary

I am working on creating the table from elasticsearch with nested aggregation query. Below is my example Query

{
  "index": "data",
  "size": 0,
  "aggs": {
    "date_aggr": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "1d"
      },
      "aggs": {
        "agg_abc": {
          "terms": {
            "field": "abc.keyword",
            "order": {
              "price": "desc"
            }
          },
          "aggs": {
            "price": {
              "sum": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}

The above Nested aggregated query does not give all the columns in the table as the response of elasticsearch. Also “agg_abc” have more than 5 categories in it but in table it shows only for 2 categories which is also not complete. According to me the nested aggregated query is not being parsed properly as some of the columns are missing. But when I hit the above query to elastic search It gives the proper response with all categories.

Technical details:

Redash version 4.0.1
Ubuntu 16.04
Installed with docker setup guide.

arikfr commented 6 years ago

Please share an example of the results you do get for this query in Redash and the results you get when querying directly. Thanks.

shivam1892 commented 6 years ago

Hi Arik,

This is my actual query


  "index":"data",
  "size": 0,
  "aggs": {
    "date_aggr": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "1d"
      },
      "aggs": {
        "vendor": {
          "terms": {
            "field": "vendorId.keyword",
            "order": {
              "price": "desc"
            }
          },
          "aggs": {
            "price": {
              "sum": {
                "field": "totalPrice"
              }
            }
          }
        }
      }
    }
  }
}
This is response from Elasticsearch 
{
    "date_aggr": {
        "buckets": [{
                "key_as_string": "2018-01-01T00:00:00.000Z",
                "key": 1514764800000,
                "doc_count": 2540,
                "vendor": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [{
                            "key": "RIL",
                            "doc_count": 594,
                            "price": {
                                "value": 415973.95056152344
                            }
                        },
                        {
                            "key": "IOCL",
                            "doc_count": 482,
                            "price": {
                                "value": 332374.34034729004
                            }
                        },
                        {
                            "key": "HPCL",
                            "doc_count": 448,
                            "price": {
                                "value": 310616.87955093384
                            }
                        },
                        {
                            "key": "BPCL",
                            "doc_count": 449,
                            "price": {
                                "value": 307841.93978500366
                            }
                        },
                        {
                            "key": "EssarOil",
                            "doc_count": 338,
                            "price": {
                                "value": 220755.62995529175
                            }
                        },
                        {
                            "key": "ONGC",
                            "doc_count": 229,
                            "price": {
                                "value": 156433.39011383057
                            }
                        }
                    ]
                }
            },
            {
                "key_as_string": "2018-01-02T00:00:00.000Z",
                "key": 1514851200000,
                "doc_count": 2514,
                "vendor": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [{
                            "key": "RIL",
                            "doc_count": 581,
                            "price": {
                                "value": 394284.0503387451
                            }
                        },
                        {
                            "key": "HPCL",
                            "doc_count": 450,
                            "price": {
                                "value": 333518.30029296875
                            }
                        },
                        {
                            "key": "IOCL",
                            "doc_count": 475,
                            "price": {
                                "value": 329492.58993911743
                            }
                        },
                        {
                            "key": "BPCL",
                            "doc_count": 458,
                            "price": {
                                "value": 320690.6898841858
                            }
                        },
                        {
                            "key": "EssarOil",
                            "doc_count": 331,
                            "price": {
                                "value": 221389.1397857666
                            }
                        },
                        {
                            "key": "ONGC",
                            "doc_count": 219,
                            "price": {
                                "value": 144096.33992004395
                            }
                        }
                    ]
                }
            },
            {
                "key_as_string": "2018-01-03T00:00:00.000Z",
                "key": 1514937600000,
                "doc_count": 2469,
                "vendor": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [{
                            "key": "RIL",
                            "doc_count": 545,
                            "price": {
                                "value": 374848.3391456604
                            }
                        },
                        {
                            "key": "IOCL",
                            "doc_count": 482,
                            "price": {
                                "value": 356451.5997390747
                            }
                        },
                        {
                            "key": "BPCL",
                            "doc_count": 447,
                            "price": {
                                "value": 319834.75061416626
                            }
                        },
                        {
                            "key": "HPCL",
                            "doc_count": 444,
                            "price": {
                                "value": 312194.3507118225
                            }
                        },
                        {
                            "key": "EssarOil",
                            "doc_count": 336,
                            "price": {
                                "value": 252321.69033050537
                            }
                        },
                        {
                            "key": "ONGC",
                            "doc_count": 215,
                            "price": {
                                "value": 152464.62057495117
                            }
                        }
                    ]
                }
            }```

And below is the csv format data created in Redash

price,doc_count,vendor,date_aggr
156433.39011383057,2540,ONGC,2018-01-01T00:00:00.000Z
144096.33992004395,2514,ONGC,2018-01-02T00:00:00.000Z
152464.62057495117,2469,ONGC,2018-01-03T00:00:00.000Z
173854.6601638794,2437,ONGC,2018-01-04T00:00:00.000Z
159169.18005752563,2452,ONGC,2018-01-05T00:00:00.000Z
152034.30032348633,2514,ONGC,2018-01-06T00:00:00.000Z
168911.539894104,2489,ONGC,2018-01-07T00:00:00.000Z
152085.08991241455,2492,ONGC,2018-01-08T00:00:00.000Z
165859.48080062866,2517,ONGC,2018-01-09T00:00:00.000Z
164046.0099182129,2459,ONGC,2018-01-10T00:00:00.000Z
153649.18016052246,2438,ONGC,2018-01-11T00:00:00.000Z
177316.99977111816,2533,ONGC,2018-01-12T00:00:00.000Z
167918.51028442383,2512,ONGC,2018-01-13T00:00:00.000Z
140893.3701019287,2453,ONGC,2018-01-14T00:00:00.000Z
157128.23043823242,2455,ONGC,2018-01-15T00:00:00.000Z
153589.3794708252,2497,ONGC,2018-01-16T00:00:00.000Z
167240.75048065186,2392,ONGC,2018-01-17T00:00:00.000Z
152549.56973266602,2446,ONGC,2018-01-18T00:00:00.000Z
145871.3002319336,2494,ONGC,2018-01-19T00:00:00.000Z
159190.40007019043,2522,ONGC,2018-01-20T00:00:00.000Z
165743.09028625488,2391,ONGC,2018-01-21T00:00:00.000Z
172964.93984222412,2511,ONGC,2018-01-22T00:00:00.000Z
150591.2501487732,2468,ONGC,2018-01-23T00:00:00.000Z
155488.8802909851,2565,ONGC,2018-01-24T00:00:00.000Z
159831.6100616455,2513,ONGC,2018-01-25T00:00:00.000Z
145047.91046905518,2455,ONGC,2018-01-26T00:00:00.000Z

As we can see in elasticsearch response we have 6 keys [ongc, ril, bpcl, hpcl...]
But in redash column for only ONGC is created corresponding to Sum of price. 

Can you please let me know where it is getting wrong?

moghira10 commented 5 years ago

@arikfr Any update on this?

ivanli1990 commented 5 years ago

@arikfr I met this issue too. My elasticsearch query:

{
    "index": "master_bug-20190513",
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "state": "已关闭"
                    }
                },
                {
                    "range": {
                        "@timestamp": {
                            "gte": 1557590400000,
                            "lte": 1558195199999,
                            "format": "epoch_millis"
                        }
                    }
                }
            ]
        }
    },
    "aggs": {
        "moduleId": {
            "terms": {
                "field": "moduleId.keyword"
            },
            "aggs": {
                "labels": {
                    "terms": {
                        "field": "labels.keyword",
                        "size": 100
                    },
                    "aggs": {
                        "onesId_num": {
                            "cardinality": {
                                "field": "onesId.keyword"
                            }
                        }
                    }
                }
            }
        }
    },
    "size": 0
}

The result from elasticsearch:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 19,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "moduleId": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "数据",
          "doc_count": 11,
          "labels": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "数据自身问题",
                "doc_count": 10,
                "onesId_num": {
                  "value": 10
                }
              },
              {
                "key": "未填写原因",
                "doc_count": 1,
                "onesId_num": {
                  "value": 1
                }
              }
            ]
          }
        },
        {
          "key": "多码后端",
          "doc_count": 8,
          "labels": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "未填写原因",
                "doc_count": 8,
                "onesId_num": {
                  "value": 8
                }
              }
            ]
          }
        }
      ]
    }
  }
}

See above result, there has 3 rows in result. But in the redash table, there just 2 rows: The row which moduleId is “数据” and labels is “数据自身问题” does not display in the result table.

lkm commented 5 years ago

@ivanli1990 Your query looks fairly similar to what I'm implementing support for, you could try the PR I've just referenced

ivanli1990 commented 5 years ago

@ivanli1990 Your query looks fairly similar to what I'm implementing support for, you could try the PR I've just referenced

It works fine. Thanks!!!

ivanli1990 commented 5 years ago

@lkm I met another issue may be caused by your pr. When I query es, redash displays 'unicode' object has no attribute 'iteritems'. The full response from redash api：

{"job": {"status": 4, "error": "'unicode' object has no attribute 'iteritems'", "id": "5ffb80b7-d139-494c-b147-4d21b628a325", "query_result_id": null, "updated_at": 0}}

And, the query:

{
    "index": "master_bug-*",
    "query": {
        "bool": {
            "must": [
                {
                    "match_all": {}
                },
                {
                    "range": {
                        "@timestamp": {
                            "gte": 1557590400000,
                            "lte": 1558195199999,
                            "format": "epoch_millis"
                        }
                    }
                }
            ],
            "must_not": []
        }
    },
    "size": 0,
    "aggs": {
        "3": {
            "terms": {
                "field": "moduleId.keyword",
                "size": 10,
                "order": {
                    "1": "desc"
                }
            },
            "aggs": {
                "1": {
                    "cardinality": {
                        "field": "onesId.keyword"
                    }
                },
                "4": {
                    "filters": {
                        "filters": {
                            "state:已关闭": {
                                "query_string": {
                                    "query": "state:已关闭",
                                    "analyze_wildcard": true
                                }
                            }
                        }
                    },
                    "aggs": {
                        "1": {
                            "cardinality": {
                                "field": "onesId.keyword"
                            }
                        },
                        "5": {
                            "terms": {
                                "field": "labels.keyword",
                                "size": 100,
                                "order": {
                                    "1": "desc"
                                }
                            },
                            "aggs": {
                                "1": {
                                    "cardinality": {
                                        "field": "onesId.keyword"
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

lkm commented 5 years ago

@ivanli1990 Would you be able to provide a sample result from that query?

ivanli1990 commented 5 years ago

@ivanli1990 Would you be able to provide a sample result from that query?

hi @lkm , the resut from query in elasticsearch as below:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 64,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "3": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "1": {
            "value": 28
          },
          "4": {
            "buckets": {
              "state:已关闭": {
                "1": {
                  "value": 26
                },
                "5": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "1": {
                        "value": 26
                      },
                      "key": "未填写原因",
                      "doc_count": 26
                    }
                  ]
                },
                "doc_count": 26
              }
            }
          },
          "key": "多码后端",
          "doc_count": 64
        }
      ]
    }
  }
}

truebit commented 5 years ago

Acutally the elasticsearch tabify meant to do the same work. And I found the python implementation of tabify.

I think we could port it to redash:)

lkm commented 5 years ago

Seems good, would probably be an easy job. Not the most maintained project ever (the python implementation). Would you use it as requirement or copy in the code?

getredash / redash

Elasticsearch Nested Aggregations #2789

Issue Summary

Technical details: