getredash / redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
http://redash.io/
BSD 2-Clause "Simplified" License
25.99k stars 4.34k forks source link

Elasticsearch Nested Aggregations #2789

Open shivam1892 opened 6 years ago

shivam1892 commented 6 years ago

Issue Summary

I am working on creating the table from elasticsearch with nested aggregation query. Below is my example Query

{
  "index": "data",
  "size": 0,
  "aggs": {
    "date_aggr": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "1d"
      },
      "aggs": {
        "agg_abc": {
          "terms": {
            "field": "abc.keyword",
            "order": {
              "price": "desc"
            }
          },
          "aggs": {
            "price": {
              "sum": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}

The above Nested aggregated query does not give all the columns in the table as the response of elasticsearch. Also “agg_abc” have more than 5 categories in it but in table it shows only for 2 categories which is also not complete. According to me the nested aggregated query is not being parsed properly as some of the columns are missing. But when I hit the above query to elastic search It gives the proper response with all categories.

Technical details:

arikfr commented 6 years ago

Please share an example of the results you do get for this query in Redash and the results you get when querying directly. Thanks.

shivam1892 commented 6 years ago

Hi Arik,

This is my actual query


  "index":"data",
  "size": 0,
  "aggs": {
    "date_aggr": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "1d"
      },
      "aggs": {
        "vendor": {
          "terms": {
            "field": "vendorId.keyword",
            "order": {
              "price": "desc"
            }
          },
          "aggs": {
            "price": {
              "sum": {
                "field": "totalPrice"
              }
            }
          }
        }
      }
    }
  }
}
This is response from Elasticsearch 
{
    "date_aggr": {
        "buckets": [{
                "key_as_string": "2018-01-01T00:00:00.000Z",
                "key": 1514764800000,
                "doc_count": 2540,
                "vendor": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [{
                            "key": "RIL",
                            "doc_count": 594,
                            "price": {
                                "value": 415973.95056152344
                            }
                        },
                        {
                            "key": "IOCL",
                            "doc_count": 482,
                            "price": {
                                "value": 332374.34034729004
                            }
                        },
                        {
                            "key": "HPCL",
                            "doc_count": 448,
                            "price": {
                                "value": 310616.87955093384
                            }
                        },
                        {
                            "key": "BPCL",
                            "doc_count": 449,
                            "price": {
                                "value": 307841.93978500366
                            }
                        },
                        {
                            "key": "EssarOil",
                            "doc_count": 338,
                            "price": {
                                "value": 220755.62995529175
                            }
                        },
                        {
                            "key": "ONGC",
                            "doc_count": 229,
                            "price": {
                                "value": 156433.39011383057
                            }
                        }
                    ]
                }
            },
            {
                "key_as_string": "2018-01-02T00:00:00.000Z",
                "key": 1514851200000,
                "doc_count": 2514,
                "vendor": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [{
                            "key": "RIL",
                            "doc_count": 581,
                            "price": {
                                "value": 394284.0503387451
                            }
                        },
                        {
                            "key": "HPCL",
                            "doc_count": 450,
                            "price": {
                                "value": 333518.30029296875
                            }
                        },
                        {
                            "key": "IOCL",
                            "doc_count": 475,
                            "price": {
                                "value": 329492.58993911743
                            }
                        },
                        {
                            "key": "BPCL",
                            "doc_count": 458,
                            "price": {
                                "value": 320690.6898841858
                            }
                        },
                        {
                            "key": "EssarOil",
                            "doc_count": 331,
                            "price": {
                                "value": 221389.1397857666
                            }
                        },
                        {
                            "key": "ONGC",
                            "doc_count": 219,
                            "price": {
                                "value": 144096.33992004395
                            }
                        }
                    ]
                }
            },
            {
                "key_as_string": "2018-01-03T00:00:00.000Z",
                "key": 1514937600000,
                "doc_count": 2469,
                "vendor": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [{
                            "key": "RIL",
                            "doc_count": 545,
                            "price": {
                                "value": 374848.3391456604
                            }
                        },
                        {
                            "key": "IOCL",
                            "doc_count": 482,
                            "price": {
                                "value": 356451.5997390747
                            }
                        },
                        {
                            "key": "BPCL",
                            "doc_count": 447,
                            "price": {
                                "value": 319834.75061416626
                            }
                        },
                        {
                            "key": "HPCL",
                            "doc_count": 444,
                            "price": {
                                "value": 312194.3507118225
                            }
                        },
                        {
                            "key": "EssarOil",
                            "doc_count": 336,
                            "price": {
                                "value": 252321.69033050537
                            }
                        },
                        {
                            "key": "ONGC",
                            "doc_count": 215,
                            "price": {
                                "value": 152464.62057495117
                            }
                        }
                    ]
                }
            }```

And below is the csv format data created in Redash

price,doc_count,vendor,date_aggr
156433.39011383057,2540,ONGC,2018-01-01T00:00:00.000Z
144096.33992004395,2514,ONGC,2018-01-02T00:00:00.000Z
152464.62057495117,2469,ONGC,2018-01-03T00:00:00.000Z
173854.6601638794,2437,ONGC,2018-01-04T00:00:00.000Z
159169.18005752563,2452,ONGC,2018-01-05T00:00:00.000Z
152034.30032348633,2514,ONGC,2018-01-06T00:00:00.000Z
168911.539894104,2489,ONGC,2018-01-07T00:00:00.000Z
152085.08991241455,2492,ONGC,2018-01-08T00:00:00.000Z
165859.48080062866,2517,ONGC,2018-01-09T00:00:00.000Z
164046.0099182129,2459,ONGC,2018-01-10T00:00:00.000Z
153649.18016052246,2438,ONGC,2018-01-11T00:00:00.000Z
177316.99977111816,2533,ONGC,2018-01-12T00:00:00.000Z
167918.51028442383,2512,ONGC,2018-01-13T00:00:00.000Z
140893.3701019287,2453,ONGC,2018-01-14T00:00:00.000Z
157128.23043823242,2455,ONGC,2018-01-15T00:00:00.000Z
153589.3794708252,2497,ONGC,2018-01-16T00:00:00.000Z
167240.75048065186,2392,ONGC,2018-01-17T00:00:00.000Z
152549.56973266602,2446,ONGC,2018-01-18T00:00:00.000Z
145871.3002319336,2494,ONGC,2018-01-19T00:00:00.000Z
159190.40007019043,2522,ONGC,2018-01-20T00:00:00.000Z
165743.09028625488,2391,ONGC,2018-01-21T00:00:00.000Z
172964.93984222412,2511,ONGC,2018-01-22T00:00:00.000Z
150591.2501487732,2468,ONGC,2018-01-23T00:00:00.000Z
155488.8802909851,2565,ONGC,2018-01-24T00:00:00.000Z
159831.6100616455,2513,ONGC,2018-01-25T00:00:00.000Z
145047.91046905518,2455,ONGC,2018-01-26T00:00:00.000Z

As we can see in elasticsearch response we have 6 keys [ongc, ril, bpcl, hpcl...]
But in redash column for only ONGC is created corresponding to Sum of price. 

Can you please let me know where it is getting wrong?
moghira10 commented 5 years ago

@arikfr Any update on this?

ivanli1990 commented 5 years ago

@arikfr I met this issue too. My elasticsearch query:

{
    "index": "master_bug-20190513",
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "state": "已关闭"
                    }
                },
                {
                    "range": {
                        "@timestamp": {
                            "gte": 1557590400000,
                            "lte": 1558195199999,
                            "format": "epoch_millis"
                        }
                    }
                }
            ]
        }
    },
    "aggs": {
        "moduleId": {
            "terms": {
                "field": "moduleId.keyword"
            },
            "aggs": {
                "labels": {
                    "terms": {
                        "field": "labels.keyword",
                        "size": 100
                    },
                    "aggs": {
                        "onesId_num": {
                            "cardinality": {
                                "field": "onesId.keyword"
                            }
                        }
                    }
                }
            }
        }
    },
    "size": 0
}

The result from elasticsearch:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 19,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "moduleId": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "数据",
          "doc_count": 11,
          "labels": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "数据自身问题",
                "doc_count": 10,
                "onesId_num": {
                  "value": 10
                }
              },
              {
                "key": "未填写原因",
                "doc_count": 1,
                "onesId_num": {
                  "value": 1
                }
              }
            ]
          }
        },
        {
          "key": "多码后端",
          "doc_count": 8,
          "labels": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "未填写原因",
                "doc_count": 8,
                "onesId_num": {
                  "value": 8
                }
              }
            ]
          }
        }
      ]
    }
  }
}

See above result, there has 3 rows in result. But in the redash table, there just 2 rows: image The row which moduleId is “数据” and labels is “数据自身问题” does not display in the result table.

lkm commented 5 years ago

@ivanli1990 Your query looks fairly similar to what I'm implementing support for, you could try the PR I've just referenced

ivanli1990 commented 5 years ago

@ivanli1990 Your query looks fairly similar to what I'm implementing support for, you could try the PR I've just referenced

It works fine. Thanks!!!

ivanli1990 commented 5 years ago

@lkm I met another issue may be caused by your pr. When I query es, redash displays 'unicode' object has no attribute 'iteritems'. The full response from redash api:

{"job": {"status": 4, "error": "'unicode' object has no attribute 'iteritems'", "id": "5ffb80b7-d139-494c-b147-4d21b628a325", "query_result_id": null, "updated_at": 0}}

And, the query:

{
    "index": "master_bug-*",
    "query": {
        "bool": {
            "must": [
                {
                    "match_all": {}
                },
                {
                    "range": {
                        "@timestamp": {
                            "gte": 1557590400000,
                            "lte": 1558195199999,
                            "format": "epoch_millis"
                        }
                    }
                }
            ],
            "must_not": []
        }
    },
    "size": 0,
    "aggs": {
        "3": {
            "terms": {
                "field": "moduleId.keyword",
                "size": 10,
                "order": {
                    "1": "desc"
                }
            },
            "aggs": {
                "1": {
                    "cardinality": {
                        "field": "onesId.keyword"
                    }
                },
                "4": {
                    "filters": {
                        "filters": {
                            "state:已关闭": {
                                "query_string": {
                                    "query": "state:已关闭",
                                    "analyze_wildcard": true
                                }
                            }
                        }
                    },
                    "aggs": {
                        "1": {
                            "cardinality": {
                                "field": "onesId.keyword"
                            }
                        },
                        "5": {
                            "terms": {
                                "field": "labels.keyword",
                                "size": 100,
                                "order": {
                                    "1": "desc"
                                }
                            },
                            "aggs": {
                                "1": {
                                    "cardinality": {
                                        "field": "onesId.keyword"
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
lkm commented 5 years ago

@ivanli1990 Would you be able to provide a sample result from that query?

ivanli1990 commented 5 years ago

@ivanli1990 Would you be able to provide a sample result from that query?

hi @lkm , the resut from query in elasticsearch as below:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 64,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "3": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "1": {
            "value": 28
          },
          "4": {
            "buckets": {
              "state:已关闭": {
                "1": {
                  "value": 26
                },
                "5": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "1": {
                        "value": 26
                      },
                      "key": "未填写原因",
                      "doc_count": 26
                    }
                  ]
                },
                "doc_count": 26
              }
            }
          },
          "key": "多码后端",
          "doc_count": 64
        }
      ]
    }
  }
}
truebit commented 5 years ago

Acutally the elasticsearch tabify meant to do the same work. And I found the python implementation of tabify.

I think we could port it to redash:)

lkm commented 5 years ago

Seems good, would probably be an easy job. Not the most maintained project ever (the python implementation). Would you use it as requirement or copy in the code?