AlexIoannides / elasticsearchr

Lightweight Elasticsearch client for R.
https://alexioannides.com/2016/11/28/elasticsearchr-a-lightweight-elasticsearch-client-for-r/

Aggregation on a metric base only #26

Closed garnik-kakosyan closed 6 years ago

garnik-kakosyan commented 7 years ago

First of all, thanks for a nice and easy tool for connecting R and ES. I am trying to use it on a regular basis but have run into several issues that I hope can be resolved. The biggest one is aggregating over a whole index. Let's say I want to see the total average distance. If I combine a bucket aggregation with a metric aggregation, it works. But the only logically possible bucket is _type (because other fields are not unique), and even this field might not be unique.

One potential workaround might be to use a filters bucket instead of a terms bucket. However, if I run the following code, I receive an error: Error: is.data.frame(x) is not TRUE

> dist_avg <- aggs('{
+     "2": {
+       "filters": {
+                  "filters": {
+                  "*": {
+                  "query_string": {
+                  "query": "*",
+                  "analyze_wildcard": true
+                  }
+                  }
+                  }
+                  },
+                  "aggs": {
+                  "1": {
+                  "avg": {
+                  "field": "distance"
+                  }
+                  }
+                  }
+                  }
+ }')
> temp2 <- elastic("http://localhost:9200", "flight*") %search% (dist_avg)

If I drop the bucket aggregation entirely and use only a metric aggregation (taken from the request behind a Kibana chart), I receive this error: Error in extract_aggs_results(response) : no aggs results returned

> dist_avg <- aggs('{
+     "1": {
+         "avg": {
+             "field": "distance"
+         }
+     }
+ }')
> temp2 <- elastic("http://localhost:9200", "flight*") %search% (dist_avg)
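
Both errors above can be reproduced without a cluster by looking at the shape of the response body. The following is a minimal sketch, assuming the jsonlite package is installed; the three JSON strings are hand-written, hypothetical responses (the numbers and the helper name `get_buckets` are made up for illustration) that mirror how Elasticsearch lays out a metric-only aggregation, a named `filters` aggregation, and a `terms` aggregation.

```r
library(jsonlite)

# Hypothetical response bodies; values are invented for illustration only.
metric_only <- '{"aggregations": {"1": {"value": 512.3}}}'
named_filters <- '{"aggregations": {"2": {"buckets":
  {"*": {"doc_count": 10, "1": {"value": 512.3}}}}}}'
terms_buckets <- '{"aggregations": {"2": {"buckets": [
  {"key": "a", "doc_count": 6, "1": {"value": 500.0}},
  {"key": "b", "doc_count": 4, "1": {"value": 530.75}}]}}}'

# Hypothetical helper: the same extraction step the client applies to a
# response, minus the HTTP call.
get_buckets <- function(json) fromJSON(json)$aggregations[[1]]$buckets

metric_res  <- get_buckets(metric_only)   # no "buckets" element at all
filters_res <- get_buckets(named_filters) # "buckets" is a JSON object
terms_res   <- get_buckets(terms_buckets) # "buckets" is a JSON array

is.null(metric_res)         # TRUE
is.data.frame(filters_res)  # FALSE
is.data.frame(terms_res)    # TRUE
```

Under these assumptions, the metric-only response has nothing under `buckets`, which would explain "no aggs results returned", while the named `filters` response parses to a named list rather than a data frame, which would explain "is.data.frame(x) is not TRUE".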

I have tried other query + aggs combinations but was still unsuccessful. I am also curious whether it is normal that, after querying ES from R, all shards fail quite often (I am working on a local machine). Several other errors are provided below. It would be great if you could help me with this problem.

Error in check_http_code_throw_error(response) : 
  Elasticsearch returned a status code of 400 
{
    "error": {
        "root_cause": [
            {
                "type": "unknown_named_object_exception",
                "reason": "Unknown BaseAggregationBuilder [1]",
                "line": 4,
                "col": 23
            }
        ],
        "type": "unknown_named_object_exception",
        "reason": "Unknown BaseAggregationBuilder [1]",
        "line": 4,
        "col": 23
    },
    "status": 400
}
Error in check_http_code_throw_error(response) : 
  Elasticsearch returned a status code of 500 
{
    "error": {
        "root_cause": [
            {
                "type": "aggregation_execution_exception",
                "reason": "Invalid term-aggregator order path [_type]. Unknown aggregation [_type]"
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "flights",
                "node": "tMtN0Bg3RVem5RGatifbuw",
                "reason": {
                    "type": "aggregation_execution_exception",
                    "reason": "Invalid term-aggregator order path [_type]. Unknown aggregation [_type]"
                }
            }
        ]
    },
    "status": 500
}

AlexIoannides commented 7 years ago

Hi Garnik,

My apologies for the tardy reply, but I'm having a hectic year thus far.

I can reproduce the error on my side and will take a look at it in the not-too-distant future.

Many thanks,

Alex

garnik-kakosyan commented 7 years ago

Hi Alex,

Thanks for your reply, even after such a long time. It would be nice if you could resolve this issue. I understand that sometimes we need time for ourselves - you don't have to apologize :)

Meanwhile, we could probably talk more frequently about ELK/R development, since both of us are interested in these solutions. I have created a draft of an Exploratory Dashboard in R with the help of your plugin, and I also make heavy use of ES queries and Kibana visualizations. If you would like to be in touch, here is my LinkedIn (I don't spend much time there, but I reply to every message).

Sincerely,
Garnik

AlexIoannides commented 7 years ago

Thanks for being understanding - it's greatly appreciated.

After a bit of digging, I've found the issue in utils.R - I've assumed that all aggregation results are from bucket aggs (my bad).

#' @rdname extract_query_results
extract_aggs_results <- function(response) {
  df <- jsonlite::fromJSON(httr::content(response, as = 'text'))$aggregations[[1]]$buckets
  if (length(df) == 0) stop("no aggs results returned")
  jsonlite::flatten(df)
}
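
One way to relax the bucket-only assumption is sketched below. This is a hedged illustration, not the code that shipped in the package: `extract_aggs_sketch` is a hypothetical name, it takes the response body as text (rather than an httr response) so it can be tried without a running cluster, and it assumes jsonlite is installed.

```r
library(jsonlite)

# Hypothetical helper, NOT the package's actual fix.
extract_aggs_sketch <- function(json_text) {
  aggs <- fromJSON(json_text)$aggregations
  if (length(aggs) == 0) stop("no aggs results returned")

  first <- aggs[[1]]
  buckets <- first$buckets

  # A "buckets" element that is present but empty means nothing matched.
  if (!is.null(buckets) && length(buckets) == 0) {
    stop("no aggs results returned")
  }

  # Bucket aggregation returned as a JSON array: one row per bucket.
  if (is.data.frame(buckets)) {
    return(jsonlite::flatten(buckets))
  }

  # Metric-only aggregation (no "buckets"), or buckets returned as a JSON
  # object: unlist the nested values into a one-row data frame.
  target <- if (is.null(buckets)) first else buckets
  as.data.frame(as.list(unlist(target)), check.names = FALSE)
}

# Usage with a hand-written, hypothetical metric-only response body.
res <- extract_aggs_sketch('{"aggregations": {"1": {"value": 512.3}}}')
```

Returning a one-row data frame for the metric case keeps the return type consistent with the bucket case, so downstream code can treat both the same way.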
AlexIoannides commented 6 years ago

This has been fixed in v0.2.1, which has been submitted to CRAN.

I'm sorry this took so long.