elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Allow Bucket Script aggregation to reference on string results #36642

Open larrycinnabar opened 5 years ago

larrycinnabar commented 5 years ago

Currently, the bucket script aggregation only allows numeric values to be referenced in buckets_path:

    "some-stringified-metric": {
      "bucket_script": {
        "buckets_path": {
          "a": "path>of>nested>metrics>agg_that_returns_a_string.value"
        },
        "script": {
          "source": "params.a"
        }
      }
    },

You will get: "buckets_path must reference either a number value or a single value numeric metric aggregation, got: java.lang.String"

The result of the aggregation is just a string, not a list of objects or anything complex. Why can't a plain single string be returned as a result?
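
As a rough illustration of the restriction, here is a toy Python model of the check. It is not Elasticsearch's actual Java code, and the function name resolve_bucket_value is hypothetical. Numeric values pass through as doubles; anything else is rejected with the message quoted above (with the Python type name in place of the Java one).

```python
def resolve_bucket_value(value):
    """Toy model of a numeric-only check; NOT Elasticsearch's implementation."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    raise ValueError(
        "buckets_path must reference either a number value or a single value "
        "numeric metric aggregation, got: " + type(value).__name__
    )
```

Under this model, a numeric metric value is accepted, while the string result of a scripted metric is rejected before the script ever sees it.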

elasticmachine commented 5 years ago

Pinging @elastic/es-analytics-geo

polyfractal commented 5 years ago

Right now, most of the pipeline aggs are geared towards doing numeric transformations (derivatives, etc), which is where that limitation is coming from. The BucketScript agg is definitely a little different, since you have access to a script.

What sort of operation are you wanting to do with the strings?

larrycinnabar commented 5 years ago

My case is a little specific:

I have a query with dozens of aggregations, some of them deeply nested. At the application level we then need to read the raw Elasticsearch response (JSON) and unmarshal it into a struct. To simplify this, we use a trick: for every metric we need, we have:

  1. the real aggregation (which may be deeply nested)
  2. a "public" aggregation, which is top-level and just uses a bucket script to point to the value of its real aggregation

Why are strings needed? Because some metrics return a non-numeric result. Here are two examples:

  1. Simple example: the most used term
  2. Complex example: some of our metrics are histograms: [{key:a,count:100},{key:b,count:200}...]. This could be the result of a terms aggregation, but we use a scripted metric because (a) the logic sometimes goes beyond a plain terms aggregation, and (b) we can serialize the result to JSON as a simple string. So, yes, I want to perform a kind of terms aggregation whose list of buckets is serialized to a JSON string, and I want it to be available as a top-level aggregation (via bucket script).
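
A minimal Python sketch of what the application side has to do without such a top-level alias: walk a buckets_path-style path through the raw response, then decode the JSON string. The response shape, the aggregation names (outer, inner, histogram_json), and the helper resolve_path are all hypothetical. The walk assumes every intermediate aggregation is single-bucket and that aggregation names contain no dots.

```python
import json


def resolve_path(aggs, path):
    """Follow a '>'-separated path of aggregation names, e.g. 'a>b>c.value'."""
    names, _, metric = path.partition(".")
    node = aggs
    for name in names.split(">"):
        node = node[name]
    # If a metric key such as 'value' was given, return it; otherwise the node.
    return node[metric] if metric else node


# Hypothetical raw response: a scripted metric nested under two
# single-bucket aggregations, returning a JSON-encoded histogram.
raw_response = {
    "aggregations": {
        "outer": {
            "doc_count": 300,
            "inner": {
                "doc_count": 300,
                "histogram_json": {
                    "value": '[{"key": "a", "count": 100}, {"key": "b", "count": 200}]'
                },
            },
        }
    }
}

encoded = resolve_path(raw_response["aggregations"], "outer>inner>histogram_json.value")
histogram = json.loads(encoded)
```

With string support in bucket script, the path walk would move into the query itself, and the application would only need the final json.loads step on a top-level value.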

polyfractal commented 5 years ago

I see, thanks for the detailed description! I think it makes sense to open up BucketScript for any kind of return value, not just numerics, given the open-ended nature of scripts. It will still need to support gap_policy and related numeric-based features (which may not make sense for strings), but I think that's probably acceptable.

I think it would be a pretty straightforward change: modify the BucketScript painless context to return Object instead of Double, adjust the aggregator to work with objects, update the docs.

I'm going to label this team-discuss to see what the rest of the team thinks.

polyfractal commented 5 years ago

Discussed this in the team meeting, and there was no objection to bucket_script being able to return Object instead of doubles. Unfortunately, I think implementing this will probably be tied up with larger refactoring done to the pipeline framework, so it may be some time before something like this can be extended/fixed.

Hohol commented 5 years ago

I'd like to work on this.

polyfractal commented 5 years ago

:+1: Note that https://github.com/elastic/elasticsearch/pull/44179 will change BucketScript a little, and I'm working on a PR right now to add a GapPolicy.NONE, which will also affect things.

I'm not sure how easy this enhancement will be. The pipeline framework is pretty much hardcoded to expect doubles everywhere right now. We may need to chip away at refactoring first before this ticket can be addressed.

linuradu commented 4 years ago

@polyfractal I think I'm stuck because of this feature request; I'm trying to count all specific fields based on an aggregation result. I posted my issue here.

Can you please take a look and let me know if there is any chance of doing what I need?

linuradu commented 4 years ago

I resolved this issue and here is the answer: https://stackoverflow.com/questions/60662222