Open larrycinnabar opened 5 years ago
Pinging @elastic/es-analytics-geo
Right now, most of the pipeline aggs are geared towards doing numeric transformations (derivatives, etc), which is where that limitation is coming from. The BucketScript agg is definitely a little different, since you have access to a script.
What sort of operation are you wanting to do with the strings?
My case is a little specific:
I have a query with dozens of aggregations, some of them deeply nested. At the application level, we need to read the raw Elasticsearch response (JSON) and marshal it into a struct. To simplify this, we use a trick: for every metric we need, we add a `bucket_script` that points to its real aggregation value.
Why are strings needed? Because some metrics return a non-numeric value as their result. For example:
`[{key:a,count:100},{key:b,count:200}...]`. This might be the result of a `terms` aggregation, but we use `ScriptedMetric` because (a) the logic sometimes goes beyond a plain `TermsAggregation`, and (b) we can jsonify the result as a simple string.
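As a rough illustration of the application side of this trick, the sketch below decodes a JSON-encoded bucket list stored as an aggregation's `value`. The response fragment, the aggregation name `top_keys`, and the helper `decode_buckets` are all hypothetical, and the payload is written as valid JSON rather than the shorthand above.

```python
import json

# Hypothetical fragment of a raw search response: the aggregation's
# "value" is a JSON document serialized into a single string.
raw_response = {
    "aggregations": {
        "top_keys": {  # hypothetical aggregation name
            "value": '[{"key": "a", "count": 100}, {"key": "b", "count": 200}]'
        }
    }
}


def decode_buckets(response, agg_name):
    """Parse the JSON string stored under the aggregation's "value"."""
    encoded = response["aggregations"][agg_name]["value"]
    return json.loads(encoded)


buckets = decode_buckets(raw_response, "top_keys")
# Flatten the decoded buckets into a simple key -> count mapping.
counts = {bucket["key"]: bucket["count"] for bucket in buckets}
```

The point of the design is that the application only ever reads one flat `value` per metric, regardless of how deeply the real aggregation is nested.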
So, yes, I want to perform a kind of `terms` aggregation whose list of buckets is jsonified into a string, and I want it to be available as a top-level aggregation (via bucket-script).
I see, thanks for the detailed description! I think it makes sense to open up BucketScript for any kind of return value, not just numerics, given the open-ended nature of scripts. It will still need to support `gap_policy` and related numeric-based features (which may not make sense for strings), but I think that's probably acceptable.
I think it would be a pretty straightforward change: modify the BucketScript painless context to return `Object` instead of `Double`, adjust the aggregator to work with objects, and update the docs.
I'm going to label this `team-discuss` to see what the rest of the team thinks.
Discussed this in the team meeting, and there was no objection to `bucket_script` being able to return `Object` instead of doubles. Unfortunately, I think implementing this will probably be tied up with larger refactoring done to the pipeline framework, so it may be some time before something like this can be extended/fixed.
I'd like to work on this.
:+1: Note that https://github.com/elastic/elasticsearch/pull/44179 will change BucketScript a little, and I'm working on a PR right now to add a `GapPolicy.NONE`, which will also affect things.
I'm not sure how easy this enhancement will be. The pipeline framework is pretty much hardcoded to expect doubles everywhere right now. We may need to chip away at refactoring first before this ticket can be addressed.
@polyfractal I think I'm stuck because of this feature request; I'm trying to count all specific fields based on an aggregation result. I posted my issue here.
Can you please take a look and let me know if there is any chance of doing what I need?
I resolved this issue and here is the answer: https://stackoverflow.com/questions/60662222
Currently, bucket script aggregations allow only numbers to be referenced:
You will get:
"buckets_path must reference either a number value or a single value numeric metric aggregation, got: java.lang.String"
The result of the agg is just a string, not a list of objects or anything complex. Why are plain single strings treated so badly that they cannot be returned as a result?
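For illustration only, a request of roughly the following shape would be expected to hit this error: a `bucket_script` whose `buckets_path` points at a `scripted_metric` that returns a string. This is a sketch built as a Python dict, not the query from this issue. The aggregation names, the field names (`category`, `tag`), and the Painless scripts are hypothetical, and the body has not been run against a cluster.

```python
import json

# Hypothetical request body; every name below is an assumption made for
# illustration and does not come from the issue.
request_body = {
    "size": 0,
    "aggs": {
        "by_category": {  # bucket_script must sit inside a multi-bucket parent
            "terms": {"field": "category"},
            "aggs": {
                "encoded_keys": {
                    # Scripted metric whose reduce step returns a string.
                    "scripted_metric": {
                        "init_script": "state.keys = []",
                        "map_script": "state.keys.add(doc['tag'].value)",
                        "combine_script": "return state.keys",
                        "reduce_script": "return states.toString()",
                    }
                },
                "exposed": {
                    # References the string-valued metric, which is what
                    # the numeric-only validation rejects.
                    "bucket_script": {
                        "buckets_path": {"encoded": "encoded_keys.value"},
                        "script": "params.encoded",
                    }
                },
            },
        }
    },
}

# Serialize to the JSON that would be sent as the search request body.
payload = json.dumps(request_body, indent=2)
```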