Open gbarba opened 4 years ago
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)
Hi, if there is an agreed solution within the team I can work on this.
Hi @polyfractal, I was making the related changes to calculate percentiles with interpolation but there are couple of questions I want to ask before. Should I change the default behavior or add a new parameter to use interpolation? Also, is it ok to use percentiles from Apache Commons Math? If It's ok to use Commons but you want to use Excel style interpolation, we need to update the Apache Commons Math version from 3.2 to 3.4(min) to use Excel style computation or we can implement our own.
Thanks for the ping, missed your prior message :)
We try to avoid including dependencies unless we really need them, especially to the core server module. So in this case I think we should just implement the interpolation method of choice as they are relatively small.
We probably don't need the fancy interpolations though, or at least not all of them (or in the first PR). I was thinking a simple linear interpolation between adjacent values if the requested percentile doesn't land squarely on a value. We can certainly add more, but I think that's the most important / most used variation (especially since this agg is dealing with a complete "population" of values, not a sample)
Should I change the default behavior or add a new parameter to use interpolation?
Hmm, let's add a new parameter, that way we can add more interpolation types in the future if we want. Perhaps interpolation: none
is the default, and then we can add interpolation: linear
, etc for new types
Is @alpparkan still working on this? Can I try it?
Hi @Rockstar5645, I just finished the implementation and I'll send a PR in a couple days.
Hello is this issue still open? I would like to work on it.
Change the behavior or add a param to make the percentiles_bucket pipeline aggregation interpolate the result instead of return the nearest value.
The current behaviour is expected as it's properly documented and it was also discussed in the PR that introduced this pipeline (https://github.com/elastic/elasticsearch/pull/13186#discussion_r38412622).
But I find it annoying that the percentils metrics aggregation interpolates the value so both are not consistent.