Closed by antoinebaudoux 7 years ago
Adding notes to the above issue which was brought up at Elastic{ON}. Essentially, the issue is with the ordering of values in the bar chart for sub aggregations on unique count. The order should be descending by value, but due to the split, the bars are unordered by unique count.
I need to dive into the issue to debug.
So this seems to be a bug in the vislib. Just reproduced. The response from elasticsearch seems to return the results in the correct order, however, the chart displays the data out of order.
+1. I have reproduced this also.
From: Antoine Baudoux <notifications@github.com> Sent: 11/03/2015 11:49 AM To: elastic/kibana <kibana@noreply.github.com> Subject: [kibana] Incorrect ordering of terms sub agg (#3314)
[screen shot 2015-03-10 at 17 45 15](https://cloud.githubusercontent.com/assets/5154448/6588348/a74418d8-c74d-11e4-8ca2-5e7283a67845.png) [screen shot 2015-03-10 at 17 45 58](https://cloud.githubusercontent.com/assets/5154448/6588347/a730c67a-c74d-11e4-9d9e-933dd8a4e6eb.png)
If you look at both screenshots you can see that the ordering seems to be good with the split, since it is identical to the ordering without the split. It's more the bar heights that are messed up.
@ab-taktik yes, that is what I was referring to when I titled it ordering. By default, the bars should be ordered on the x axis in descending fashion.
+1
+1
Hello, any news on this? Do you have any idea what the root cause is?
Maybe this has to do with the approximate nature of count/cardinality aggregations, and also with the fact that we take only the top X terms rather than all terms.
@ab-taktik I think you may be right. By default Elasticsearch returns the buckets in descending order of `doc_count`. Therefore, we have been rendering bar charts with this assumption. However, this is not always the case.
Take for example this dataset and this chart:
As you can see, the second set of stacked bars in this example should go first. The reason it is not returned first is that the total `doc_count` is higher in the first bar, but when you subtract the `sum_other_doc_count` from the `doc_count` to get the value that is actually displayed, it's clear why the first set of stacked bars is smaller than the second set of stacked bars.
Best solution: Re-order the buckets returned from elasticsearch based on `doc_count - sum_other_doc_count`. I will add the appropriate time table for a fix.
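A minimal sketch of the re-ordering proposed above, written in Python for illustration rather than as vislib code. It assumes a plain count metric and a response in which each x-axis bucket carries a terms sub-aggregation that reports `sum_other_doc_count`. The sub-aggregation name `split`, the helper names, and the numbers are made up.

```python
from typing import Any, Dict, List

Bucket = Dict[str, Any]


def displayed_count(bucket: Bucket, sub_agg: str) -> int:
    """Documents actually drawn for one bar: the bucket total minus the
    documents that fell outside the top sub-terms."""
    return bucket["doc_count"] - bucket[sub_agg]["sum_other_doc_count"]


def reorder_by_displayed(buckets: List[Bucket], sub_agg: str) -> List[Bucket]:
    """Sort x-axis buckets in descending order of the value that is drawn."""
    return sorted(buckets, key=lambda b: displayed_count(b, sub_agg), reverse=True)


# Hypothetical response fragment: "A" has more documents overall,
# but most of them are outside its top sub-terms.
buckets = [
    {"key": "A", "doc_count": 100, "split": {"sum_other_doc_count": 70, "buckets": []}},
    {"key": "B", "doc_count": 80, "split": {"sum_other_doc_count": 10, "buckets": []}},
]
ordered_keys = [b["key"] for b in reorder_by_displayed(buckets, "split")]
```

With these numbers "A" draws 30 documents and "B" draws 70, so "B" sorts first even though elasticsearch returned "A" first. This only re-orders the buckets that were returned; it cannot bring back buckets that were cut off on the elasticsearch side.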
@stormpython @ab-taktik this is just the way that aggregations work. Here is a hypothetical step-by-step of what's happening in elasticsearch:
scheduleFull.raw
language.raw
This process is precisely what we are visualizing in the second screenshot, and why we can't just subtract the `sum_other_doc_count`.
In the outlined steps, "unique count of user.ids" can be replaced with any metric, even "99.99th percentile", and therefore the `sum_other_doc_count` would not have any relevance.
@ab-taktik I think what you really want is for step 1(ii) to happen in a third phase, and for it to go more like "the sum of the 'unique count of user.ids' from the selected child buckets is calculated for each bucket", and then for 1(iii) and 1(vi) to use this new metric in order to sort and select the top 50 buckets. This is something that the elasticsearch 2.0 feature "bucket reducers" is aiming to solve. Until it is available, I don't think this is a feature Kibana 4 will support.
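To make the "third phase" idea concrete, here is a rough Python sketch of the computation described above: sum a metric over the selected child buckets of each parent, then sort and select the top N parents by that sum. It is not how elasticsearch or Kibana works. The aggregation names `by_lang` and `unique_users`, the helper name, and the data are assumptions, and it assumes a single-value metric reported as `{"value": ...}`.

```python
from typing import Any, Dict, List

Bucket = Dict[str, Any]


def top_parents_by_child_metric(
    parents: List[Bucket], sub_agg: str, metric: str, size: int
) -> List[Bucket]:
    """Rank parent buckets by the sum of a metric over their selected
    child buckets, then keep the top `size` parents."""

    def visible_total(parent: Bucket) -> float:
        return sum(child[metric]["value"] for child in parent[sub_agg]["buckets"])

    return sorted(parents, key=visible_total, reverse=True)[:size]


# Hypothetical parent buckets, listed in descending doc_count order.
parents = [
    {"key": "Mon", "doc_count": 500, "by_lang": {"buckets": [
        {"key": "en", "unique_users": {"value": 12}},
        {"key": "fr", "unique_users": {"value": 8}},
    ]}},
    {"key": "Tue", "doc_count": 300, "by_lang": {"buckets": [
        {"key": "en", "unique_users": {"value": 30}},
        {"key": "de", "unique_users": {"value": 5}},
    ]}},
    {"key": "Wed", "doc_count": 200, "by_lang": {"buckets": [
        {"key": "en", "unique_users": {"value": 3}},
    ]}},
]
top_keys = [p["key"] for p in top_parents_by_child_metric(parents, "by_lang", "unique_users", 2)]
```

Here "Tue" (35) outranks "Mon" (20) even though "Mon" has the higher `doc_count`. Doing this on the client can only rank parents that elasticsearch already returned, which is why the selection would have to happen inside elasticsearch to be correct.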
Another way to think of this problem is that the buckets that create the bars are sorted based on the ordering parameters in the x-axis aggregation:
and the value used to do that sorting includes documents that are excluded by the sub aggregation (grey area added to illustrate the excluded documents).
FWIW I've reproduced this issue without using unique count metrics in https://github.com/elastic/kibana/issues/3734
Reading what @spalger says, it seems to me that the ordering is actually correct, and that the real problem is the Terms Sub Aggregation for Split Bars incorrectly excluding data, creating what is unarguably a misleading representation of the data. _Sorry about the "what @spalger is saying" - it was rude and badly phrased - I've rephrased! :+1:_
I just did a graph like this with the Top 5 browsers across operating systems, and all of a sudden it looked like iOS was the top operating system, but it wasn't... Windows was; it just had so many browser variations that only the top 5 were shown.
There should be a part of the bar, which @spalger showed in grey, to show "Other" - this would fix both the ordering (which in my opinion is correct actually) and would fix the misleading representation of data. In my case Windows would jump up with a huge "Other" area, and the iOS would still be there at the end but much much tinier.
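A small Python sketch of the "Other" segment suggested above, for a plain count metric only. It assumes each document falls into exactly one child bucket (a single-valued split field), so that the parent `doc_count` minus the shown child counts is the remainder. The sub-aggregation name `browsers`, the helper name, and all numbers are invented for illustration.

```python
from typing import Any, Dict, List, Tuple

Bucket = Dict[str, Any]


def segments_with_other(bucket: Bucket, sub_agg: str) -> List[Tuple[str, int]]:
    """Return (label, count) segments for one bar, appending an "Other"
    segment so the bar height matches the bucket's total doc_count."""
    segments = [(child["key"], child["doc_count"]) for child in bucket[sub_agg]["buckets"]]
    shown = sum(count for _, count in segments)
    other = bucket["doc_count"] - shown
    if other > 0:
        segments.append(("Other", other))
    return segments


# Hypothetical buckets: Windows has many browsers outside its top 5.
windows = {"key": "Windows", "doc_count": 1000, "browsers": {"buckets": [
    {"key": "b1", "doc_count": 120},
    {"key": "b2", "doc_count": 100},
    {"key": "b3", "doc_count": 80},
    {"key": "b4", "doc_count": 60},
    {"key": "b5", "doc_count": 40},
]}}
ios = {"key": "iOS", "doc_count": 450, "browsers": {"buckets": [
    {"key": "b1", "doc_count": 400},
    {"key": "b2", "doc_count": 50},
]}}
windows_segments = segments_with_other(windows, "browsers")
ios_segments = segments_with_other(ios, "browsers")
```

With the extra segment the Windows bar reaches 1000 and the iOS bar stays at 450, so the bar heights agree with the x-axis ordering again. This does not carry over to metrics such as unique count or percentiles, where a remainder cannot be derived by subtraction.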
Summary: Ordering is fine, but what's happening is "Split Bar" + "Terms" is actually doing a "Filtered Split Bar" and filtering data, taking away all meaning from the original X-Axis aggregation. I can't see why somebody would only want to compare bars containing only the Top 5 entries...
@driskell I totally agree that we should be able to produce "other" buckets, but the feature must be implemented in elasticsearch first (see https://github.com/elastic/elasticsearch/issues/5324 for progress). Once that is implemented this will be a far less confusing experience. For now, I recommend setting the size of the aggregation to something that makes the most sense for your data.
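For reference, a hedged sketch of where the two `size` settings sit in the request body of a nested terms aggregation, built as a Python dict. The aggregation names `x_axis` and `split` and the helper name are made up, and which of the two fields from this thread is the x-axis and which is the split is an assumption.

```python
from typing import Any, Dict


def split_bar_request(x_field: str, split_field: str, x_size: int, split_size: int) -> Dict[str, Any]:
    """Build a search body with a terms aggregation for the x-axis and a
    terms sub-aggregation for the bar split, each with its own size."""
    return {
        "size": 0,  # return aggregations only, no hits
        "aggs": {
            "x_axis": {
                "terms": {"field": x_field, "size": x_size},
                "aggs": {
                    "split": {"terms": {"field": split_field, "size": split_size}},
                },
            }
        },
    }


# Raising the split size means fewer documents end up in sum_other_doc_count.
body = split_bar_request("scheduleFull.raw", "language.raw", x_size=50, split_size=20)
```

A larger split size makes each bar cover more of its bucket, at the cost of more segments per bar and a heavier request.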
Looks like https://github.com/elastic/elasticsearch/pull/11042, so we can move forward with #1961.