Closed casepie closed 8 years ago
Good point, the result apparently ignores the sum of the long tail terms in the result set. I'll see if we can quickly fix that.
Thanks for your comprehensive report!
For reference, we need to take into account the sum_other_doc_count return value of the aggregation.
The "other" pie chart entry should then be the overflow (the values shown in the "other" table) plus the sum_other_doc_count.
Ideally we'd also show the sum_other_doc_count in the table somehow, e.g.:
Others (${sum_other_doc_count} values not shown)
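A minimal sketch of the calculation described above, not the actual Graylog code. The function name build_pie_slices, the (term, count) tuple shape, and the max_slices parameter are hypothetical; only sum_other_doc_count and the "Others (... values not shown)" label come from this thread.

```python
from typing import List, Tuple

Slice = Tuple[str, int, float]  # (label, doc count, fraction of the whole pie)


def build_pie_slices(
    buckets: List[Tuple[str, int]],
    sum_other_doc_count: int,
    max_slices: int,
) -> List[Slice]:
    """Build pie slices whose fractions are relative to the full result set.

    buckets: (term, doc_count) pairs as returned by the terms aggregation.
    sum_other_doc_count: documents in terms the aggregation did not return.
    max_slices: hypothetical limit on terms that get their own slice.
    """
    ranked = sorted(buckets, key=lambda bucket: bucket[1], reverse=True)
    shown = ranked[:max_slices]

    # Overflow: returned terms that do not get their own slice.
    overflow = sum(count for _, count in ranked[max_slices:])
    # The "other" entry is the overflow plus the long tail the
    # aggregation never returned.
    other = overflow + sum_other_doc_count

    total = sum(count for _, count in shown) + other
    if total == 0:
        return []

    slices: List[Slice] = [(term, count, count / total) for term, count in shown]
    if other > 0:
        label = f"Others ({sum_other_doc_count} values not shown)"
        slices.append((label, other, other / total))
    return slices
```

With this shape, a term's slice stays at the same share as its row in the data table, because both use the full total as the denominator.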
Turns out this was purely a display bug in the pie chart; the data table was OK. We already use the correct numbers in the data table, but rendered them incorrectly in the chart.
When analyzing the flow logs from my firewall and building a graph of IDS alerts centered around "source_address" (source IP), I get a pie graph and a data table (obviously). The problem is this: oftentimes, when creating the query, there may be 100 or more unique values for "source_address".
Expected Behavior
One would expect the percentage shown for a given value (in my case, source_address) to be the same in the pie graph as in the data table below it.
Current Behavior
If you have more than 50 unique data values for the query in the field used to create your pie graph, then you'll have a discrepancy between the pie graph and the data table on the dashboard widget. The data table appears to still build its percentages based on the entire query results (all 100+ IP addresses).
However, Graylog only shows 50 results for source_address in the data table. The problem comes in when the pie graph appears to calculate the percentage for that value (in my case, source_address) based only on the 50 source_addresses in the displayed data table (and not on the full query results).
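To make the discrepancy concrete, here is a small illustration with made-up counts. Only the 50-row limit comes from this report; the totals and per-address counts are invented for the example, and share is a hypothetical helper.

```python
def share(count: int, denominator: int) -> float:
    """Return count as a percentage of denominator."""
    return 100.0 * count / denominator


# Invented numbers: 1000 matching messages overall, of which the
# 50 displayed addresses account for 670 (one with 180, 49 with 10 each).
full_total = 1000
displayed_counts = [180] + [10] * 49

# Data table: percentage relative to the full query result.
table_pct = share(displayed_counts[0], full_total)

# Pie graph (buggy behavior): percentage relative to the displayed rows only.
pie_pct = share(displayed_counts[0], sum(displayed_counts))

print(f"data table: {table_pct:.1f}%  pie graph: {pie_pct:.1f}%")
```

The same address reads as 18.0% in the table but takes up roughly 26.9% of the pie, which is the kind of skew described here.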
Possible Solution
I would suggest that the pie graph also be calculated and drawn based on percentages from the full query results, so that it visually matches what is displayed in the data table. For example, if the data table says that IP address 10.10.16.1 accounted for 18% of the results, then that slice should visually represent about 18% of the pie graph.
Steps to Reproduce (for bugs)
Context
Our use case is based on Juniper SRX firewall logs. We capture Intrusion Detection (IDS) logs and then build a dashboard item for "IDS alerts by Source IP". This is a "quick values" chart based on "source_address". It usually results in many hundreds of unique values for "source_address", with only a few that are statistically significant (above 3-5%). However, the pie graph looks very skewed when compared to the data table.
Your Environment