The way the "limit" option works within aggregations is not intuitive

tellistone commented 2 years ago

The way the "limit" option works within aggregations is not intuitive and produces misleading outputs and visualisations.

Expected Behavior

Rows excluded from an aggregation using the limit option should be summed up into a single row named "other". This way, the relative percentages of each row always remain constant and accurate.
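
As a rough sketch of the roll-up described above (plain Python, not Graylog code; the function name `limit_with_other` and the sample counts are made up for illustration):

```python
from collections import Counter

def limit_with_other(counts, limit, other_label="other"):
    """Keep the `limit` largest rows and sum the rest into one extra row."""
    ranked = Counter(counts).most_common()  # rows sorted by count, descending
    top, rest = ranked[:limit], ranked[limit:]
    result = dict(top)
    if rest:
        # Everything cut off by the limit ends up in a single "other" row.
        result[other_label] = result.get(other_label, 0) + sum(n for _, n in rest)
    return result

# Hypothetical counts for 7 rows (invented for this example).
counts = {"kernel": 500, "sshd": 200, "cron": 100, "sudo": 80,
          "systemd": 60, "nginx": 40, "postfix": 20}

limited = limit_with_other(counts, limit=5)
print(limited)
```

Because the excluded rows are summed rather than dropped, the total across all rows stays the same, so each row's percentage does not depend on the limit.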

Current Behavior

At present, if I have a pie chart that would have 7 different "rows" in the legend, and I apply a limit of 5, I will get a pie chart that excludes the remaining 2 "rows" entirely from the results.

Context

The limit option in Aggregations should not exclude results from the series.

Excluding rows from the series distorts the output's ability to show relative proportions (e.g. what % of captured events comes from each row) and can make for very misleading results.

For example, does kernel represent 52.3% of results, or 41%? Or, in fact, neither?

Screenshot 2021-12-08 at 10 28 25

Screenshot 2021-12-08 at 10 28 08
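
To make the distortion concrete, here is a small worked example with invented counts (they are not the values from the screenshots above). The same row gets a different percentage depending on whether the rows cut off by the limit are still part of the total.

```python
# Hypothetical message counts per row, sorted descending (7 rows).
counts = [("kernel", 500), ("sshd", 200), ("cron", 100), ("sudo", 80),
          ("systemd", 60), ("nginx", 40), ("postfix", 20)]

def share(row, rows):
    """Percentage of `row` relative to the total of `rows`."""
    total = sum(n for _, n in rows)
    return round(100 * dict(rows)[row] / total, 1)

full_share = share("kernel", counts)           # all 7 rows in the total
truncated_share = share("kernel", counts[:5])  # limit of 5, last 2 rows dropped

print(full_share, truncated_share)
```

With these numbers, kernel is 50.0% of all messages but is shown as 53.2% once the limit removes two rows from the total.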

In my view, it should work the way it does in Splunk, for example: rows excluded by the limit should be summed up into a single row named "other". This way, the relative percentages always remain constant and accurate.

Why is the current way the limit function works a cardinal sin? I think because the aggregation controls should not be able to affect which messages are encompassed by the aggregation visualisation - only the search filter should be able to define which results are encompassed. The aggregation controls should only be allowed to determine how those results are displayed. It's important to separate powers in the interface this way so the user can understand where their results are coming from - by effectively having two separate ways to filter out results, you make neither one definitive.

Your Environment

Graylog Version: 4.2.0

tellistone commented 2 years ago

Cousin of https://github.com/Graylog2/graylog2-server/issues/11516

kroepke commented 2 years ago

We should have an option to display the "other" group as well, as we had in the old quick values widget. There are multiple paths and options: the context menu says "Show top values", which one could argue doesn't necessarily need the "others" group, but for many applications users will want to know the distribution, so knowing how many "others" there are is important.

tellistone commented 2 years ago

In that scenario, I'd suggest the option to display the "other" group should be enabled by default (default settings should not filter messages out of results).

tellistone commented 1 year ago

Bumping this. It's so frustrating looking for a middle ground between "graph is too busy to read" and "half the results are missing".
