Warn when interval in aggregation builder might cause issues with search time range

Graylog2 / graylog2-server

What?

The aggregation builder should display a warning when the selected interval is likely to cause issues with the current search time range.

For example, a user should be warned if:

The aggregation interval is 1 second and the search time range is 90 days, as it's likely unintentional and may result in UI and/or backend performance issues.

The aggregation interval is 1 day and the search time range is 10 seconds, as the visualization might not display correctly (e.g. bar charts don't show a bar, which may be a bug).

Why?

Sometimes users unintentionally have a large time range and small aggregation interval, which puts extra load on ES and (if the search completes) can essentially lock up the UI when it tries to render the visualization.

A clear warning about the potential issue would help users avoid unintentionally expensive searches.

Hey @coffee-squirrel,

thanks for the suggestion! Those are valid points, and they are the main reason why we introduced the auto interval option, so that users would not have to specify exact intervals and change them whenever the time range changes. We are assuming that if you are not using the auto interval, you are aware of the implications.

Nonetheless I think we can improve some things, but the devil's in the details:

The bucketing interval should not be bigger than the current time range. But what should we do when there is a widget with an interval of 1 day and the time range is decreased to e.g. 8 hours? Throw an error unless the bucketing interval is changed? Silently change the bucketing interval?
Whether a small interval and a large time range will result in performance issues is hard to predict and depends on both the user's workstation and the ES/OpenSearch cluster's performance. For 1 second and 90 days it is pretty clearly not ideal, but where should we draw the line?

These are non-rhetorical questions, so any input is very welcome.

> Those are valid points, and they are the main reason why we introduced the auto interval option, so that users would not have to specify exact intervals and change them whenever the time range changes. We are assuming that if you are not using the auto interval, you are aware of the implications.

That's fair, and definitely a reasonable default. I nearly always uncheck Auto, since:

the interval size is often relevant when troubleshooting (communicating findings, etc.)
I usually want to preserve a certain level of detail (e.g. hourly when switching from 8h to 2d)

> The bucketing interval should not be bigger than the current time range. But what should we do when there is a widget with an interval of 1 day and the time range is decreased to e.g. 8 hours? Throw an error unless the bucketing interval is changed? Silently change the bucketing interval?

Perhaps render the widget based on the available data, but show a warning icon and message on impacted widgets to let users know about the mismatch. It might be useful to temporarily have that situation in order to get some quick counts, etc., despite the potential UI strangeness.
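As a minimal sketch of that non-blocking check: the function and class names below are hypothetical and are not taken from Graylog's code base. The only thing it illustrates is "warn, but still render" when a single bucket is wider than the search time range.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class WidgetWarning:
    # Hypothetical structure for illustration; not part of Graylog's API.
    message: str


def check_interval_fits_range(
    interval: timedelta, time_range: timedelta
) -> Optional[WidgetWarning]:
    """Return a warning (never an error) when one bucket is wider than the range."""
    if interval <= timedelta(0) or time_range <= timedelta(0):
        raise ValueError("interval and time_range must be positive")
    if interval > time_range:
        # The widget would still be rendered; the caller only attaches the warning.
        return WidgetWarning(
            f"Interval ({interval}) is larger than the search time range "
            f"({time_range}); the chart may contain a single bucket or render oddly."
        )
    return None
```

For the example from this thread, `check_interval_fits_range(timedelta(days=1), timedelta(hours=8))` returns a warning, while an hourly interval over the same 8 hours returns `None`.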

> Whether a small interval and a large time range will result in performance issues is hard to predict and depends on both the user's workstation and the ES/OpenSearch cluster's performance. For 1 second and 90 days it is pretty clearly not ideal, but where should we draw the line?

Yeah, it can definitely depend upon various factors (performance, message count, etc.). The warning wouldn't necessarily need to prevent anything or be based upon characteristics of the environment or query, so perhaps just picking some reasonably large time_span / interval threshold (e.g. 1d / 1s = 86400 potential buckets) would be good enough.
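A rough sketch of such a threshold check, to make the arithmetic concrete. The names `MAX_BUCKETS`, `potential_buckets` and `should_warn` are made up for illustration, and the limit of 86400 is just the example value from the paragraph above, not a tested recommendation.

```python
import math
from datetime import timedelta

# Assumption: the example threshold from the discussion (1d / 1s).
MAX_BUCKETS = 86_400


def potential_buckets(time_range: timedelta, interval: timedelta) -> int:
    """Upper bound on the number of buckets: time_span / interval, rounded up."""
    if interval <= timedelta(0) or time_range <= timedelta(0):
        raise ValueError("time_range and interval must be positive")
    return math.ceil(time_range / interval)


def should_warn(time_range: timedelta, interval: timedelta) -> bool:
    """Warn only when the bucket count is strictly above the threshold."""
    return potential_buckets(time_range, interval) > MAX_BUCKETS
```

With these assumptions, 90 days at a 1 second interval gives 7,776,000 potential buckets and triggers the warning, while 1 day at 1 second gives exactly 86,400 and does not, because the comparison is strict. Whether the boundary case should warn is a design choice left open here.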

Warn when interval in aggregation builder might cause issues with search time range #13016
