Graylog2 / graylog2-server

Free and open log management
https://www.graylog.org

Warn when interval in aggregation builder might cause issues with search time range #13016

Open coffee-squirrel opened 2 years ago

coffee-squirrel commented 2 years ago

What?

The aggregation builder should display a warning when the selected interval is likely to cause issues with the current search time range.

For example, a user should be warned if:

Why?

Sometimes users unintentionally combine a large time range with a small aggregation interval. This puts extra load on Elasticsearch (ES) and, if the search completes, can essentially lock up the UI while it tries to render the visualization.

A clear warning about the potential issue would help users avoid unintentionally expensive searches.

Your Environment

dennisoelkers commented 2 years ago

Hey @coffee-squirrel,

thanks for the suggestion! Those are valid points, and they are the main reason we introduced the auto interval option: users should not have to specify exact intervals and change them whenever the time range changes. We assume that if you are not using the auto interval, you are aware of the implications.

Nonetheless I think we can improve some things, but the devil's in the details:

These are non-rhetorical questions, so any input is very welcome.
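To illustrate the idea behind the auto interval, here is a minimal sketch assuming a hypothetical ladder of candidate intervals and a hypothetical target of about 100 buckets. `auto_interval` and `CANDIDATE_INTERVALS` are illustrative names, not Graylog's implementation, whose selection logic may differ. The sketch picks the smallest candidate that keeps the estimated bucket count at or below the target, so the interval grows automatically with the time range.

```python
import math
from datetime import timedelta

# Hypothetical ladder of "nice" intervals; Graylog's real auto-interval logic may differ.
CANDIDATE_INTERVALS = [
    timedelta(seconds=1), timedelta(seconds=10), timedelta(minutes=1),
    timedelta(minutes=5), timedelta(minutes=15), timedelta(hours=1),
    timedelta(hours=6), timedelta(days=1), timedelta(weeks=1),
]


def auto_interval(time_range: timedelta, target_buckets: int = 100) -> timedelta:
    """Pick the smallest candidate interval whose bucket count stays at or below the target."""
    for interval in CANDIDATE_INTERVALS:
        # timedelta / timedelta gives a float; round up because a partial interval is still a bucket.
        if math.ceil(time_range / interval) <= target_buckets:
            return interval
    # Very long ranges fall back to the coarsest candidate.
    return CANDIDATE_INTERVALS[-1]
```

Under these assumptions an 8-hour range gets a 5-minute interval (96 buckets) and a 90-day range gets a 1-day interval (90 buckets).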

coffee-squirrel commented 2 years ago

Those are valid points, and they are the main reason we introduced the auto interval option: users should not have to specify exact intervals and change them whenever the time range changes. We assume that if you are not using the auto interval, you are aware of the implications.

That's fair, and definitely a reasonable default. I nearly always uncheck Auto, since:

The bucketing interval should not be bigger than the current time range. But what should we do when a widget has an interval of 1 day and the time range is decreased to, e.g., 8 hours? Throw an error unless the bucketing interval is changed? Silently change the bucketing interval?

Perhaps render the widget from the available data, but show a warning icon and message on impacted widgets so users know about the mismatch. Temporarily having that mismatch might even be useful for getting some quick counts, despite the potential UI strangeness.
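A minimal sketch of the "warn, don't block" behavior suggested above, assuming the widget's interval and the search time range are both available as `datetime.timedelta` values. `interval_exceeds_range` and its message are hypothetical, not part of Graylog's code. The function returns a warning string instead of raising, so the widget can still render from the available data.

```python
from datetime import timedelta
from typing import Optional


def interval_exceeds_range(interval: timedelta, time_range: timedelta) -> Optional[str]:
    """Return a warning if the bucketing interval is longer than the time range, else None.

    Hypothetical helper for illustration; the caller decides how to display the warning.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if interval > time_range:
        # Warn rather than fail, so the widget can still show the data it has.
        return (
            f"Interval {interval} is longer than the search time range {time_range}; "
            "buckets will only be partially covered by the searched range."
        )
    return None
```

For the case above, a 1-day interval with an 8-hour range produces a warning, while a 5-minute interval over the same range returns `None`.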

Whether a small interval and a large time range will result in performance issues is hard to predict and depends on both the user's workstation and the ES/OpenSearch cluster's performance. For a 1-second interval over 90 days it is pretty clearly not ideal, but where should we draw the line?

Yeah, it can definitely depend on various factors (performance, message count, etc.). The warning wouldn't necessarily need to prevent anything or be based on characteristics of the environment or query, so simply picking a reasonably large threshold for time_span / interval (e.g. 1d / 1s = 86,400 potential buckets) might be good enough.
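A minimal sketch of that threshold check, assuming the time span and interval are `datetime.timedelta` values and using the 86,400-bucket example as the cutoff. `estimated_buckets`, `should_warn`, and `MAX_BUCKETS` are hypothetical names. The estimate is rough: it ignores how the backend aligns buckets to interval boundaries, which can add one extra bucket.

```python
import math
from datetime import timedelta

# Hypothetical threshold based on the example above: 1d / 1s = 86,400 buckets.
MAX_BUCKETS = 86_400


def estimated_buckets(time_range: timedelta, interval: timedelta) -> int:
    """Roughly estimate the number of histogram buckets (ignores bucket alignment)."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    # A trailing partial interval still produces a bucket, so round up.
    return math.ceil(time_range / interval)


def should_warn(time_range: timedelta, interval: timedelta,
                max_buckets: int = MAX_BUCKETS) -> bool:
    """Return True when the estimated bucket count reaches the threshold."""
    return estimated_buckets(time_range, interval) >= max_buckets
```

Under this rule a 1-second interval over 90 days (90 × 86,400 = 7,776,000 estimated buckets) triggers the warning, while a 1-hour interval over the same range (2,160 buckets) does not. The check is purely arithmetic, matching the idea that the warning need not depend on the cluster or the workstation.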