Is the default `refresh_interval` a sensible default for Observability data?

jpountz commented 2 years ago

With the default refresh interval, either a shard is considered search-active and it gets refreshed every second, or it is not and it doesn't get refreshed. A shard is considered search-active if it has received a search request in the last 30 seconds.

I wonder if this default makes sense for Observability data. I believe that the intuition behind this default is that there might be someone starting a search session in Kibana that involves loading dashboards, Discover, etc. So when we notice that someone starts a search session, we start refreshing every second so that all searches see recent data and don't incur the cost of running the refresh (except for the first request).

But there are other usage patterns. Maybe someone is using Kibana to display dashboards on big screens in a war room. If they do this with Kibana refreshing dashboards every 10 seconds, then the shard will be considered search-active all the time and 9 refreshes out of 10 that Elasticsearch performs will be useless. This can be a big deal given how frequent refreshes hurt the indexing rate.

Another usage pattern is alerting. There are some data streams that rarely get queried, except by Alerting every 5 minutes by default. In that case, every 5 minutes Elasticsearch would refresh the shard every second for 30 seconds before no longer refreshing. This gives the worst of both worlds: the first request incurs the cost of running the refresh as part of executing the search, and 29 out of the 30 refreshes are unnecessary.
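
To make the two patterns above concrete, here is a minimal simulation sketch. It assumes a deliberately simplified model of the behavior described in this issue: one scheduler tick per second, a shard that counts as search-active for 30 seconds after its last search, and a search on an idle shard paying for an on-demand refresh. The `simulate` function and its counters are hypothetical names used for illustration, not Elasticsearch internals.

```python
def simulate(search_times, duration, idle_after=30):
    """Count refreshes under a simplified search-idle model (1 tick = 1 second)."""
    searches = set(search_times)
    last_search = None
    scheduled = useful = on_demand = 0
    for t in range(duration):
        # The shard is search-active if it saw a search in the last `idle_after` seconds.
        active = last_search is not None and t - last_search < idle_after
        if active:
            scheduled += 1  # background refresh on this tick
        if t in searches:
            if active:
                useful += 1  # this tick's scheduled refresh was actually observed
            else:
                on_demand += 1  # idle shard: the search pays for the refresh itself
            last_search = t
    return {"scheduled": scheduled, "useful": useful, "on_demand": on_demand}


# War-room dashboard: one search every 10 seconds for 10 minutes.
dashboard = simulate(range(0, 600, 10), duration=600)
# Alerting: one search every 5 minutes for 15 minutes.
alerting = simulate(range(0, 900, 300), duration=900)
print(dashboard)
print(alerting)
```

Under these assumptions, roughly 9 out of 10 scheduled refreshes in the dashboard case are never observed by a search, and in the alerting case every search pays for an on-demand refresh while none of the 29 scheduled refreshes that follow it is observed.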

I wonder if we should consider alternatives for the default refresh, such as refreshing as part of the search request unless there was already a refresh in the past second. This would make search a bit slower in some cases, but this would also significantly reduce the impact of searches on indexing. Or maybe there are even better approaches?
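
As a rough sketch of that alternative, assuming a hypothetical `LazyRefresher` helper (nothing like it exists in Elasticsearch under this name), a search would trigger a refresh only when the last refresh is at least one second old:

```python
class LazyRefresher:
    """Refresh as part of a search unless a refresh happened within `max_staleness` seconds."""

    def __init__(self, max_staleness=1.0):
        self.max_staleness = max_staleness
        self.last_refresh = None
        self.refresh_count = 0

    def on_search(self, now):
        """Return True if this search had to run a refresh first."""
        stale = (
            self.last_refresh is None
            or now - self.last_refresh >= self.max_staleness
        )
        if stale:
            self.last_refresh = now
            self.refresh_count += 1
        return stale


refresher = LazyRefresher()
# A burst of searches followed by a later one (times in seconds).
outcomes = [refresher.on_search(t) for t in (0.0, 0.5, 1.0, 1.2, 10.0)]
print(outcomes, refresher.refresh_count)
```

With this policy the number of refreshes is bounded by the number of searches, so a dashboard polling every 10 seconds would cause one refresh per poll rather than one per second. The sketch ignores whether anything was indexed since the last refresh, which a real implementation would presumably also check.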

elasticmachine commented 2 years ago

Pinging @elastic/es-search (Team:Search)

ywelsch commented 2 years ago

I agree that a single request to an index-heavy search-idle index resulting in the index refreshing 30 times for the next 30 seconds would not be a good fit for the cases that you mentioned.

I was thinking about two complementary solutions:

1. Allow configuring refresh behavior at a more fine-granular level (i.e. allow coupling the search-idle behavior with longer refresh intervals, e.g. 5 seconds for observability data). Observability could then ship with a different set of defaults for these settings.
2. Avoid refreshes in the first place. For observability data, it's less important to have data that is extremely real-time, and data that is a couple of seconds old would be good enough. Search requests could specify that they don't require the absolute latest data by specifying an upper time range bound of "now-5s", and Elasticsearch would track min/max timestamp values of data that was indexed but not refreshed, allowing it to forgo refreshes in cases where the requested time range would not match any data in the indexing buffer.
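
The second idea could be sketched roughly as follows. This is only an illustration under stated assumptions: `UnrefreshedTimestamps` and its methods are hypothetical names, timestamps are plain numbers (seconds), and the query range is treated as inclusive on both ends.

```python
class UnrefreshedTimestamps:
    """Track the min/max timestamp of documents indexed since the last refresh."""

    def __init__(self):
        self.min_ts = None
        self.max_ts = None

    def record_indexed(self, ts):
        # Widen the tracked range to include the newly indexed document.
        self.min_ts = ts if self.min_ts is None else min(self.min_ts, ts)
        self.max_ts = ts if self.max_ts is None else max(self.max_ts, ts)

    def mark_refreshed(self):
        # After a refresh, nothing is left in the indexing buffer.
        self.min_ts = self.max_ts = None

    def needs_refresh(self, lower, upper):
        """True if the query range [lower, upper] overlaps unrefreshed data."""
        if self.min_ts is None:
            return False
        return lower <= self.max_ts and upper >= self.min_ts


tracker = UnrefreshedTimestamps()
for ts in (100, 101, 103):
    tracker.record_indexed(ts)

now = 103
print(tracker.needs_refresh(lower=0, upper=now - 5))  # query bounded at "now-5s"
print(tracker.needs_refresh(lower=0, upper=now))      # query up to "now"
```

In this sketch, the query bounded at "now-5s" ends before the oldest unrefreshed document and can skip the refresh, while the query up to "now" overlaps the buffered data and cannot. Late-arriving documents with old timestamps would widen the tracked range and force refreshes more often.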

jpountz commented 2 years ago

> Allow configuring refresh behavior at a more fine-granular level

Right, my intuition was that we could keep refresh_interval's semantics about how stale a point-in-time view of the index is allowed to be and introduce a separate parameter about whether refreshes should be performed lazily, whenever a search request comes in and the data is not fresh enough, or eagerly on a schedule. I'm unsure if there is a true use-case for the current default behavior where you might either pay the cost of the refresh as part of the search request or not depending on how long the shard has been search-idle before your request?

> Elasticsearch would track min/max timestamp values of data that was indexed but not refreshed

Ohhh I like this idea.

StephanErb commented 10 months ago

Has there been any further discussion or decision on how refresh_interval will be used for observability data, especially with TSDB indices?

Some recent work by @martijnvg on https://github.com/elastic/elasticsearch/issues/95776 goes towards optimizing refreshes during search on search-idle indices (https://github.com/elastic/elasticsearch/issues/95544, https://github.com/elastic/elasticsearch/issues/95541). However, I think in practice most observability users will have to run with a custom refresh interval that is much higher (e.g. tens of seconds, in the ballpark of the metricset period) to make ingestion cost-effective. Having the ability to couple the search-idle behaviour with longer refresh intervals could thus be really helpful.
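
For reference, a custom refresh interval is set through the `index.refresh_interval` index setting. The sketch below only builds the request for the update index settings API and does not send it; the index name `metrics-example` and the helper `refresh_settings_request` are made up for illustration, and for data streams the setting would more typically live in the index template.

```python
import json


def refresh_settings_request(index, refresh_interval):
    """Build (method, path, body) for an update-index-settings request."""
    path = f"/{index}/_settings"
    body = json.dumps({"index": {"refresh_interval": refresh_interval}})
    return "PUT", path, body


method, path, body = refresh_settings_request("metrics-example", "30s")
print(method, path)
print(body)
```

As I understand it, explicitly setting `index.refresh_interval` like this also opts the index out of the search-idle behaviour, which is why being able to combine the two would help.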

elastic/elasticsearch #78776