Many of our solutions automatically create dashboards for their applications (APM, Security, ...).
These dashboards and visualizations are accessed very often, every time a user opens the application.
So the same requests are repeated over and over, with different time ranges. For these use cases we rely on the shard request cache to absorb the repetitions. However, we have no way to differentiate a popular dashboard's request from any other search request, so they all compete for the same entries in the cache. The cache is also entirely in memory and local to a node, so any restart or shard relocation loses the entries.
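To make the competition concrete, here is a minimal toy model (the class, capacity, and requests are all illustrative, not the actual Elasticsearch implementation) of an LRU cache keyed on the full request body per shard. Two loads of the same dashboard whose "last 15 minutes" windows differ slightly produce different keys, so neither benefits from the other, and an unrelated ad-hoc search can evict both:

```python
import json
from collections import OrderedDict


class ShardRequestCache:
    """Toy model of a per-node, in-memory LRU cache keyed on (shard, request body)."""

    def __init__(self, max_entries=2):  # tiny capacity to make eviction visible
        self.max_entries = max_entries
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, shard_id, request, compute):
        # The key covers the whole request body, so any difference
        # (e.g. a shifted time range) yields a distinct entry.
        key = (shard_id, json.dumps(request, sort_keys=True))
        if key in self.entries:
            self.entries.move_to_end(key)
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        result = compute()
        self.entries[key] = result
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return result


def dashboard_request(start, end):
    # Stand-in for a typical dashboard aggregation over a time range.
    return {"aggs": {"events": {"date_histogram": {"field": "@timestamp"}}},
            "query": {"range": {"@timestamp": {"gte": start, "lt": end}}}}


cache = ShardRequestCache(max_entries=2)
cache.get_or_compute(0, dashboard_request("10:00", "10:15"), lambda: "histogram-a")  # miss
cache.get_or_compute(0, dashboard_request("10:00", "10:15"), lambda: "histogram-a")  # hit: identical body
cache.get_or_compute(0, dashboard_request("10:01", "10:16"), lambda: "histogram-b")  # miss: shifted range, new key
cache.get_or_compute(0, {"query": {"term": {"user": "alice"}}}, lambda: "hits")      # miss: unrelated query, evicts the dashboard entry
cache.get_or_compute(0, dashboard_request("10:00", "10:15"), lambda: "histogram-a")  # miss again: it was evicted
print(cache.hits, cache.misses)  # → 1 4
```

Only the byte-identical repetition hits; every shifted window and every competing query pays the full cost again.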
For read-only indices, it is tempting to think about a persistent per-shard/per-index cache that could record these popular queries on rollover (or when the indices become truly read-only).
These cache entries would ensure that the main dashboards of our Apps are always fast on old data; only the write index and the index at the boundary of the requested time range would need to compute the request fully. We rely on this caching behavior to some extent today, but we don't enforce it.
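The boundary behavior can be sketched as follows (an illustrative model with hypothetical names and integer timestamps, not how Elasticsearch actually plans a search): a backing index fully covered by the requested range returns the same result for any request that covers it, so its result is cacheable; only partially overlapping indices and the write index need a full computation.

```python
def classify_indices(indices, range_start, range_end):
    """Classify each backing index of a datastream for a time-range query.

    `indices` is a list of (name, min_ts, max_ts, is_write_index) tuples;
    timestamps are plain integers for simplicity.
    """
    plan = {}
    for name, lo, hi, is_write in indices:
        if hi < range_start or lo > range_end:
            plan[name] = "skip"        # no overlap: the shard can be skipped entirely
        elif is_write:
            plan[name] = "compute"     # still being written: results can't be reused
        elif range_start <= lo and hi <= range_end:
            # Fully covered: the range clause matches every document, so the
            # result is identical for any request covering the index -> cacheable.
            plan[name] = "cached"
        else:
            plan[name] = "compute"     # boundary index: partial overlap
    return plan


backing = [
    (".ds-logs-000001", 0, 99, False),
    (".ds-logs-000002", 100, 199, False),
    (".ds-logs-000003", 200, 299, True),   # current write index
]
plan = classify_indices(backing, 50, 250)
print(plan)
# → {'.ds-logs-000001': 'compute', '.ds-logs-000002': 'cached', '.ds-logs-000003': 'compute'}
```

As the datastream grows, the "cached" bucket dominates, which is why persisting those entries would keep old data fast regardless of restarts.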
Considering how important this is for large deployments with billions of events, we should make it clear that caching is the way to scale a complex dashboard.
A more aggressive rollover policy would also help in this model, by minimizing the dynamic part of datastreams.
Another approach could be to populate these caches regularly with a batch job: we could run these queries in the background on the datastreams whenever they are not present in the cache. In this model we wouldn't need a persistent cache, but it would require more requests overall.
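A warming job of this kind could look like the following sketch (all names are hypothetical and `run_search` stands in for actually executing the search on the node holding the shard): the job walks the known popular queries across the read-only backing indices and recomputes only the missing entries, so the extra request cost is paid once per eviction or restart rather than by a user.

```python
class NodeCache:
    """Toy stand-in for the in-memory, per-node cache that is lost on restart."""
    def __init__(self):
        self.data = {}


def warm_caches(cache, readonly_indices, popular_queries, run_search):
    """Background batch job: ensure every (index, popular query) pair is cached.

    Returns the number of background searches issued, i.e. the extra
    request cost of this model.
    """
    issued = 0
    for index in readonly_indices:
        for query in popular_queries:
            key = (index, query)
            if key not in cache.data:
                cache.data[key] = run_search(index, query)
                issued += 1
    return issued


cache = NodeCache()
popular = ("apm-latency-histogram", "security-top-events")  # hypothetical query names
indices = (".ds-logs-000001", ".ds-logs-000002")

first = warm_caches(cache, indices, popular, lambda i, q: f"result({i},{q})")
second = warm_caches(cache, indices, popular, lambda i, q: f"result({i},{q})")
print(first, second)  # → 4 0 : later runs only refill what was evicted or lost
```

After a warming pass, a user opening the dashboard only reads warm entries; the trade-off is that the job keeps re-issuing searches in the background for entries that nobody may ever read.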