[Spike] Investigate separate Index lifecycle policies for each datastream

paulb-elastic commented 2 years ago

All monitors added in Monitor Management use data streams to write results back to ES. There is a separate data stream for each monitor type (ICMP, HTTP or TCP), and browser monitors are further split by the type of data we store (network, screenshot etc.).

In addition, the namespace defined when setting up the monitor (which is default unless changed) is appended to the name of the data stream.

This can be visualised, for example, with this set of monitors:

All of these have been left in the default namespace, default, except for the Test Browser in my_namespace monitor, which has been given the namespace my_namespace:

In Index Management, we can see all the data streams used by these monitors (their names all begin with synthetics-...):

As you can see, there is one data stream for each monitor type within each namespace, and browser monitors are further split into ...browser..., ...browser.network... and ...browser.screenshot....
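A minimal Python sketch of the naming scheme, assuming the standard <type>-<dataset>-<namespace> pattern with type synthetics and the six datasets mentioned in this issue. The helper names are hypothetical, and the function only lists names that could exist, since a data stream is only created when its first result is written.

```python
# Datasets listed in this issue; "synthetics" is the data stream type.
SYNTHETICS_DATASETS = [
    "http",
    "icmp",
    "tcp",
    "browser",
    "browser.network",
    "browser.screenshot",
]


def data_stream_name(dataset: str, namespace: str = "default") -> str:
    """Build a name following the <type>-<dataset>-<namespace> scheme."""
    return f"synthetics-{dataset}-{namespace}"


def possible_data_streams(namespaces: list[str]) -> list[str]:
    """List every data stream name that monitors in these namespaces could write to."""
    return [
        data_stream_name(dataset, namespace)
        for namespace in namespaces
        for dataset in SYNTHETICS_DATASETS
    ]


if __name__ == "__main__":
    for name in possible_data_streams(["default", "my_namespace"]):
        print(name)
```

With two namespaces this yields twelve names, which is why a single shared policy quickly becomes coarse.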

However, all of these separate data streams use the same synthetics Index Lifecycle Policy:

As a result, every monitor type, and each category of browser results, is subject to the same retention period:

This means users cannot configure retention periods granularly based on the type of monitor or the type of data.

For example, a typical use case may be to keep browser result data for 13 months (to allow year-on-year comparison), network data for 3 months, and screenshots for 1 month. This allows the user to balance how much storage these results consume against the value of having that data available.
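The example retention periods can be expressed as one ILM policy body per dataset. This is a sketch, not a proposed default: it assumes 30 days per month (ILM time values use units such as d, with no month unit), and the hot-phase rollover thresholds are placeholder assumptions. Each dict is shaped like a request body for PUT _ilm/policy/<name>.

```python
import json

# Assumption: approximate one month as 30 days, since ILM time values
# (e.g. "90d") have no month unit.
DAYS_PER_MONTH = 30

# Retention targets from the example use case, in months.
RETENTION_MONTHS = {
    "browser": 13,
    "browser.network": 3,
    "browser.screenshot": 1,
}


def ilm_policy_body(retention_months: int) -> dict:
    """Body for PUT _ilm/policy/<name>: roll over in hot, delete after retention."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "min_age": "0ms",
                    # Placeholder rollover thresholds (assumptions).
                    "actions": {
                        "rollover": {"max_age": "30d", "max_primary_shard_size": "50gb"}
                    },
                },
                "delete": {
                    # For rolled-over backing indices, min_age counts from rollover.
                    "min_age": f"{retention_months * DAYS_PER_MONTH}d",
                    "actions": {"delete": {}},
                },
            }
        }
    }


policies = {dataset: ilm_policy_body(months) for dataset, months in RETENTION_MONTHS.items()}

if __name__ == "__main__":
    print(json.dumps(policies["browser.network"], indent=2))
```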

Spike Expectations

This spike is to investigate whether we can automatically configure a separate Index Lifecycle Policy for each data stream. It's reasonable to imagine a 1:1 mapping between each data stream and its own Index Lifecycle Policy, even if they all start from the same default configuration. Users could then adjust each policy to their needs and control the amount of storage being consumed.
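A sketch of the 1:1 set up, assuming each policy is keyed by (and would be named after) its data stream. The 365-day default is a placeholder, not an agreed value. The point is that every data stream gets an independent copy, so changing one policy leaves the others alone.

```python
import copy

# Hypothetical shared default; the real out-of-the-box values are still to be defined.
DEFAULT_POLICY = {
    "policy": {
        "phases": {
            "hot": {"actions": {"rollover": {"max_age": "30d"}}},
            "delete": {"min_age": "365d", "actions": {"delete": {}}},
        }
    }
}


def policy_per_data_stream(data_streams: list[str]) -> dict[str, dict]:
    """Map each data stream to its own policy body (1:1), seeded from the default.

    Each entry is a deep copy, so customizing one data stream's policy
    never changes another's (or the default itself).
    """
    return {name: copy.deepcopy(DEFAULT_POLICY) for name in data_streams}


policies = policy_per_data_stream(
    ["synthetics-http-default", "synthetics-browser.screenshot-default"]
)
# A user shortens screenshot retention without touching the HTTP policy.
policies["synthetics-browser.screenshot-default"]["policy"]["phases"]["delete"]["min_age"] = "30d"
```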

One consideration is that a data stream does not exist until a monitor is created in the given namespace. So, in the above example, there is no data stream called synthetics-browser.network-my_namespace until a browser monitor is created in Monitor Management and saved in the my_namespace namespace. The first result will then be written to the new synthetics-browser.network-my_namespace data stream, which will be subject to the existing synthetics Index Lifecycle Policy.

This spike needs to look into how we could create these Index Lifecycle Policies on demand, and whether there are any other implications of doing so.
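One way to create policies on demand is an idempotent "ensure" step run when a monitor is saved, before its first result arrives. This sketch uses a hypothetical in-memory store in place of the cluster, and the helper names are hypothetical. It only creates policies that are missing, because PUT _ilm/policy replaces an existing policy and would discard a user's customizations.

```python
import copy


class InMemoryPolicyStore:
    """Hypothetical stand-in for the cluster's ILM policies (no network calls)."""

    def __init__(self) -> None:
        self.policies: dict[str, dict] = {}

    def exists(self, name: str) -> bool:
        return name in self.policies

    def put(self, name: str, body: dict) -> None:
        self.policies[name] = copy.deepcopy(body)


# Hypothetical default body shared by every newly created policy.
DEFAULT_BODY = {
    "policy": {"phases": {"delete": {"min_age": "365d", "actions": {"delete": {}}}}}
}


def ensure_policies_for_namespace(
    store: InMemoryPolicyStore, datasets: list[str], namespace: str
) -> list[str]:
    """Create any missing per-data-stream policies for a namespace.

    Intended to run when a monitor is saved. Existing policies, including
    ones a user has customized, are left untouched. Returns the names of
    the policies that were created.
    """
    created = []
    for dataset in datasets:
        name = f"synthetics-{dataset}-{namespace}"
        if not store.exists(name):
            store.put(name, DEFAULT_BODY)
            created.append(name)
    return created
```

The check-then-create sequence is not atomic, so concurrent saves could race. Also, creating a policy does not by itself change which policy a data stream uses; the matching index template has to reference it through the index.lifecycle.name setting.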

You could imagine users making use of the namespace setting to further separate data streams (and, by extension, Index Lifecycle Policies) for monitors that warrant different retention periods based on their business value. For example, a namespace holding less valuable monitor results could move data through the warm/cold/frozen/delete phases more quickly.
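To illustrate namespaces with different business value, here is a sketch that builds a faster or slower hot/warm/cold/delete schedule per namespace. The namespace low_value and all phase ages are hypothetical. The frozen phase is left out because it relies on searchable snapshots, which need a snapshot repository; set_priority only affects the order in which indices are recovered after a node restart.

```python
def tiered_policy(warm_days: int, cold_days: int, delete_days: int) -> dict:
    """Build an ILM policy body that moves data hot -> warm -> cold -> delete."""
    # Sanity check for this sketch: keep phase ages in order.
    if not 0 < warm_days <= cold_days <= delete_days:
        raise ValueError("expected 0 < warm_days <= cold_days <= delete_days")
    return {
        "policy": {
            "phases": {
                "hot": {
                    "min_age": "0ms",
                    "actions": {
                        "rollover": {"max_age": "7d"},
                        "set_priority": {"priority": 100},
                    },
                },
                "warm": {"min_age": f"{warm_days}d", "actions": {"set_priority": {"priority": 50}}},
                "cold": {"min_age": f"{cold_days}d", "actions": {"set_priority": {"priority": 0}}},
                "delete": {"min_age": f"{delete_days}d", "actions": {"delete": {}}},
            }
        }
    }


# Hypothetical schedules: the default namespace keeps data longer,
# while a low-value namespace ages out quickly.
POLICY_BY_NAMESPACE = {
    "default": tiered_policy(warm_days=30, cold_days=90, delete_days=395),
    "low_value": tiered_policy(warm_days=2, cold_days=7, delete_days=30),
}
```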

dominiqueclarke commented 2 years ago

Looks like Fleet may be keen to allow configuring ILM policies per data stream, based on this comment: https://github.com/elastic/kibana/blob/main/x-pack/plugins/fleet/server/services/epm/packages/_install_package.ts#L124 I also seem to remember having this conversation a while back.

@jen-huang Has moving forward with ILM policies per data stream been discussed at all on the Fleet side?

jen-huang commented 2 years ago

Hi @dominiqueclarke, it is possible* today to set different policies for different datasets (e.g. browser.network, http) by making use of the component templates that Fleet already installs. It is not possible to set them at the namespace level (e.g. http-jenNamespace). We have a proposal for achieving namespace-level policies, which involves creating even more component templates, but that effort is currently deferred.

This issue has more details of what we have today that enables dataset-level customization vs what we would need to achieve namespace-level customization: https://github.com/elastic/kibana/issues/121118

*as soon as https://github.com/elastic/kibana/issues/121184 is fixed

dominiqueclarke commented 2 years ago

As Jen mentioned, it is possible today to create different ILM policies tied to specific datasets within the integration package spec. Unfortunately, we are blocked on namespace-level customizations as mentioned above.

Draft POC: https://github.com/elastic/integrations/pull/2744

This draft creates separate ILM policies for each data set: browser, browser.screenshot, browser.network, HTTP, ICMP, and TCP. We can move forward with defining a default policy for each data set once @drewpost defines the requirements for that policy. Our users could then customize these default assets if desired. https://github.com/elastic/observability-docs/issues/1578

Sample data stream with segmented ILM policy: [Index Management screenshots]

drewpost commented 2 years ago

@dominiqueclarke - We already have the retention periods defined per data type in the requirements; however, we didn't go into the depth of hot/cold storage tiers, as this was an option the implementation gave us. Is that storage tier definition (alongside the retention periods) all you need to define OOTB settings?

dominiqueclarke commented 2 years ago

@drewpost Sorry for the delay. That is correct.

dominiqueclarke commented 2 years ago

Findings of the spike

cc: @drewpost @paulb-elastic @andrewvc

Segmenting by data set

Segmenting by data set is possible today in the Integration Package spec. Defaults can be specified for each data set, resulting in a new ILM policy per data set and a per-data-set component template that points to that policy.
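A sketch of the dataset-level assets described above: one ILM policy per data set plus a component template whose index.lifecycle.name setting points at it. Both the policy names and the @package template suffix are assumptions for illustration, not necessarily what Fleet installs.

```python
# Datasets listed in this issue.
DATASETS = ["http", "icmp", "tcp", "browser", "browser.network", "browser.screenshot"]


def dataset_assets(dataset: str) -> dict:
    """Describe the per-dataset ILM policy and the component template pointing at it."""
    policy_name = f"synthetics-{dataset}"  # hypothetical policy name
    template_name = f"synthetics-{dataset}@package"  # assumed template name
    return {
        "policy_name": policy_name,
        "component_template_name": template_name,
        # Body for PUT _component_template/<name>: index.lifecycle.name tells
        # ILM which policy manages indices created from templates using it.
        "component_template_body": {
            "template": {"settings": {"index.lifecycle.name": policy_name}}
        },
    }


ASSETS = {dataset: dataset_assets(dataset) for dataset in DATASETS}
```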

Segmenting by namespace

Segmenting by namespace is currently in the investigation and definition phase for Fleet, with work expected to begin in a future release. Once implemented, Fleet will generate an additional component template <type>-<dataset>-<namespace>@custom, to allow user-defined customization per namespace. This feature will build upon the existing feature set allowing for segmenting by data set. More information: https://github.com/elastic/kibana/issues/121118
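How the proposed namespace-level template would take precedence can be sketched by merging settings in composed_of order, since Elasticsearch applies an index template's component templates in the order listed, with later ones overriding earlier ones. The composed_of order below (a dataset-level @custom template before the <type>-<dataset>-<namespace>@custom template) is an assumption about Fleet's proposal, and the policy names are hypothetical.

```python
def effective_settings(composed_of: list[str], component_templates: dict[str, dict]) -> dict:
    """Merge settings from component templates in composed_of order.

    A later template's value for a setting overrides an earlier one.
    Templates missing from component_templates are skipped in this sketch.
    Settings in the index template's own template block would take
    precedence over all of these; that layer is left out here.
    """
    merged: dict = {}
    for name in composed_of:
        settings = component_templates.get(name, {}).get("template", {}).get("settings", {})
        merged.update(settings)
    return merged


templates = {
    "synthetics-http@custom": {
        "template": {"settings": {"index.lifecycle.name": "synthetics-http"}}
    },
    "synthetics-http-my_namespace@custom": {
        "template": {"settings": {"index.lifecycle.name": "synthetics-http-my_namespace"}}
    },
}
composed_of = ["synthetics-http@custom", "synthetics-http-my_namespace@custom"]
```

With both templates present, the namespace-level policy wins; if the namespace template is absent, the dataset-level policy still applies, which is why the namespace work can build on the dataset-level defaults.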

Moving forward in 8.2.0

Defaults per data set can be specified in the Elastic Synthetics Integration package as early as 8.2.0. Establishing defaults per data set will not conflict with the enhancements coming in down the line, as the work will build upon the existing component template hierarchy used to generate index templates. @drewpost to provide the desired defaults for each data stream and data set (HTTP, ICMP, TCP, browser, browser.network, and browser.screenshot). @paulb-elastic to decide when to prioritize this work and whether we can move forward with including defaults as early as 8.2.0.

Moving forward with segmenting by namespace

Synthetics will require the ability to generate namespace-specific component templates and index templates on the fly. Uptime's UI Monitor Management and the Synthetics Service leverage the Fleet-based data stream architecture but save monitors as saved objects instead of Fleet integration policies. Because monitors are not stored as Fleet integration policies, Fleet will not be notified by default when a user creates a new monitor with a non-default namespace.

To allow Uptime to use the namespace segmentation feature, Fleet should expose a method on its plugin contract to generate component and index templates for a given package and namespace. The use case for Synthetics is defined here: https://github.com/elastic/kibana/issues/121118#issuecomment-1066845288

Once exposed, Uptime will need to ensure that proper component and index templates are installed when a new monitor is saved. If the namespace of the monitor is anything but default, Synthetics will invoke the Fleet service to generate the corresponding component and index templates.
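The save-time hook could look roughly like this, sketched in Python for brevity (Kibana itself is TypeScript). The Fleet method install_namespace_templates does not exist today; it is a hypothetical stand-in for whatever Fleet exposes on its plugin contract, and RecordingFleet is a test double.

```python
from typing import Protocol


class FleetTemplateService(Protocol):
    """Hypothetical shape of the method Fleet would expose on its plugin contract."""

    def install_namespace_templates(self, package: str, namespace: str) -> None: ...


def on_monitor_saved(fleet: FleetTemplateService, package: str, namespace: str) -> bool:
    """Ask Fleet for namespace-specific templates when a monitor uses a non-default namespace.

    Returns True if Fleet was invoked.
    """
    if namespace == "default":
        # The default namespace is covered by the templates installed with the package.
        return False
    fleet.install_namespace_templates(package, namespace)
    return True


class RecordingFleet:
    """Test double that records calls instead of talking to Kibana."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str]] = []

    def install_namespace_templates(self, package: str, namespace: str) -> None:
        self.calls.append((package, namespace))
```

Because the hook runs on every save, the Fleet method would ideally be idempotent, so saving a second monitor in the same namespace is harmless.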

andrewvc commented 2 years ago

@dominiqueclarke thanks for digging up all those answers. It seems to me that we can create a new issue to encapsulate our ultimate plans to create a lifecycle policy for namespaces, and between that issue and https://github.com/elastic/uptime/issues/462 we can close this one out.

Does that sound right?

dominiqueclarke commented 2 years ago

@andrewvc Yep, @paulb-elastic actually already created an issue off the back of this spike https://github.com/elastic/uptime/issues/462

paulb-elastic commented 2 years ago

Thank you @dominiqueclarke for finding out how to proceed. Closing this as discussed ^^.

elastic / uptime