Closed paulb-elastic closed 2 years ago
The platform automatically configures separate ILM policies for each type of data collected from Synthetic monitoring (lightweight and browser)
From my understanding this should be implemented by adding new ILM policies to the synthetics
package, and there should not be any custom Kibana code needed to support this.
One thing to note is the current issues we have around privileges for ILM policies, which will likely require some privilege changes to Elasticsearch's kibana_system role definition. Please see https://github.com/elastic/package-spec/issues/293 for an explanation of the overall problem and https://github.com/elastic/elasticsearch/pull/85085 as an example of how to do this.
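As a rough illustration only (not the actual change from the linked PR, and the exact privileges needed are what those issues are working out), granting the kibana_system role the ability to manage ILM policies might look something like this role-descriptor fragment:

```json
{
  "cluster": ["manage_ilm"],
  "indices": [
    {
      "names": ["synthetics-*"],
      "privileges": ["manage"]
    }
  ]
}
```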
Thank you @joshdover for the additional insight
@drewpost @paulb-elastic
We need default storage tier and retention period definitions to begin this ticket, and to evaluate what permissions, if any, the kibana_system user will need to support this feature.
Sorry @dominiqueclarke - this has been documented but it appears it didn't make it into this ticket. Here are the defaults:
- HTTP lightweight checks (synthetics-http): 365 days
- ICMP lightweight checks (synthetics-icmp): 365 days
- TCP lightweight checks (synthetics-tcp): 365 days
- Browser checks - core data (synthetics-browser): 365 days
- Browser checks - network data (synthetics-network): 14 days
- Browser checks - screenshots (synthetics-screenshot): 14 days
In terms of the storage tiers, do we have any understanding of the real world speed tradeoffs here?
@paulb-elastic
While Drew's requirements are clear, there are many ways we can implement a 14 day retention period for network data and screenshots.
When configuring phases, a max age or max size of the write index is specified. If either the max age or the max size is reached for the main write index, a rollover occurs, a new write index is created, and the old write index becomes a read index. It's from this point of rollover that the delete phase countdown begins.
For example, take the default hot phase rollover conditions (index is 30 days old, or any primary shard reaches 50 gigabytes) and assume a 14-day delete phase as specified by Drew. Data would then be deleted after at most 44 days, or sooner if the index hits the 50 gigabyte size limit before the 30-day mark.
The question becomes: what combination of hot phase and delete phase configuration should we employ to get close to our target of deleting data after 14 days?
We could, for example, use any number of combinations:

| | Hot | Delete |
|---|---|---|
| Max age | 1 day | 14 days |
| Max size | 50 gigabytes | - |

| | Hot | Delete |
|---|---|---|
| Max age | 1 day | 13 days |
| Max size | 50 gigabytes | - |

| | Hot | Delete |
|---|---|---|
| Max age | 3 days | 11 days |
| Max size | 50 gigabytes | - |

etc.
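To make the first combination concrete, here is a sketch of what such a policy could look like via the ILM API (the policy name is illustrative, and the exact values would follow whichever combination we pick):

```json
PUT _ilm/policy/synthetics-network-example_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "1d",
            "max_primary_shard_size": "50gb"
          }
        }
      },
      "delete": {
        "min_age": "14d",
        "actions": { "delete": {} }
      }
    }
  }
}
```

Note that `min_age` in the delete phase is measured from rollover, per the mechanics described above.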
@paulb-elastic thoughts?
From a time perspective, it seems a hot phase of 14 days followed by a delete phase of 0 days would meet @drewpost’s default requirement (I haven’t tried it in action, but I can certainly build a policy with this configuration). That covers the time-based side.
You raise an interesting point about the size element too. Does it have to be set? In the UI, I seem to be able to leave it unset (the same as the max number of documents, for example), and that would probably be the best default. I don’t know that we have a good handle on how big an index would need to be to never hit a size limit, so that only the age limit ever triggers the rollover. Ideally we’d leave it unset (these are just the defaults too; users can still customise them for their preferred configuration).
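A sketch of that suggested default (rollover on age only, delete immediately after rollover) - untested, and the policy name is just illustrative:

```json
PUT _ilm/policy/synthetics-network-default_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "14d" }
        }
      },
      "delete": {
        "min_age": "0d",
        "actions": { "delete": {} }
      }
    }
  }
}
```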
Post FF Testing
| Policy | Hot Phase - Maximum Age (days) | Hot Phase - Maximum Index Size (GB) | Delete Phase - Max Age (days) |
|---|---|---|---|
| browser-default_policy | 30 | 50 | 365 |
| browser_network-default_policy | 1 | default | 14 |
| browser_screenshot-default_policy | 1 | default | 14 |
| http-default_policy | 30 | 50 | 365 |
| tcp-default_policy | 30 | 50 | 365 |
Following on from the spike uptime#453, this is the implementation issue to get this feature added (understanding that the platform does not currently allow separation based on namespace, but may do in the future).
As a user of Synthetic Monitoring (lightweight and browser)
I want to be able to configure different life cycles for Elasticsearch result data from my monitors
So that I can balance the cost of storage against the value of that data over longer periods
ACs:

- ILM policies are configured for each of the following data sets:
  - synthetics-http
  - synthetics-icmp
  - synthetics-tcp
  - synthetics-browser
  - synthetics-network
  - synthetics-screenshot
- Each data set should have a delete phase with the following specifications:
  - HTTP lightweight checks (synthetics-http): 365 days
  - ICMP lightweight checks (synthetics-icmp): 365 days
  - TCP lightweight checks (synthetics-tcp): 365 days
  - Browser checks - core data (synthetics-browser): 365 days
  - Browser checks - network data (synthetics-network): 14 days
  - Browser checks - screenshots (synthetics-screenshot): 14 days
Reference to the docs task to clarify this in the documentation