Open lahsivjar opened 3 weeks ago
The fix for v8.15.1 has been merged via https://github.com/elastic/elasticsearch/pull/112097, however, the same patch cannot be applied to v8.16.0. The details of this are tracked in https://github.com/elastic/elasticsearch/issues/112137
I don't love https://github.com/elastic/elasticsearch/pull/112097 as a solution, since it kinda implies we'll keep the old ILM config around in the index templates in perpetuity.
Perhaps we can remove that while also addressing serverless, by dynamically injecting the ILM config into the index template if there are existing indices with ILM config?
We need to handle the following scenarios:
With the current changes, custom ILM policies would break even for versions >= 8.15.1, as index templates cannot be overridden. https://github.com/elastic/elasticsearch/pull/112432 provides a fix for this by moving the fallback to a component template, which can be overridden as required. NOTE that this would also require us to update our documentation for configuring a custom ILM policy to include prefer_ilm: true for datastreams created on or after 8.15.x
Create a new 8.15.1+ cluster, with defaults: DLM should be used
Create a new 8.15.1+ cluster, with customised ILM (https://www.elastic.co/guide/en/observability/current/apm-ilm-how-to.html#apm-data-streams-custom-three): customised ILM policy should be used
Upgrade 8.14.x to 8.15.1+ with defaults: ILM should continue to be used for old indices, DLM should be used for new indices
@axw this case, as expected, did not work. The reason is that Datastream Lifecycle needs to be explicitly configured for ALREADY created datastreams. So any old cluster which has used APM and upgraded to 8.15.1+ would continue to use ILM (even for new indices) unless DSL is explicitly configured using the PUT API.
Upgrade 8.14.x to 8.15.1+ with customised ILM (same guide as above): ILM should continue to be used for old indices, and also for new indices
Upgrade 8.14.x to 8.15.0 with custom ILM policy: custom ILM policy should work as expected (no bugs for this case)
Upgrade 8.14.x to 8.15.0 with default ILM policy and then upgrade to 8.15.1
This, as expected, creates unmanaged indices. We would need to suggest a workaround for this in our changelog/release-notes. The simplest way would be to have users with default ILM settings explicitly configure DSL for all datastreams. Since default ILM and DSL are identical, there would be no impact to the users. In future, if the user wants to move to ILM, they can do so by creating a custom component template. WDYT @axw ?
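To make the prefer_ilm note above concrete, here is a minimal sketch of what a @custom component template body opting into ILM might look like. It only builds and prints the request; the policy name my-custom-traces-policy is a hypothetical placeholder, and the choice of traces-apm@custom as the target is an assumption for illustration. The settings used are the standard index.lifecycle.name and index.lifecycle.prefer_ilm index settings.

```python
import json


def custom_ilm_component_template(policy_name: str) -> dict:
    """Build a component template body that selects an ILM policy and
    asks Elasticsearch to prefer ILM over the data stream lifecycle."""
    return {
        "template": {
            "settings": {
                "index": {
                    "lifecycle": {
                        # Hypothetical policy name supplied by the caller.
                        "name": policy_name,
                        # Without this, DSL may take precedence when both
                        # an ILM policy and a lifecycle are configured.
                        "prefer_ilm": True,
                    }
                }
            }
        }
    }


body = custom_ilm_component_template("my-custom-traces-policy")

# Printed in console form; the target template name is an assumption.
print("PUT _component_template/traces-apm@custom")
print(json.dumps(body, indent=2))
```

The request is printed rather than sent so the sketch stays independent of any client library or cluster.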
Reading through the test cases, can you clarify if this is true: customers on 8.14.3 with custom ILM policies, who upgrade to 8.15.1 will not have to do any manual interactions for their custom ILM policies to continue to be applied?
Please also provide test scenarios where users already have upgraded to 8.15.0 and then upgrade to 8.15.1.
Reading through the test cases, can you clarify if this is true: customers on 8.14.3 with custom ILM policies, who upgrade to 8.15.1 will not have to do any manual interactions for their custom ILM policies to continue to be applied?
True, no changes need to be done for this case.
Please also provide test scenarios where users already have upgraded to 8.15.0 and then upgrade to 8.15.1.
For customers having custom ILM, all would be good. Even for version 8.15.0, they would be in the clear i.e. all their indices would be managed by the configured custom ILM policy (already tested above).
However, if a customer with the default ILM policy has moved to 8.15.0 and then upgraded to 8.15.1, the indices created in 8.15.0 would remain unmanaged even after the upgrade to 8.15.1. Indices created after the upgrade to 8.15.1 would be fine, though. I was initially thinking of suggesting configuring DSL, since configuring DSL on a datastream moves all of its unmanaged indices to be managed by DSL; however, this would have the side effect of moving from ILM to DSL. I don't think this should be a big deal, as the solution would only be required for installations using the default ILM policies, but it would be good if others could validate this (CC: @silvia, @axw)
([DONE] PS: I will update the test case comment with the details on 8.15.0 as a step in the upgrade path)
Test with 8.14.3 -> 8.15.0 -> 8.15.1
GET /_data_stream/traces-apm-default
{
"data_streams": [
{
"name": "traces-apm-default",
"timestamp_field": {
"name": "@timestamp"
},
"indices": [
{
"index_name": ".ds-traces-apm-default-2024.09.02-000001",
"index_uuid": "DxvSZVQhQlO-caBMRWQEzQ",
"prefer_ilm": true,
"ilm_policy": "traces-apm.traces-default_policy",
"managed_by": "Index Lifecycle Management"
}
],
"generation": 1,
"failure_indices": [],
"_meta": {
"package": {
"name": "apm"
},
"managed": true,
"managed_by": "fleet"
},
"status": "GREEN",
"template": "traces-apm",
"ilm_policy": "traces-apm.traces-default_policy",
"next_generation_managed_by": "Index Lifecycle Management",
"prefer_ilm": true,
"hidden": false,
"system": false,
"allow_custom_routing": false,
"replicated": false,
"rollover_on_write": false,
"failure_store": false
}
]
}
POST /traces-apm-default/_rollover/
GET /_data_stream/traces-apm-default
{
"data_streams": [
{
"name": "traces-apm-default",
"timestamp_field": {
"name": "@timestamp"
},
"indices": [
{
"index_name": ".ds-traces-apm-default-2024.09.02-000001",
"index_uuid": "DxvSZVQhQlO-caBMRWQEzQ",
"prefer_ilm": true,
"ilm_policy": "traces-apm.traces-default_policy",
"managed_by": "Index Lifecycle Management"
},
{
"index_name": ".ds-traces-apm-default-2024.09.02-000002",
"index_uuid": "oOlQOsG6RJ6rXbybz-sPtg",
"prefer_ilm": true,
"ilm_policy": "traces-apm.traces-default_policy",
"managed_by": "Index Lifecycle Management"
}
],
"generation": 2,
"failure_indices": [],
"_meta": {
"package": {
"name": "apm"
},
"managed_by": "fleet",
"managed": true
},
"status": "GREEN",
"template": "traces-apm",
"ilm_policy": "traces-apm.traces-default_policy",
"next_generation_managed_by": "Index Lifecycle Management",
"prefer_ilm": true,
"hidden": false,
"system": false,
"allow_custom_routing": false,
"replicated": false,
"rollover_on_write": false,
"failure_store": false
}
]
}
GET /_data_stream/traces-apm-default
{
"data_streams": [
{
"name": "traces-apm-default",
"timestamp_field": {
"name": "@timestamp"
},
"indices": [
{
"index_name": ".ds-traces-apm-default-2024.09.02-000001",
"index_uuid": "DxvSZVQhQlO-caBMRWQEzQ",
"prefer_ilm": true,
"ilm_policy": "traces-apm.traces-default_policy",
"managed_by": "Index Lifecycle Management"
},
{
"index_name": ".ds-traces-apm-default-2024.09.02-000002",
"index_uuid": "oOlQOsG6RJ6rXbybz-sPtg",
"prefer_ilm": true,
"ilm_policy": "traces-apm.traces-default_policy",
"managed_by": "Index Lifecycle Management"
}
],
"generation": 2,
"_meta": {
"package": {
"name": "apm"
},
"managed_by": "fleet",
"managed": true
},
"status": "GREEN",
"template": "traces-apm@template",
"next_generation_managed_by": "Unmanaged",
"prefer_ilm": true,
"hidden": false,
"system": false,
"allow_custom_routing": false,
"replicated": false,
"rollover_on_write": false,
"failure_store": {
"enabled": false,
"rollover_on_write": false,
"indices": []
}
}
]
}
POST /traces-apm-default/_rollover/
GET /_data_stream/traces-apm-default
{
"data_streams": [
{
"name": "traces-apm-default",
"timestamp_field": {
"name": "@timestamp"
},
"indices": [
{
"index_name": ".ds-traces-apm-default-2024.09.02-000001",
"index_uuid": "DxvSZVQhQlO-caBMRWQEzQ",
"prefer_ilm": true,
"ilm_policy": "traces-apm.traces-default_policy",
"managed_by": "Index Lifecycle Management"
},
{
"index_name": ".ds-traces-apm-default-2024.09.02-000002",
"index_uuid": "oOlQOsG6RJ6rXbybz-sPtg",
"prefer_ilm": true,
"ilm_policy": "traces-apm.traces-default_policy",
"managed_by": "Index Lifecycle Management"
},
{
"index_name": ".ds-traces-apm-default-2024.09.02-000003",
"index_uuid": "1duuNfCwT6aS_55IEfzMAQ",
"prefer_ilm": true,
"managed_by": "Unmanaged"
}
],
"generation": 3,
"_meta": {
"package": {
"name": "apm"
},
"managed_by": "fleet",
"managed": true
},
"status": "GREEN",
"template": "traces-apm@template",
"next_generation_managed_by": "Unmanaged",
"prefer_ilm": true,
"hidden": false,
"system": false,
"allow_custom_routing": false,
"replicated": false,
"rollover_on_write": false,
"failure_store": {
"enabled": false,
"rollover_on_write": false,
"indices": []
}
}
]
}
GET /_data_stream/traces-apm-default
{
"data_streams": [
{
"name": "traces-apm-default",
"timestamp_field": {
"name": "@timestamp"
},
"indices": [
{
"index_name": ".ds-traces-apm-default-2024.09.02-000001",
"index_uuid": "DxvSZVQhQlO-caBMRWQEzQ",
"prefer_ilm": true,
"ilm_policy": "traces-apm.traces-default_policy",
"managed_by": "Index Lifecycle Management"
},
{
"index_name": ".ds-traces-apm-default-2024.09.02-000002",
"index_uuid": "oOlQOsG6RJ6rXbybz-sPtg",
"prefer_ilm": true,
"ilm_policy": "traces-apm.traces-default_policy",
"managed_by": "Index Lifecycle Management"
},
{
"index_name": ".ds-traces-apm-default-2024.09.02-000003",
"index_uuid": "1duuNfCwT6aS_55IEfzMAQ",
"prefer_ilm": true,
"managed_by": "Unmanaged"
},
{
"index_name": ".ds-traces-apm-default-2024.09.02-000004",
"index_uuid": "wVXW11CzTAavehkx1FksLw",
"prefer_ilm": false,
"ilm_policy": "traces-apm.traces-default_policy",
"managed_by": "Index Lifecycle Management"
}
],
"generation": 4,
"_meta": {
"package": {
"name": "apm"
},
"managed_by": "fleet",
"managed": true
},
"status": "GREEN",
"template": "traces-apm@template",
"ilm_policy": "traces-apm.traces-default_policy",
"next_generation_managed_by": "Index Lifecycle Management",
"prefer_ilm": false,
"hidden": false,
"system": false,
"allow_custom_routing": false,
"replicated": false,
"rollover_on_write": false,
"failure_store": {
"enabled": false,
"rollover_on_write": false,
"indices": []
}
}
]
}
POST /traces-apm-default/_rollover/
GET /_data_stream/traces-apm-default
{
"data_streams": [
{
"name": "traces-apm-default",
"timestamp_field": {
"name": "@timestamp"
},
"indices": [
{
"index_name": ".ds-traces-apm-default-2024.09.02-000001",
"index_uuid": "DxvSZVQhQlO-caBMRWQEzQ",
"prefer_ilm": true,
"ilm_policy": "traces-apm.traces-default_policy",
"managed_by": "Index Lifecycle Management"
},
{
"index_name": ".ds-traces-apm-default-2024.09.02-000002",
"index_uuid": "oOlQOsG6RJ6rXbybz-sPtg",
"prefer_ilm": true,
"ilm_policy": "traces-apm.traces-default_policy",
"managed_by": "Index Lifecycle Management"
},
{
"index_name": ".ds-traces-apm-default-2024.09.02-000003",
"index_uuid": "1duuNfCwT6aS_55IEfzMAQ",
"prefer_ilm": true,
"managed_by": "Unmanaged"
},
{
"index_name": ".ds-traces-apm-default-2024.09.02-000004",
"index_uuid": "wVXW11CzTAavehkx1FksLw",
"prefer_ilm": false,
"ilm_policy": "traces-apm.traces-default_policy",
"managed_by": "Index Lifecycle Management"
},
{
"index_name": ".ds-traces-apm-default-2024.09.02-000005",
"index_uuid": "UJ8VufpYTgyKa-tXaB4BPA",
"prefer_ilm": false,
"ilm_policy": "traces-apm.traces-default_policy",
"managed_by": "Index Lifecycle Management"
}
],
"generation": 5,
"_meta": {
"package": {
"name": "apm"
},
"managed_by": "fleet",
"managed": true
},
"status": "GREEN",
"template": "traces-apm@template",
"ilm_policy": "traces-apm.traces-default_policy",
"next_generation_managed_by": "Index Lifecycle Management",
"prefer_ilm": false,
"hidden": false,
"system": false,
"allow_custom_routing": false,
"replicated": false,
"rollover_on_write": false,
"failure_store": {
"enabled": false,
"rollover_on_write": false,
"indices": []
}
}
]
}
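In the responses above, the third backing index is reported as Unmanaged, while the ones created after it are managed by ILM again. A minimal sketch for spotting such indices programmatically, assuming the response of GET /_data_stream/<name> has already been parsed into a dict; the helper name unmanaged_backing_indices is hypothetical and the sample is trimmed from the last response above:

```python
def unmanaged_backing_indices(response: dict) -> dict:
    """Map each data stream name to its backing indices whose
    managed_by field is reported as "Unmanaged"."""
    result = {}
    for stream in response.get("data_streams", []):
        names = [
            index["index_name"]
            for index in stream.get("indices", [])
            if index.get("managed_by") == "Unmanaged"
        ]
        if names:
            result[stream["name"]] = names
    return result


# Trimmed copy of the final GET /_data_stream/traces-apm-default response.
sample = {
    "data_streams": [
        {
            "name": "traces-apm-default",
            "indices": [
                {
                    "index_name": ".ds-traces-apm-default-2024.09.02-000002",
                    "managed_by": "Index Lifecycle Management",
                },
                {
                    "index_name": ".ds-traces-apm-default-2024.09.02-000003",
                    "managed_by": "Unmanaged",
                },
                {
                    "index_name": ".ds-traces-apm-default-2024.09.02-000004",
                    "managed_by": "Index Lifecycle Management",
                },
            ],
        }
    ]
}

unmanaged = unmanaged_backing_indices(sample)
print(unmanaged)
```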
This, as expected, creates unmanaged indices. We would need to suggest a workaround for this in our changelog/release-notes. The simplest way would be to have users with default ILM settings explicitly configure DSL for all datastreams. Since default ILM and DSL are identical, there would be no impact to the users. In future, if the user wants to move to ILM, they can do so by creating a custom component template. WDYT @axw ?
It's the simplest code change, but I'm not convinced that anyone will go ahead and actively switch their data streams to DSL; I expect we'll end up with users indefinitely sticking with the legacy ILM policies and bifurcating the Serverless & Hosted experience. IMO we should take one of two routes:
prefer_ilm: true in their @custom component templates.
I prefer (2) since users will need to start setting prefer_ilm: true in their @custom component templates if they want to use ILM. The only other alternative I can see is to revert the use of DSL, but I feel like the cat's already out of the bag with 8.15.0 being released.
Upgrade 8.14.x to 8.15.0 with default ILM policy and then upgrade to 8.15.1
This, as expected, creates unmanaged indices. We would need to suggest a workaround for this in our changelog/release-notes. The simplest way would be to have users with default ILM settings explicitly configure DSL for all datastreams. Since default ILM and DSL are identical, there would be no impact to the users. In future, if the user wants to move to ILM, they can do so by creating a custom component template. WDYT @axw ?
@lahsivjar it only creates unmanaged indices in 8.15.0, right? With the fix and changes in 8.15.1, my understanding is that as soon as users upgrade to this version, new, managed backing indices will be created. The 8.15.0 ones would still be unmanaged, but any new ones would not. Can you confirm this?
@lahsivjar it only creates unmanaged indices in 8.15.0, right? With the fix and changes in 8.15.1, my understanding is that as soon as users upgrade to this version, new, managed backing indices will be created. The 8.15.0 ones would still be unmanaged, but any new ones would not. Can you confirm this?
Yes, this is correct. Only indices created with version 8.15.0 will remain unmanaged.
Create a new 8.15.1+ cluster, with defaults: DLM should be used
Create a new 8.15.1+ cluster, with customised ILM (https://www.elastic.co/guide/en/observability/current/apm-ilm-how-to.html#apm-data-streams-custom-three): customised ILM policy should be used
Upgrade 8.14.x to 8.15.1+ with defaults: ILM should continue to be used for old indices, DLM should be used for new indices
Upgrade 8.14.x to 8.15.1+ with customised ILM (same guide as above): ILM should continue to be used for old indices, and also for new indices
Upgrade 8.14.x to 8.15.0 with custom ILM policy: custom ILM policy should work as expected (no bugs for this case)
Upgrade 8.14.x to 8.15.0 with default ILM policy and then upgrade to 8.15.1
APM Server version (apm-server version): 8.15.0
Description of the problem including expected versus actual behavior: In 8.15.0 we migrated from ILM to DLM. New indices created for clusters which migrate to 8.15.0 don't have any lifecycle attached as existing datastream needs to be updated explicitly: https://www.elastic.co/guide/en/elasticsearch/reference/current/tutorial-manage-existing-data-stream.html.
Steps to reproduce:
8.15.0
8.15.0
Unmanaged (for example: by using GET /_data_stream/traces-apm-default for traces datastream)
Unmanaged (for example: by using GET /_data_stream/traces-apm-default for traces datastream)
Temporary mitigation is to explicitly set the lifecycle using the PUT API. For example, the below operations would set data retention based on APM defaults for all APM datastreams:
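The operations themselves are not reproduced here. As a hedged sketch only, the snippet below builds requests for the data stream lifecycle PUT API (PUT _data_stream/<name>/_lifecycle with a data_retention value) without sending them. The retention value in the mapping is a placeholder assumption, not a confirmed APM default, and the mapping covers only the traces datastream named in this issue; the real list of APM datastreams and their retentions would need to be filled in.

```python
import json

# Placeholder mapping of data stream name -> retention.
# The value is an assumption for illustration, not taken from this issue.
RETENTION_BY_DATA_STREAM = {
    "traces-apm-default": "10d",
}


def lifecycle_requests(retention_by_stream: dict) -> list:
    """Return (method, path, body) tuples that would configure the
    data stream lifecycle for each listed data stream."""
    return [
        ("PUT", f"/_data_stream/{name}/_lifecycle", {"data_retention": retention})
        for name, retention in sorted(retention_by_stream.items())
    ]


requests_to_send = lifecycle_requests(RETENTION_BY_DATA_STREAM)

# Print in console form so the output can be pasted into Dev Tools.
for method, path, body in requests_to_send:
    print(method, path)
    print(json.dumps(body))
```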
Provide logs (if relevant): N/A