elastic / apm-server

https://www.elastic.co/guide/en/apm/guide/current/index.html

New indexes created for datastreams after update to `8.15.0` are without lifecycle policies #13898

Open lahsivjar opened 3 weeks ago

lahsivjar commented 3 weeks ago

APM Server version (apm-server version): 8.15.0

Description of the problem including expected versus actual behavior: In 8.15.0 we migrated from ILM to DLM. On clusters upgraded to 8.15.0, newly created indices have no lifecycle attached, because existing data streams need to be updated explicitly: https://www.elastic.co/guide/en/elasticsearch/reference/current/tutorial-manage-existing-data-stream.html.

Steps to reproduce:


  1. Create a cluster with version < 8.15.0
  2. Send data to the cluster (for example: traces) (and continue sending it throughout the steps)
  3. Perform a rollover on the datastream getting data
  4. Upgrade the cluster to 8.15.0
  5. Observe that the latest indices are Unmanaged (for example: by using GET /_data_stream/traces-apm-default for traces datastream)
  6. Perform a rollover on the datastream getting data
  7. Observe that the newly created index is also Unmanaged (for example: by using GET /_data_stream/traces-apm-default for traces datastream)
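The check in steps 5 and 7 can be scripted. The following is a minimal sketch, assuming the `GET /_data_stream/<name>` response has already been fetched and parsed into a Python dict; the helper name `unmanaged_indices` is hypothetical, and the sample is a trimmed version of a response of the shape posted later in this thread.

```python
from typing import Any


def unmanaged_indices(response: dict[str, Any]) -> dict[str, list[str]]:
    """Map each data stream name to its backing indices reported as Unmanaged."""
    result: dict[str, list[str]] = {}
    for stream in response.get("data_streams", []):
        names = [
            index["index_name"]
            for index in stream.get("indices", [])
            if index.get("managed_by") == "Unmanaged"
        ]
        if names:
            result[stream["name"]] = names
    return result


# Trimmed sample: only the fields the helper reads are kept.
sample = {
    "data_streams": [
        {
            "name": "traces-apm-default",
            "indices": [
                {
                    "index_name": ".ds-traces-apm-default-2024.09.02-000002",
                    "managed_by": "Index Lifecycle Management",
                },
                {
                    "index_name": ".ds-traces-apm-default-2024.09.02-000003",
                    "managed_by": "Unmanaged",
                },
            ],
        }
    ]
}

print(unmanaged_indices(sample))
```

An empty result means every backing index in the response is managed by either ILM or the data stream lifecycle.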

A temporary mitigation is to explicitly set the lifecycle using the PUT API. For example, the operations below set data retention based on the APM defaults for all APM data streams:

PUT _data_stream/traces-apm-*/_lifecycle
{
  "data_retention": "10d" 
}

PUT _data_stream/traces-apm.rum*/_lifecycle
{
  "data_retention": "90d" 
}

PUT _data_stream/traces-apm.sampled*/_lifecycle
{
  "data_retention": "1h" 
}

PUT _data_stream/metrics-apm.*.1m-*/_lifecycle
{
  "data_retention": "90d" 
}

PUT _data_stream/metrics-apm.*.10m-*/_lifecycle
{
  "data_retention": "180d" 
}

PUT _data_stream/metrics-apm.*.60m-*/_lifecycle
{
  "data_retention": "390d" 
}

PUT _data_stream/metrics-apm.internal-*/_lifecycle
{
  "data_retention": "90d" 
}

PUT _data_stream/metrics-apm.app.*/_lifecycle
{
  "data_retention": "90d" 
}

PUT _data_stream/logs-apm.*/_lifecycle
{
  "data_retention": "10d" 
}
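For clusters with many data streams it may be easier to generate these requests than to paste them one by one. The sketch below only builds the method, path, and body for each request listed above; it does not send anything, and the names `APM_RETENTION` and `lifecycle_requests` are hypothetical. The patterns and retention values are copied from the requests above.

```python
import json

# Data stream patterns and retention periods, copied from the requests above.
APM_RETENTION = {
    "traces-apm-*": "10d",
    "traces-apm.rum*": "90d",
    "traces-apm.sampled*": "1h",
    "metrics-apm.*.1m-*": "90d",
    "metrics-apm.*.10m-*": "180d",
    "metrics-apm.*.60m-*": "390d",
    "metrics-apm.internal-*": "90d",
    "metrics-apm.app.*": "90d",
    "logs-apm.*": "10d",
}


def lifecycle_requests(retention=APM_RETENTION):
    """Return one (method, path, body) tuple per data stream pattern."""
    return [
        ("PUT", f"/_data_stream/{pattern}/_lifecycle", json.dumps({"data_retention": period}))
        for pattern, period in retention.items()
    ]


for method, path, body in lifecycle_requests():
    print(method, path, body)
```

The printed lines can be passed to whichever HTTP client or console is already in use for the cluster.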

Provide logs (if relevant): N/A

lahsivjar commented 3 weeks ago

The fix for v8.15.1 has been merged via https://github.com/elastic/elasticsearch/pull/112097, however, the same patch cannot be applied to v8.16.0. The details of this are tracked in https://github.com/elastic/elasticsearch/issues/112137

axw commented 2 weeks ago

I don't love https://github.com/elastic/elasticsearch/pull/112097 as a solution, since it kinda implies we'll keep the old ILM config around in the index templates for perpetuity.

Perhaps we can remove that while also addressing serverless, by dynamically injecting the ILM config into the index template if there are existing indices with ILM config?

axw commented 1 week ago

We need to handle the following scenarios:

lahsivjar commented 1 week ago

With the current changes, custom ILM policies would break even for versions >= 8.15.1, as index templates cannot be overridden. https://github.com/elastic/elasticsearch/pull/112432 fixes this by moving the fallback to a component template, which can be overridden as required. NOTE: this also requires us to update our documentation for configuring a custom ILM policy to include prefer_ilm: true for data streams created on or after 8.15.x.
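As a rough illustration of what such a `@custom` override might look like, the sketch below builds a component template body that names an ILM policy and sets `prefer_ilm` to true. The body layout is an assumption based on the standard `index.lifecycle.name` and `index.lifecycle.prefer_ilm` index settings, not something confirmed in this thread, and the policy name `my-custom-traces-policy` and the helper name are placeholders.

```python
import json


def custom_ilm_component_template(policy_name: str) -> dict:
    """Build a component template body that opts a data stream back into ILM.

    Assumption: the override is expressed through the index.lifecycle.name
    and index.lifecycle.prefer_ilm settings.
    """
    return {
        "template": {
            "settings": {
                "index": {
                    "lifecycle": {
                        "name": policy_name,
                        "prefer_ilm": True,
                    }
                }
            }
        }
    }


# Hypothetical policy name, for illustration only.
body = custom_ilm_component_template("my-custom-traces-policy")
print("PUT /_component_template/traces-apm@custom")
print(json.dumps(body, indent=2))
```

A change to the component template would only affect backing indices created afterwards, which matches the test results below, where a rollover was needed before the new policy showed up.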

Testing https://github.com/elastic/elasticsearch/pull/112432 locally

Create a new 8.15.1+ cluster, with defaults: DLM should be used

✅ Tested (new cluster created locally)

```json
{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000001",
          "index_uuid": "4rLT17ZQQIC6mFDlPS1SNA",
          "prefer_ilm": false,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Data stream lifecycle"
        }
      ],
      "generation": 1,
      "_meta": {
        "description": "Index template for traces-apm-*",
        "managed": true
      },
      "status": "YELLOW",
      "template": "traces-apm@template",
      "lifecycle": {
        "enabled": true,
        "data_retention": "10d"
      },
      "ilm_policy": "traces-apm.traces-default_policy",
      "next_generation_managed_by": "Data stream lifecycle",
      "prefer_ilm": false,
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false,
      "rollover_on_write": false,
      "failure_store": {
        "enabled": false,
        "rollover_on_write": true,
        "indices": []
      }
    }
  ]
}
```

Create a new 8.15.1+ cluster, with customised ILM (https://www.elastic.co/guide/en/observability/current/apm-ilm-how-to.html#apm-data-streams-custom-three): customised ILM policy should be used

✅ Tested by creating a traces-apm@custom component template with a custom ILM policy but prefer_ilm set to false. Since prefer_ilm is `false`, DSL took priority.

```json
{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000001",
          "index_uuid": "4rLT17ZQQIC6mFDlPS1SNA",
          "prefer_ilm": false,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Data stream lifecycle"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000002",
          "index_uuid": "FeVicze2S7ex5o7R7yeLaQ",
          "prefer_ilm": false,
          "ilm_policy": "custom-ilm-override-dsl",
          "managed_by": "Data stream lifecycle"
        }
      ],
      "generation": 2,
      "_meta": {
        "description": "Index template for traces-apm-*",
        "managed": true
      },
      "status": "YELLOW",
      "template": "traces-apm@template",
      "lifecycle": {
        "enabled": true,
        "data_retention": "10d"
      },
      "ilm_policy": "custom-ilm-override-dsl",
      "next_generation_managed_by": "Data stream lifecycle",
      "prefer_ilm": false,
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false,
      "rollover_on_write": false,
      "failure_store": {
        "enabled": false,
        "rollover_on_write": true,
        "indices": []
      }
    }
  ]
}
```

✅ Tested by updating the traces-apm@custom component template to set prefer_ilm to true

```json
{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000001",
          "index_uuid": "4rLT17ZQQIC6mFDlPS1SNA",
          "prefer_ilm": false,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Data stream lifecycle"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000002",
          "index_uuid": "FeVicze2S7ex5o7R7yeLaQ",
          "prefer_ilm": false,
          "ilm_policy": "custom-ilm-override-dsl",
          "managed_by": "Data stream lifecycle"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000003",
          "index_uuid": "7ufMCy5RTBaxTBJ78dxC-A",
          "prefer_ilm": true,
          "ilm_policy": "custom-ilm-override-dsl",
          "managed_by": "Index Lifecycle Management"
        }
      ],
      "generation": 3,
      "_meta": {
        "description": "Index template for traces-apm-*",
        "managed": true
      },
      "status": "YELLOW",
      "template": "traces-apm@template",
      "lifecycle": {
        "enabled": true,
        "data_retention": "10d"
      },
      "ilm_policy": "custom-ilm-override-dsl",
      "next_generation_managed_by": "Index Lifecycle Management",
      "prefer_ilm": true,
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false,
      "rollover_on_write": false,
      "failure_store": {
        "enabled": false,
        "rollover_on_write": true,
        "indices": []
      }
    }
  ]
}
```

Upgrade 8.14.x to 8.15.1+ with defaults: ILM should continue to be used for old indices, DLM should be used for new indices

@axw this case, as expected, did not work. The reason is that data stream lifecycle needs to be explicitly configured for ALREADY created data streams. So any old cluster that has used APM and is upgraded to 8.15.1+ would continue to use ILM (even for new indices) unless DSL is explicitly configured using the PUT API.

❌ Tested by updating from 8.14.x to 8.15.1

```json
{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000001",
          "index_uuid": "farCK-0lRI-H64IxGfRUuQ",
          "prefer_ilm": true,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000002",
          "index_uuid": "C_qI1J2FQ5O-81VuHMl82A",
          "prefer_ilm": true,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000003",
          "index_uuid": "HVQRbVrtSEm6UmeyPWj8bA",
          "prefer_ilm": false,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        }
      ],
      "generation": 3,
      "_meta": {
        "package": {
          "name": "apm"
        },
        "managed_by": "fleet",
        "managed": true
      },
      "status": "YELLOW",
      "template": "traces-apm@template",
      "ilm_policy": "traces-apm.traces-default_policy",
      "next_generation_managed_by": "Index Lifecycle Management",
      "prefer_ilm": false,
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false,
      "rollover_on_write": false,
      "failure_store": {
        "enabled": false,
        "rollover_on_write": false,
        "indices": []
      }
    }
  ]
}
```
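The `managed_by` values in the responses posted in this thread are consistent with a simple precedence rule, sketched below. This is inferred from the observed outputs only and is not a copy of Elasticsearch's implementation; the function name and parameters are hypothetical. An index with an ILM policy goes to ILM when `prefer_ilm` is true or when the data stream has no lifecycle configured; otherwise an enabled data stream lifecycle takes over; with neither, the index is unmanaged.

```python
from typing import Optional


def managed_by(prefer_ilm: bool, ilm_policy: Optional[str], lifecycle_enabled: bool) -> str:
    """Predict the managed_by value for a backing index.

    Inferred from the responses in this thread, not from Elasticsearch source.
    lifecycle_enabled is True when the data stream has an enabled lifecycle.
    """
    has_policy = ilm_policy is not None
    if has_policy and (prefer_ilm or not lifecycle_enabled):
        return "Index Lifecycle Management"
    if lifecycle_enabled:
        return "Data stream lifecycle"
    return "Unmanaged"


# New 8.15.1+ cluster with defaults: policy present, prefer_ilm false, lifecycle set.
print(managed_by(False, "traces-apm.traces-default_policy", True))
# Upgraded cluster with defaults: policy present, prefer_ilm false, no lifecycle.
print(managed_by(False, "traces-apm.traces-default_policy", False))
# Index created on 8.15.0 after upgrade: no policy and no lifecycle.
print(managed_by(True, None, False))
```

Under this reading, the upgraded cluster stays on ILM because the existing data stream has no lifecycle attached, so the `prefer_ilm: false` default has nothing to fall back to.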

Upgrade 8.14.x to 8.15.1+ with customised ILM (same guide as above): ILM should continue to be used for old indices, and also for new indices

✅ Tested (cluster created locally with 8.14.3 and upgraded to 8.15.1)

```json
{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000001",
          "index_uuid": "I5k3QG4QRGGZLdUnNAWpJQ",
          "prefer_ilm": true,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000002",
          "index_uuid": "IoauvZ7pQqC-xsRkDTLBkA",
          "prefer_ilm": true,
          "ilm_policy": "custom-ilmdlm-test",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000003",
          "index_uuid": "DDJu8FbnTUCapW6Xr13-Kw",
          "prefer_ilm": false,
          "ilm_policy": "custom-ilmdlm-test",
          "managed_by": "Index Lifecycle Management"
        }
      ],
      "generation": 3,
      "_meta": {
        "package": {
          "name": "apm"
        },
        "managed_by": "fleet",
        "managed": true
      },
      "status": "YELLOW",
      "template": "traces-apm@template",
      "ilm_policy": "custom-ilmdlm-test",
      "next_generation_managed_by": "Index Lifecycle Management",
      "prefer_ilm": false,
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false,
      "rollover_on_write": false,
      "failure_store": {
        "enabled": false,
        "rollover_on_write": false,
        "indices": []
      }
    }
  ]
}
```

Upgrade 8.14.x to 8.15.0 with custom ILM policy: custom ILM policy should work as expected (no bugs for this case)

✅ Tested by upgrading from 8.14.3 with custom ILM to 8.15.0

```json
{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000001",
          "index_uuid": "dFLoaPf3QZC4t7ttplDSqg",
          "prefer_ilm": true,
          "ilm_policy": "custom-ilmdlm-test",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000002",
          "index_uuid": "fJ__qQv7Tumu9n_j8hEljA",
          "prefer_ilm": true,
          "ilm_policy": "custom-ilmdlm-test",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000003",
          "index_uuid": "kNSWO6pMQ06eZo7yHqKQCw",
          "prefer_ilm": true,
          "ilm_policy": "custom-ilmdlm-test",
          "managed_by": "Index Lifecycle Management"
        }
      ],
      "generation": 3,
      "_meta": {
        "package": {
          "name": "apm"
        },
        "managed_by": "fleet",
        "managed": true
      },
      "status": "YELLOW",
      "template": "traces-apm@template",
      "ilm_policy": "custom-ilmdlm-test",
      "next_generation_managed_by": "Index Lifecycle Management",
      "prefer_ilm": true,
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false,
      "rollover_on_write": false
    }
  ]
}
```

Upgrade 8.14.x to 8.15.0 with default ILM policy and then upgrade to 8.15.1

This, as expected, creates unmanaged indices. We would need to suggest a workaround for this in our changelog/release notes. The simplest way would be to have users with default ILM settings explicitly configure DSL for all data streams. Since the default ILM and DSL settings are identical, there would be no impact on users. In the future, if a user wants to move to ILM, they can do so by creating a custom component template. WDYT @axw ?

⚠️ Tested: leads to unmanaged indices for indices that were created in 8.15.0

```json
{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000001",
          "index_uuid": "4MAl7phPSkexDJFda-38RQ",
          "prefer_ilm": true,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000002",
          "index_uuid": "hv1XnUBeQmim7VkOgvHAiw",
          "prefer_ilm": true,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000003",
          "index_uuid": "jb2UpzXgQ_CmcVPTW2ICwA",
          "prefer_ilm": true,
          "managed_by": "Unmanaged"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000004",
          "index_uuid": "4e9MCYGWSqKR8BZ2JYKZeA",
          "prefer_ilm": true,
          "managed_by": "Unmanaged"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000005",
          "index_uuid": "OlSp2sXGRA6ozl86UIe3Lw",
          "prefer_ilm": false,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        }
      ],
      "generation": 5,
      "_meta": {
        "package": {
          "name": "apm"
        },
        "managed_by": "fleet",
        "managed": true
      },
      "status": "YELLOW",
      "template": "traces-apm@template",
      "ilm_policy": "traces-apm.traces-default_policy",
      "next_generation_managed_by": "Index Lifecycle Management",
      "prefer_ilm": false,
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false,
      "rollover_on_write": false,
      "failure_store": {
        "enabled": false,
        "rollover_on_write": false,
        "indices": []
      }
    }
  ]
}
```

simitt commented 1 week ago

Reading through the test cases, can you clarify if this is true: customers on 8.14.3 with custom ILM policies, who upgrade to 8.15.1 will not have to do any manual interactions for their custom ILM policies to continue to be applied?

Please also provide test scenarios where users already have upgraded to 8.15.0 and then upgrade to 8.15.1.

lahsivjar commented 1 week ago

> Reading through the test cases, can you clarify if this is true: customers on 8.14.3 with custom ILM policies, who upgrade to 8.15.1 will not have to do any manual interactions for their custom ILM policies to continue to be applied?

True, no changes need to be done for this case.

> Please also provide test scenarios where users already have upgraded to 8.15.0 and then upgrade to 8.15.1.

For customers having custom ILM, all would be good. Even for version 8.15.0, they would be in the clear i.e. all their indices would be managed by the configured custom ILM policy (already tested above).

However, if a customer with the default ILM policy has moved to 8.15.0 and then upgraded to 8.15.1, the indices created in 8.15.0 would remain unmanaged even after the upgrade to 8.15.1. Indices created after the upgrade to 8.15.1 would be fine, though. I was initially thinking of suggesting that users configure DSL, since configuring DSL on a data stream moves all of its unmanaged indices under DSL management; however, this would have the side effect of moving from ILM to DSL. I don't think this should be a big deal, as the workaround would only be required for installations using the default ILM policies, but it would be good if others could validate this (CC:@silvia, @axw)

([DONE] PS: I will update the test case comment with the details on 8.15.0 as a step in the upgrade path)

kruskall commented 1 week ago

Test with 8.14.3 -> 8.15.0 -> 8.15.1

{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000001",
          "index_uuid": "DxvSZVQhQlO-caBMRWQEzQ",
          "prefer_ilm": true,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        }
      ],
      "generation": 1,
      "failure_indices": [],
      "_meta": {
        "package": {
          "name": "apm"
        },
        "managed": true,
        "managed_by": "fleet"
      },
      "status": "GREEN",
      "template": "traces-apm",
      "ilm_policy": "traces-apm.traces-default_policy",
      "next_generation_managed_by": "Index Lifecycle Management",
      "prefer_ilm": true,
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false,
      "rollover_on_write": false,
      "failure_store": false
    }
  ]
}

{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000001",
          "index_uuid": "DxvSZVQhQlO-caBMRWQEzQ",
          "prefer_ilm": true,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000002",
          "index_uuid": "oOlQOsG6RJ6rXbybz-sPtg",
          "prefer_ilm": true,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        }
      ],
      "generation": 2,
      "failure_indices": [],
      "_meta": {
        "package": {
          "name": "apm"
        },
        "managed_by": "fleet",
        "managed": true
      },
      "status": "GREEN",
      "template": "traces-apm",
      "ilm_policy": "traces-apm.traces-default_policy",
      "next_generation_managed_by": "Index Lifecycle Management",
      "prefer_ilm": true,
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false,
      "rollover_on_write": false,
      "failure_store": false
    }
  ]
}

{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000001",
          "index_uuid": "DxvSZVQhQlO-caBMRWQEzQ",
          "prefer_ilm": true,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000002",
          "index_uuid": "oOlQOsG6RJ6rXbybz-sPtg",
          "prefer_ilm": true,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        }
      ],
      "generation": 2,
      "_meta": {
        "package": {
          "name": "apm"
        },
        "managed_by": "fleet",
        "managed": true
      },
      "status": "GREEN",
      "template": "traces-apm@template",
      "next_generation_managed_by": "Unmanaged",
      "prefer_ilm": true,
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false,
      "rollover_on_write": false,
      "failure_store": {
        "enabled": false,
        "rollover_on_write": false,
        "indices": []
      }
    }
  ]
}

{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000001",
          "index_uuid": "DxvSZVQhQlO-caBMRWQEzQ",
          "prefer_ilm": true,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000002",
          "index_uuid": "oOlQOsG6RJ6rXbybz-sPtg",
          "prefer_ilm": true,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000003",
          "index_uuid": "1duuNfCwT6aS_55IEfzMAQ",
          "prefer_ilm": true,
          "managed_by": "Unmanaged"
        }
      ],
      "generation": 3,
      "_meta": {
        "package": {
          "name": "apm"
        },
        "managed_by": "fleet",
        "managed": true
      },
      "status": "GREEN",
      "template": "traces-apm@template",
      "next_generation_managed_by": "Unmanaged",
      "prefer_ilm": true,
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false,
      "rollover_on_write": false,
      "failure_store": {
        "enabled": false,
        "rollover_on_write": false,
        "indices": []
      }
    }
  ]
}

{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000001",
          "index_uuid": "DxvSZVQhQlO-caBMRWQEzQ",
          "prefer_ilm": true,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000002",
          "index_uuid": "oOlQOsG6RJ6rXbybz-sPtg",
          "prefer_ilm": true,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000003",
          "index_uuid": "1duuNfCwT6aS_55IEfzMAQ",
          "prefer_ilm": true,
          "managed_by": "Unmanaged"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000004",
          "index_uuid": "wVXW11CzTAavehkx1FksLw",
          "prefer_ilm": false,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        }
      ],
      "generation": 4,
      "_meta": {
        "package": {
          "name": "apm"
        },
        "managed_by": "fleet",
        "managed": true
      },
      "status": "GREEN",
      "template": "traces-apm@template",
      "ilm_policy": "traces-apm.traces-default_policy",
      "next_generation_managed_by": "Index Lifecycle Management",
      "prefer_ilm": false,
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false,
      "rollover_on_write": false,
      "failure_store": {
        "enabled": false,
        "rollover_on_write": false,
        "indices": []
      }
    }
  ]
}

{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000001",
          "index_uuid": "DxvSZVQhQlO-caBMRWQEzQ",
          "prefer_ilm": true,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000002",
          "index_uuid": "oOlQOsG6RJ6rXbybz-sPtg",
          "prefer_ilm": true,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000003",
          "index_uuid": "1duuNfCwT6aS_55IEfzMAQ",
          "prefer_ilm": true,
          "managed_by": "Unmanaged"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000004",
          "index_uuid": "wVXW11CzTAavehkx1FksLw",
          "prefer_ilm": false,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.02-000005",
          "index_uuid": "UJ8VufpYTgyKa-tXaB4BPA",
          "prefer_ilm": false,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        }
      ],
      "generation": 5,
      "_meta": {
        "package": {
          "name": "apm"
        },
        "managed_by": "fleet",
        "managed": true
      },
      "status": "GREEN",
      "template": "traces-apm@template",
      "ilm_policy": "traces-apm.traces-default_policy",
      "next_generation_managed_by": "Index Lifecycle Management",
      "prefer_ilm": false,
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false,
      "rollover_on_write": false,
      "failure_store": {
        "enabled": false,
        "rollover_on_write": false,
        "indices": []
      }
    }
  ]
}

axw commented 1 week ago

> This, as expected, creates unmanaged indices. We would need to suggest workaround for this in our changelog/release-notes. The simplest way would be to have users with default ILM settings explicitly configure DSL for all datastreams. Since default ILM and DSL are identical, there would be no impact to the users. In future, if the user wants to move to ILM, they can do so by creating a custom component template. WDYT @axw ?

It's the simplest code change, but I'm not convinced that anyone will go ahead and actively switch their data streams to DSL; I expect we'll end up with users indefinitely sticking with the legacy ILM policies and bifurcating the Serverless & Hosted experience. IMO we should take one of two routes:

  1. Automatically switch data streams with default ILM policy to DSL on upgrade. This should be non-breaking, since the retention periods match, and there's no use of hot/warm/cold in the default ILM policies.
  2. Automatically switch all data streams to DSL on upgrade. This would be breaking when users have customised ILM, so we would need to document how users can re-enable ILM by setting prefer_ilm: true in their @custom component templates.

I prefer (2) since users will need to start setting prefer_ilm: true in their @custom component templates if they want to use ILM. The only other alternative I can see is to revert the use of DSL, but I feel like the cat's already out of the bag with 8.15.0 being released.

simitt commented 1 week ago

> Upgrade 8.14.x to 8.15.0 with default ILM policy and then upgrade to 8.15.1
>
> This, as expected, creates unmanaged indices. We would need to suggest workaround for this in our changelog/release-notes. The simplest way would be to have users with default ILM settings explicitly configure DSL for all datastreams. Since default ILM and DSL are identical, there would be no impact to the users. In future, if the user wants to move to ILM, they can do so by creating a custom component template. WDYT @axw ?

@lahsivjar it only creates unmanaged indices in 8.15.0, right? With the fix and changes in 8.15.1, my understanding is that as soon as users upgrade to this version, new, managed backing indices will be created. The 8.15.0 ones would still be unmanaged, but any new ones would not. Can you confirm this?

lahsivjar commented 1 week ago

> @lahsivjar it only creates unmanaged indices in 8.15.0, right? With the fix and changes in 8.15.1, my understanding is that as soon as users upgrade to this version, new, managed backing indices will be created. The 8.15.0 ones would still be unmanaged, but any new ones would not. Can you confirm this?

Yes, this is correct. Only indices created with version 8.15.0 will remain unmanaged.

lahsivjar commented 1 week ago

Testing with BC

Create a new 8.15.1+ cluster, with defaults: DLM should be used

✅ Tested

```json
{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-traces-apm-default-2024.09.03-000001",
          "index_uuid": "T5YO3keARTqvNSbSOKFuJQ",
          "prefer_ilm": false,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Data stream lifecycle"
        }
      ],
      "generation": 1,
      "_meta": {
        "managed": true,
        "description": "Index template for traces-apm-*"
      },
      "status": "YELLOW",
      "template": "traces-apm@template",
      "lifecycle": {
        "enabled": true,
        "data_retention": "10d"
      },
      "ilm_policy": "traces-apm.traces-default_policy",
      "next_generation_managed_by": "Data stream lifecycle",
      "prefer_ilm": false,
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false,
      "rollover_on_write": false
    }
  ]
}
```

Create a new 8.15.1+ cluster, with customised ILM (https://www.elastic.co/guide/en/observability/current/apm-ilm-how-to.html#apm-data-streams-custom-three): customised ILM policy should be used

✅ Tested (requires setting `"prefer_ilm": true` in the `@custom` component template)

```json
{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-traces-apm-default-2024.09.03-000001",
          "index_uuid": "T5YO3keARTqvNSbSOKFuJQ",
          "prefer_ilm": false,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Data stream lifecycle"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.03-000002",
          "index_uuid": "2gyoEUu7TvSdcEQm6NDJUw",
          "prefer_ilm": false,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Data stream lifecycle"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.03-000003",
          "index_uuid": "qRE3mbVORzKdhJ7Iy9TafQ",
          "prefer_ilm": true,
          "ilm_policy": "custom-ilmdlm",
          "managed_by": "Index Lifecycle Management"
        }
      ],
      "generation": 3,
      "_meta": {
        "managed": true,
        "description": "Index template for traces-apm-*"
      },
      "status": "YELLOW",
      "template": "traces-apm@template",
      "lifecycle": {
        "enabled": true,
        "data_retention": "10d"
      },
      "ilm_policy": "custom-ilmdlm",
      "next_generation_managed_by": "Index Lifecycle Management",
      "prefer_ilm": true,
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false,
      "rollover_on_write": false
    }
  ]
}
```

Upgrade 8.14.x to 8.15.1+ with defaults: ILM should continue to be used for old indices, DLM should be used for new indices

⚠️ Tested (Note that new indices would continue to use ILM on upgrade)

```json
{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-traces-apm-default-2024.09.03-000001",
          "index_uuid": "Y4BYJa12S965zieMmkKP2g",
          "prefer_ilm": true,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.03-000002",
          "index_uuid": "ksh_hUWmSVCZpBNPdlePvQ",
          "prefer_ilm": true,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.03-000003",
          "index_uuid": "G7ujwqE8QiCY-SWxwlSitQ",
          "prefer_ilm": false,
          "ilm_policy": "traces-apm.traces-default_policy",
          "managed_by": "Index Lifecycle Management"
        }
      ],
      "generation": 3,
      "_meta": {
        "package": {
          "name": "apm"
        },
        "managed_by": "fleet",
        "managed": true
      },
      "status": "YELLOW",
      "template": "traces-apm@template",
      "ilm_policy": "traces-apm.traces-default_policy",
      "next_generation_managed_by": "Index Lifecycle Management",
      "prefer_ilm": false,
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false,
      "rollover_on_write": false
    }
  ]
}
```

Upgrade 8.14.x to 8.15.1+ with customised ILM (same guide as above): ILM should continue to be used for old indices, and also for new indices

✅ Tested

```json
{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-traces-apm-default-2024.09.03-000001",
          "index_uuid": "kU3XXLL0Riq603vV8rOHwA",
          "prefer_ilm": true,
          "ilm_policy": "custom-ilmdlm",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.03-000002",
          "index_uuid": "FPO8lBu4T8W29TcsxfAZ8g",
          "prefer_ilm": true,
          "ilm_policy": "custom-ilmdlm",
          "managed_by": "Index Lifecycle Management"
        },
        {
          "index_name": ".ds-traces-apm-default-2024.09.03-000003",
          "index_uuid": "OWSm1CIVTBK8txzi4E23_g",
          "prefer_ilm": false,
          "ilm_policy": "custom-ilmdlm",
          "managed_by": "Index Lifecycle Management"
        }
      ],
      "generation": 3,
      "_meta": {
        "package": {
          "name": "apm"
        },
        "managed_by": "fleet",
        "managed": true
      },
      "status": "YELLOW",
      "template": "traces-apm@template",
      "ilm_policy": "custom-ilmdlm",
      "next_generation_managed_by": "Index Lifecycle Management",
      "prefer_ilm": false,
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false,
      "rollover_on_write": false
    }
  ]
}
```

Upgrade 8.14.x to 8.15.0 with custom ILM policy: custom ILM policy should work as expected (no bugs for this case)

✅ Tested

```json
{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "timestamp_field": { "name": "@timestamp" },
      "indices": [
        { "index_name": ".ds-traces-apm-default-2024.09.03-000001", "index_uuid": "FF5029kASPiBFiXQIIpHFQ", "prefer_ilm": true, "ilm_policy": "custom-ilmdlm", "managed_by": "Index Lifecycle Management" },
        { "index_name": ".ds-traces-apm-default-2024.09.03-000002", "index_uuid": "hSYm1-G1QOeqc2DmEyTVGw", "prefer_ilm": true, "ilm_policy": "custom-ilmdlm", "managed_by": "Index Lifecycle Management" },
        { "index_name": ".ds-traces-apm-default-2024.09.03-000003", "index_uuid": "HURNF7zhTYG_8NQGSRTPRw", "prefer_ilm": true, "ilm_policy": "custom-ilmdlm", "managed_by": "Index Lifecycle Management" }
      ],
      "generation": 3,
      "_meta": { "package": { "name": "apm" }, "managed_by": "fleet", "managed": true },
      "status": "YELLOW",
      "template": "traces-apm@template",
      "ilm_policy": "custom-ilmdlm",
      "next_generation_managed_by": "Index Lifecycle Management",
      "prefer_ilm": true,
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false,
      "rollover_on_write": false
    }
  ]
}
```

✅ Tested (after upgrading the same cluster from 8.15.0 to 8.15.1, continues to use ILM)

```json
{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "timestamp_field": { "name": "@timestamp" },
      "indices": [
        { "index_name": ".ds-traces-apm-default-2024.09.03-000001", "index_uuid": "FF5029kASPiBFiXQIIpHFQ", "prefer_ilm": true, "ilm_policy": "custom-ilmdlm", "managed_by": "Index Lifecycle Management" },
        { "index_name": ".ds-traces-apm-default-2024.09.03-000002", "index_uuid": "hSYm1-G1QOeqc2DmEyTVGw", "prefer_ilm": true, "ilm_policy": "custom-ilmdlm", "managed_by": "Index Lifecycle Management" },
        { "index_name": ".ds-traces-apm-default-2024.09.03-000003", "index_uuid": "HURNF7zhTYG_8NQGSRTPRw", "prefer_ilm": true, "ilm_policy": "custom-ilmdlm", "managed_by": "Index Lifecycle Management" },
        { "index_name": ".ds-traces-apm-default-2024.09.03-000004", "index_uuid": "D7YUM1PdSsSpH-taPns1vw", "prefer_ilm": false, "ilm_policy": "custom-ilmdlm", "managed_by": "Index Lifecycle Management" },
        { "index_name": ".ds-traces-apm-default-2024.09.03-000005", "index_uuid": "gbSyICRJQl-x0T6Dceq1XA", "prefer_ilm": false, "ilm_policy": "custom-ilmdlm", "managed_by": "Index Lifecycle Management" }
      ],
      "generation": 5,
      "_meta": { "package": { "name": "apm" }, "managed_by": "fleet", "managed": true },
      "status": "YELLOW",
      "template": "traces-apm@template",
      "ilm_policy": "custom-ilmdlm",
      "next_generation_managed_by": "Index Lifecycle Management",
      "prefer_ilm": false,
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false,
      "rollover_on_write": false
    }
  ]
}
```

Upgrade 8.14.x to 8.15.0 with default ILM policy and then upgrade to 8.15.1

⚠️ Tested (indices created in 8.15.0 remain unmanaged but new ones are managed by ILM)

```json
{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "timestamp_field": { "name": "@timestamp" },
      "indices": [
        { "index_name": ".ds-traces-apm-default-2024.09.03-000001", "index_uuid": "HdvHJejIRM6BepeJpoZP9Q", "prefer_ilm": true, "ilm_policy": "traces-apm.traces-default_policy", "managed_by": "Index Lifecycle Management" },
        { "index_name": ".ds-traces-apm-default-2024.09.03-000002", "index_uuid": "oxcuDgYiQu6uEzwcbjRc0w", "prefer_ilm": true, "ilm_policy": "traces-apm.traces-default_policy", "managed_by": "Index Lifecycle Management" },
        { "index_name": ".ds-traces-apm-default-2024.09.03-000003", "index_uuid": "VIhBWb41RWyObWU06CDwDQ", "prefer_ilm": true, "managed_by": "Unmanaged" },
        { "index_name": ".ds-traces-apm-default-2024.09.03-000004", "index_uuid": "WH4nZ_zwQySm5nqyg7fOYQ", "prefer_ilm": true, "managed_by": "Unmanaged" },
        { "index_name": ".ds-traces-apm-default-2024.09.03-000005", "index_uuid": "IboaSTnBTtmgYpnCLS-hPw", "prefer_ilm": false, "ilm_policy": "traces-apm.traces-default_policy", "managed_by": "Index Lifecycle Management" }
      ],
      "generation": 5,
      "_meta": { "package": { "name": "apm" }, "managed_by": "fleet", "managed": true },
      "status": "YELLOW",
      "template": "traces-apm@template",
      "ilm_policy": "traces-apm.traces-default_policy",
      "next_generation_managed_by": "Index Lifecycle Management",
      "prefer_ilm": false,
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false,
      "rollover_on_write": false
    }
  ]
}
```

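
To see at a glance which backing indices were left behind in a case like this one, the `managed_by` field of each index can be grouped. This is a rough sketch rather than a tool used in this issue: the function name `indices_by_manager` and the `<missing>` placeholder are assumptions, and the sample is trimmed from the output above to index names and `managed_by` only.

```python
import json
from collections import defaultdict


def indices_by_manager(response: dict) -> dict:
    """Group backing index names by their `managed_by` value.

    `response` is assumed to be the parsed body of GET /_data_stream/<name>.
    Indices without a `managed_by` field are grouped under "<missing>"
    (a placeholder chosen for this sketch, not an Elasticsearch value).
    """
    grouped = defaultdict(list)
    for stream in response["data_streams"]:
        for idx in stream["indices"]:
            grouped[idx.get("managed_by", "<missing>")].append(idx["index_name"])
    return dict(grouped)


# Trimmed sample (illustrative subset of the fields shown in the test output).
sample = json.loads("""
{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "indices": [
        {"index_name": ".ds-traces-apm-default-2024.09.03-000002",
         "managed_by": "Index Lifecycle Management"},
        {"index_name": ".ds-traces-apm-default-2024.09.03-000003",
         "managed_by": "Unmanaged"},
        {"index_name": ".ds-traces-apm-default-2024.09.03-000004",
         "managed_by": "Unmanaged"},
        {"index_name": ".ds-traces-apm-default-2024.09.03-000005",
         "managed_by": "Index Lifecycle Management"}
      ]
    }
  ]
}
""")

for manager, names in indices_by_manager(sample).items():
    print(f"{manager}: {names}")
```

Any names listed under `Unmanaged` are the indices that still need a lifecycle applied explicitly.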
✅ Tested (on explicitly applying the data stream lifecycle config)

The config applied is the one we recommend in the [known-issues doc](https://github.com/elastic/observability-docs/pull/4192/files).

```json
{
  "data_streams": [
    {
      "name": "traces-apm-default",
      "timestamp_field": { "name": "@timestamp" },
      "indices": [
        { "index_name": ".ds-traces-apm-default-2024.09.03-000001", "index_uuid": "HdvHJejIRM6BepeJpoZP9Q", "prefer_ilm": true, "ilm_policy": "traces-apm.traces-default_policy", "managed_by": "Index Lifecycle Management" },
        { "index_name": ".ds-traces-apm-default-2024.09.03-000002", "index_uuid": "oxcuDgYiQu6uEzwcbjRc0w", "prefer_ilm": true, "ilm_policy": "traces-apm.traces-default_policy", "managed_by": "Index Lifecycle Management" },
        { "index_name": ".ds-traces-apm-default-2024.09.03-000003", "index_uuid": "VIhBWb41RWyObWU06CDwDQ", "prefer_ilm": true, "managed_by": "Data stream lifecycle" },
        { "index_name": ".ds-traces-apm-default-2024.09.03-000004", "index_uuid": "WH4nZ_zwQySm5nqyg7fOYQ", "prefer_ilm": true, "managed_by": "Data stream lifecycle" },
        { "index_name": ".ds-traces-apm-default-2024.09.03-000005", "index_uuid": "IboaSTnBTtmgYpnCLS-hPw", "prefer_ilm": false, "ilm_policy": "traces-apm.traces-default_policy", "managed_by": "Data stream lifecycle" },
        { "index_name": ".ds-traces-apm-default-2024.09.03-000006", "index_uuid": "YAC31m-rTzudlDtiD3LBPg", "prefer_ilm": false, "ilm_policy": "traces-apm.traces-default_policy", "managed_by": "Data stream lifecycle" }
      ],
      "generation": 6,
      "_meta": { "package": { "name": "apm" }, "managed_by": "fleet", "managed": true },
      "status": "YELLOW",
      "template": "traces-apm@template",
      "lifecycle": { "enabled": true, "data_retention": "10d" },
      "ilm_policy": "traces-apm.traces-default_policy",
      "next_generation_managed_by": "Data stream lifecycle",
      "prefer_ilm": false,
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false,
      "rollover_on_write": false
    }
  ]
}
```