elastic / integrations

Elastic Integrations
https://www.elastic.co/integrations

Reindexing steps for TSDB enabled data streams for conflicting fields #8085

Open aliabbas-elastic opened 9 months ago

aliabbas-elastic commented 9 months ago

Main Issue

Reindexing steps document

Related issues

Description

This issue provides detailed reindexing steps for TSDB enabled data streams. Follow them when field conflicts are found because of mismatched data types.

For example, if the host.ip field is shown as conflicted under the metrics-* data view, the conflict can be resolved by reindexing the indices of the affected data stream.
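
As a rough sketch of how such a conflict can be confirmed before reindexing, the field capabilities API can be queried for the field. The metrics-* pattern and host.ip are taken from the example above. If the response lists more than one type under the field, each type entry carries an indices array showing which indices map the field to that type.

```console
# More than one type entry under "host.ip" in the response indicates a conflict
GET metrics-*/_field_caps?fields=host.ip
```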

To reindex the data, perform the following steps.

Step 1 Stop the data stream: go to Integrations -> <integration_name> -> Integration policies, open the configuration of the integration, disable the impacted data stream, and save the integration.

Step 2 Copy the data into a temporary index by running the following request in Dev Tools.

POST _reindex
{
  "source": {
    "index": "<index_name>"
  },
  "dest": {
    "index": "temp_index"
  }
}  

Example:

POST _reindex
{
  "source": {
    "index": "metrics-dummy.cluster-default"
  },
  "dest": {
    "index": "temp_index"
  }
}
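
Before the original data stream is deleted in Step 6, it may be worth confirming that the copy is complete. A minimal check, assuming the example names above and that no new documents arrive after Step 1, is to compare the document counts of the source and the temporary index:

```console
# The two counts should match once the reindex has finished
GET metrics-dummy.cluster-default/_count
GET temp_index/_count
```

For large data streams the reindex request can outlive the Dev Tools request timeout. One option is to start it with POST _reindex?wait_for_completion=false, which returns a task ID that can be polled with GET _tasks/<task_id>.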

Step 3 Note down the following values from the backing indices and index template of the data stream to be reindexed.
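
The values can be read with requests along these lines. This is a sketch using the example data stream name; <template_name> is a placeholder for the name returned in the template field of the first response.

```console
# Backing indices and the name of the matching index template
GET _data_stream/metrics-dummy.cluster-default

# Settings and mappings of the backing indices
GET metrics-dummy.cluster-default/_settings
GET metrics-dummy.cluster-default/_mapping

# Full definition of the index template
GET _index_template/<template_name>
```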

Step 4 Create the index template after setting all the parameters noted in Step 3. (Here we create a clone of the template, hence the name metrics-dummy.cluster-copy.)

POST _index_template/metrics-dummy.cluster-copy
{
  "index_patterns": ["metrics-dummy.cluster-*"],
  "template": {
    "settings": {
      "index": {
        "number_of_shards" : 2,
        "number_of_replicas": 0,
        "mode": "time_series",
        "codec": "best_compression",
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_hot"
            }
          }
        },
….
….
….
….
                    "count": {
                      "type": "long",
                      "time_series_metric": "counter"
                    }
                  }
                },
                "uptime": {
                  "properties": {
                    "sec": {
                      "type": "long",
                      "time_series_metric": "gauge"
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "aliases": {}
  }
}

Step 5 Navigate to the created index template under Stack Management -> Index Management -> Index Templates and click Manage -> Edit. Under Logistics, enable Create data stream and set the priority to 300 (it must be greater than the priority of the metrics-dummy.cluster-default index template).
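
One way to check that the copy template now takes precedence, sketched here with the example names, is the simulate index API. The response shows the settings and mappings that would be applied, and templates that matched but were superseded on priority are listed under overlapping.

```console
# Resolve which template configuration would apply to the data stream name
POST _index_template/_simulate_index/metrics-dummy.cluster-default
```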

Step 6 Delete the existing data stream by running the following request in Dev Tools.

DELETE /_data_stream/<data_stream>

Example:

DELETE /_data_stream/metrics-dummy.cluster-default

Step 7 Copy the data from the temporary index to the new index by running the following request in Dev Tools.

POST _reindex
{
  "conflicts": "proceed",
  "source": {
    "index": "temp_index"
  },
  "dest": {
    "index": "<index_name>",
    "op_type": "create"
  }
}

Example:

POST _reindex
{
  "conflicts": "proceed",
  "source": {
    "index": "temp_index"
  },
  "dest": {
    "index": "metrics-dummy.cluster-default",
    "op_type": "create"
  }
}

Step 8 Verify that the data is completely reindexed and the conflicts are resolved.
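
A minimal sketch of such a check, assuming the example data stream and the host.ip conflict from the description: confirm that the data stream exists again, compare its document count with the temporary index, and re-run the field capabilities request.

```console
# The data stream should be listed with its new backing index
GET _data_stream/metrics-dummy.cluster-default

# Compare the reindexed document count with the temporary copy
GET metrics-dummy.cluster-default/_count
GET temp_index/_count

# A single type entry under "host.ip" means the conflict is gone
GET metrics-*/_field_caps?fields=host.ip
```

The counts can differ if the reindex in Step 7 skipped or rejected documents; the created, version_conflicts, and failures fields of the reindex response show how many. If other data streams under metrics-* still map the field differently, they will continue to appear in the field capabilities response.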

Step 9 Navigate again to the created index template under Stack Management -> Index Management -> Index Templates and click Manage -> Edit. Under Logistics, unset the priority that was set in Step 5.

Step 10 Invoke the rollover API on the destination data stream without setting any conditions.

POST /<data_stream>/_rollover

Example:

POST /metrics-dummy.cluster-default/_rollover

Step 11 Delete the temporary index and the index template by running the following requests in Dev Tools.

DELETE temp_index
DELETE _index_template/metrics-dummy.cluster-copy

Step 12 Start the data stream: go to Integrations -> <integration_name> -> Integration policies, open the configuration of the integration, enable the Collect <integration_name> metrics toggle, and save the integration.

alaudazzi commented 6 months ago

@aliabbas-elastic Where should this content go? It looks like it's quite generic and can be referenced by various integrations.

aliabbas-elastic commented 6 months ago

@alaudazzi Yes, this is quite generic content about reindexing a particular data stream when particular fields have conflicts. Ideally it should live in a generic place that we can link to from all of our integration READMEs.

alaudazzi commented 6 months ago

@aliabbas-elastic and I met and discussed the next steps to fix the docs, which would impact both ES and Integrations doc pages:

  1. We have two different step-by-step procedures:
    • TSDB enabled data streams (#8085)
    • NON-TSDB enabled data streams (#7624)
  2. These two procedures can be integrated into the current Reindex API page, in the Elasticsearch doc set.
  3. From the Integrations doc pages, we can reference these two procedures where appropriate.
  4. We can try to get the list of integrations that are TSDB or NON-TSDB enabled from @agithomas or the tsdb-observability team.

@lalit-satapathy @SubhrataK These steps are quite complex and might not be appropriate for a regular user. Unless there is an easier way to provide this information, these steps should go into the public doc. Please check if the suggested approach is OK, or if you think there is any better alternative.

lalit-satapathy commented 6 months ago

These two procedures can be integrated into the current Reindex API page, in the Elasticsearch doc set

That would be nice. I think some time back we decided to move all of this reindexing documentation out of the integration documents and point to the relevant Elasticsearch documentation instead. I hope those changes are done, @agithomas. I agree with any further simplification of these docs, but reindexing may differ somewhat between TSDB and non-TSDB indices.

agithomas commented 5 months ago

The related PR from ES team to update the document is here.

This must be the related page

alaudazzi commented 5 months ago

Thanks for the additional background @agithomas! Please check if the following actions make sense:

  1. [ES team] Add these procedures to the ES docs:
    • TSDB enabled data streams (#8085)
    • NON-TSDB enabled data streams (#7624)
  2. [Arianna] As we try to avoid 4 levels in the nav tree, put these two sections after the page Reindex a TSDS:

    [Image]

  3. [AgiT] Provide the list of integrations that are TSDB or NON-TSDB enabled from where these two procedures should be referenced.
  4. [Arianna] Create the links from Integrations to ES docs

agithomas commented 5 months ago

Provide the list of integrations that are TSDB or NON-TSDB enabled from where these two procedures should be referenced.

Shared the details separately