elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Cannot update settings for index from an invalid to a valid state #84758

Open andreidan opened 2 years ago

andreidan commented 2 years ago

Elasticsearch Version

7.x, 8.x

Installed Plugins

No response

Java Version

bundled

OS Version

Darwin

Problem Description

We sometimes see indices whose settings hold invalid values. For example, partially mounted indices, which can only live in the frozen tier, somehow end up with a _tier_preference other than data_frozen:

only the [data_frozen] tier preference may be used for partial searchable snapshots (got: [data_content])]

The bigger problem is that, once an index is in this state, the _tier_preference setting cannot be updated to a correct value. The update fails with stack traces of the form:

    at org.elasticsearch.cluster.routing.allocation.DataTier$DataTierSettingValidator.validate(DataTier.java:302)
    at org.elasticsearch.cluster.routing.allocation.DataTier$DataTierSettingValidator.validate(DataTier.java:269)
    at org.elasticsearch.common.settings.Setting.get(Setting.java:515)
    at org.elasticsearch.common.settings.Setting.get(Setting.java:485)
    at org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:598)
    at org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:507)
    at org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:477)
    at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:696)
    at org.elasticsearch.indices.IndicesService.verifyIndexMetadata(IndicesService.java:810)
    at org.elasticsearch.cluster.metadata.MetadataUpdateSettingsService$1.execute(MetadataUpdateSettingsService.java:271)
    at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:51)
    at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:836)
    at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:403)
    at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:243)
    at org.elasticsearch.cluster.service.MasterService.access$100(MasterService.java:63)
    at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:170)
    at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:146)
    at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:202)

The problem here is that MetadataUpdateSettingsService calls IndicesService.verifyIndexMetadata, which in turn attempts to create an IndexService from the current (invalid) IndexMetadata, so validation rejects the existing value before the update can take effect:
https://github.com/elastic/elasticsearch/blob/master/server%2Fsrc%2Fmain%2Fjava%2Forg%2Felasticsearch%2Findices%2FIndicesService.java#L794
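
To make the ordering problem concrete, here is a minimal Python sketch of the flow described above. It is a toy model, not Elasticsearch code: `validate`, `update_settings_current_first`, and `update_settings_merged_only` are hypothetical names, and using `index.store.snapshot.partial` to mark a partially mounted index is an assumption made for illustration.

```python
# Toy model only. These functions are hypothetical and merely mirror the
# order of operations described in this issue.

FROZEN = "data_frozen"
TIER_KEY = "index.routing.allocation.include._tier_preference"
PARTIAL_KEY = "index.store.snapshot.partial"  # assumed marker for partial mounts


def validate(settings):
    """Reject a partially mounted index whose tier preference is not data_frozen."""
    if settings.get(PARTIAL_KEY) == "true" and settings.get(TIER_KEY) != FROZEN:
        raise ValueError(
            "only the [data_frozen] tier preference may be used for partial "
            f"searchable snapshots (got: [{settings.get(TIER_KEY)}])"
        )


def update_settings_current_first(current, update):
    """Mirrors the reported behaviour: the current settings are validated first."""
    validate(current)  # raises here, so the corrected value is never reached
    merged = {**current, **update}
    validate(merged)
    return merged


def update_settings_merged_only(current, update):
    """One possible alternative: validate only the state the update would produce."""
    merged = {**current, **update}
    validate(merged)
    return merged
```

With `current = {PARTIAL_KEY: "true", TIER_KEY: "data_content"}` and `update = {TIER_KEY: "data_frozen"}`, the first function raises on the existing value, while the second accepts the corrected settings. Whether validating only the merged state is acceptable inside Elasticsearch is a separate design question that this sketch does not answer.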

Currently we have two workarounds:

  1. Delete the problematic indices in one DELETE call and then mount them again from the snapshot, or
  2. Upgrade to a higher version of Elasticsearch, which automatically updates the tier preference of partially mounted indices to the correct data_frozen value (code)
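
The first option could be scripted roughly as follows. This is a sketch that only builds the REST requests and sends nothing. It assumes the delete index API accepts a comma-separated list of names and that the mount snapshot API is called with `storage=shared_cache` for a partial mount. `remount_requests` and the keys of its input dicts are hypothetical, and no URL escaping is applied to the names.

```python
import json


def remount_requests(mounts):
    """Build (method, path, body) tuples: one DELETE for all indices, then one mount each.

    Each item in `mounts` is a dict with the hypothetical keys:
      index           name of the problematic (mounted) index
      repository      snapshot repository name
      snapshot        snapshot name
      snapshot_index  name of the index inside the snapshot
    """
    names = ",".join(m["index"] for m in mounts)
    requests = [("DELETE", f"/{names}", None)]
    for m in mounts:
        path = f"/_snapshot/{m['repository']}/{m['snapshot']}/_mount?storage=shared_cache"
        body = json.dumps({"index": m["snapshot_index"], "renamed_index": m["index"]})
        requests.append(("POST", path, body))
    return requests
```

Review the generated requests before sending them with your HTTP client of choice, since the DELETE is destructive if the snapshot is no longer available.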

Steps to Reproduce

We're not sure yet how to get into this situation.

Logs (if relevant)

No response

elasticmachine commented 2 years ago

Pinging @elastic/es-data-management (Team:Data Management)

romain-chanu commented 2 years ago

As discussed with @andreidan, it would be beneficial to list the problematic indices in the error message as well. The current message, "only the [data_frozen] tier preference may be used for partial searchable snapshots (got: [data_content])]", is too vague because it does not say which index is affected.

gadekishore commented 2 years ago

If anyone visits this issue: you can list the tier preference of every index, and so spot the problematic ones, by running the jq command below against the settings.json file from a diagnostics bundle:

jq -r 'keys[] as $k | "\($k) \(.[$k] | .settings.index.routing.allocation.include._tier_preference)"' settings.json | awk '{printf("%-40s\t%-20s\n",$1,$2)}'

stefnestor commented 1 year ago

Alternative jq filters for discovery:

# tier preference is set, but not to data_frozen
$ cat settings.json | jq 'to_entries[]|.key as $k| .value.settings.index| select($k|contains("partial-"))| select(.routing.allocation.include."_tier_preference"|contains("data_frozen")|not)|$k'

# more generic check that also counts indices whose tier preference is null
$ cat settings.json | jq -rc 'to_entries[]|.key as $k| .value.settings.index| select($k|contains("partial-"))| {i:$k, tp:.routing.allocation.include."_tier_preference"}' | grep -v "data_frozen" | wc -l
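
For anyone without jq at hand, the same check can be sketched in Python. This assumes settings.json has the shape the jq filters above rely on (index name at the top level, then `settings.index.routing.allocation.include._tier_preference`) and, like those filters, treats any index whose name contains "partial-" as partially mounted. `misplaced_partial_indices` is a hypothetical helper name.

```python
import json


def misplaced_partial_indices(settings):
    """Return {index_name: tier_preference} for partial indices not on data_frozen.

    A missing tier preference is reported as None, matching the "more generic"
    jq check above.
    """
    bad = {}
    for name, body in settings.items():
        if "partial-" not in name:
            continue
        index = body.get("settings", {}).get("index", {})
        include = index.get("routing", {}).get("allocation", {}).get("include", {})
        pref = include.get("_tier_preference")
        # Substring test mirrors the jq contains("data_frozen") check.
        if pref is None or "data_frozen" not in pref:
            bad[name] = pref
    return bad


# Usage (reads the diagnostics file; uncomment to run against real data):
# with open("settings.json") as fh:
#     for name, pref in sorted(misplaced_partial_indices(json.load(fh)).items()):
#         print(f"{name:<40}\t{pref}")
```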