andreidan opened this issue 2 years ago.
Pinging @elastic/es-data-management (Team:Data Management)
As discussed with @andreidan, it would be beneficial to list the problematic indices in the error message as well. The error message `only the [data_frozen] tier preference may be used for partial searchable snapshots (got: [data_content])]` is too vague, since it doesn't say which indices are affected.
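The following minimal Python sketch illustrates the request. It is not Elasticsearch's actual Java validator. It shows an error message that names every offending index. The function `tier_preference_error` is hypothetical, and so is its input: a mapping from each partially mounted index name to its current `_tier_preference`, or `None` when unset. The message wording reuses the existing error text quoted above.

```python
from typing import Dict, Optional

FROZEN = "data_frozen"


def tier_preference_error(partial_indices: Dict[str, Optional[str]]) -> Optional[str]:
    """Hypothetical helper: return an error naming every partially mounted
    index whose tier preference is not exactly data_frozen, or None when all
    of them are valid. Keys are index names, values the current
    _tier_preference (None if the setting is missing)."""
    invalid = [(name, tier) for name, tier in sorted(partial_indices.items())
               if tier != FROZEN]
    if not invalid:
        return None
    details = ", ".join(f"[{name}] (got: [{tier}])" for name, tier in invalid)
    return (f"only the [{FROZEN}] tier preference may be used for partial "
            f"searchable snapshots; invalid indices: {details}")


# Example: two of the three indices are reported by name.
print(tier_preference_error({"partial-a": "data_frozen",
                             "partial-b": "data_content",
                             "partial-c": None}))
```

Reporting all offenders in one message, rather than failing on the first one, lets an operator fix every affected index in a single pass.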
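If you come across this issue, you can find the problematic indices by running the jq command below against the `settings.json` file from a diagnostics bundle. It prints each index together with its `_tier_preference`: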
jq -r 'keys[] as $k | "\($k) \(.[$k] | .settings.index.routing.allocation.include._tier_preference)"' settings.json | awk '{printf("%-40s\t%-20s\n",$1,$2)}'
Alternative discovery with jq:
# partially mounted indices whose tier preference is set but not to data_frozen
$ cat settings.json | jq 'to_entries[]|.key as $k| .value.settings.index| select($k|contains("partial-"))| select(.routing.allocation.include."_tier_preference"|contains("data_frozen")|not)|$k'
# more generic check that also counts indices whose tier preference is null
$ cat settings.json | jq -rc 'to_entries[]|.key as $k| .value.settings.index| select($k|contains("partial-"))| {i:$k, tp:.routing.allocation.include."_tier_preference"}' | grep -v "data_frozen" | wc -l
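If jq isn't available, the same check can be sketched in Python. The sketch assumes the nested layout that the jq filters above rely on (`<index>.settings.index.routing.allocation.include._tier_preference`). Like those filters, it treats any index whose name contains `partial-` as partially mounted, so it can miss indices mounted under a different name. It flags a missing tier preference as well as any value that isn't exactly `data_frozen`. The function and script names are hypothetical.

```python
import json
import sys
from typing import Any, Dict, List, Optional, Tuple


def misplaced_partial_indices(settings: Dict[str, Any]) -> List[Tuple[str, Optional[str]]]:
    """Return (index, tier_preference) pairs for indices whose name contains
    "partial-" and whose _tier_preference is missing or not exactly
    data_frozen. `settings` is the parsed settings.json from a diagnostics bundle."""
    found = []
    for name, body in sorted(settings.items()):
        if "partial-" not in name:
            continue
        # Walk the nested settings; any missing or non-object level yields None.
        node: Any = body
        for key in ("settings", "index", "routing", "allocation",
                    "include", "_tier_preference"):
            node = node.get(key) if isinstance(node, dict) else None
        if node != "data_frozen":
            found.append((name, node))
    return found


# Only reads a file when a path is passed on the command line, e.g.
#   python find_misplaced.py settings.json   (hypothetical script name)
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for name, tier in misplaced_partial_indices(json.load(f)):
            print(f"{name:<40}\t{tier}")
```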
Elasticsearch Version
7.x, 8.x
Installed Plugins
No response
Java Version
bundled
OS Version
Darwin
Problem Description
We sometimes see indices whose settings have invalid values. For example, partially mounted indices, which can only live in the frozen tier, somehow end up with a `_tier_preference` that's not `data_frozen`.
The bigger problem is that updating the `_tier_preference` setting to a correct value is not possible in this case. We see stack traces in the form of:
The problem here is that `MetadataUpdateSettingsService` calls into `IndicesService.verifyIndexMetadata`, which in turn attempts to create an `IndexService` using the current, invalid `IndexMetadata`:
https://github.com/elastic/elasticsearch/blob/master/server%2Fsrc%2Fmain%2Fjava%2Forg%2Felasticsearch%2Findices%2FIndicesService.java#L794
Currently we have two solutions: issue a `DELETE` call for the affected indices and then mount them again from the snapshot, OR … `data_frozen` value (code).
Steps to Reproduce
We're not sure yet how to get into this situation.
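We don't know the trigger yet, but the update failure described above can be reproduced in a simplified Python model. This is not Elasticsearch code. `verify` stands in for `IndicesService.verifyIndexMetadata`, and both update functions are hypothetical. The first mirrors the behaviour described in this issue: it verifies the current, invalid settings before applying the change. The second is shown only for contrast and verifies the merged settings instead.

```python
from typing import Dict, Optional

FROZEN = "data_frozen"
TIER_KEY = "index.routing.allocation.include._tier_preference"
Settings = Dict[str, Optional[str]]


class InvalidSettings(Exception):
    pass


def verify(settings: Settings, partial: bool = True) -> None:
    # Stand-in for IndicesService.verifyIndexMetadata: reject a partially
    # mounted index whose tier preference is not exactly data_frozen.
    tier = settings.get(TIER_KEY)
    if partial and tier != FROZEN:
        raise InvalidSettings(f"only the [{FROZEN}] tier preference may be used "
                              f"for partial searchable snapshots (got: [{tier}])")


def update_verifying_current(current: Settings, changes: Settings) -> Settings:
    # Models the reported behaviour: the existing (invalid) settings are
    # verified first, so even a corrective update is rejected.
    verify(current)
    return {**current, **changes}


def update_verifying_merged(current: Settings, changes: Settings) -> Settings:
    # Hypothetical contrast: verify the settings as they would be after the
    # update, so a corrective change goes through.
    merged = {**current, **changes}
    verify(merged)
    return merged
```

With `current = {TIER_KEY: "data_content"}`, calling `update_verifying_current(current, {TIER_KEY: "data_frozen"})` raises `InvalidSettings`, while `update_verifying_merged` returns the corrected settings.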
Logs (if relevant)
No response