matschaffer opened this issue 3 years ago (status: Open)
Pinging @elastic/es-distributed (Team:Distributed)
We (the @elastic/es-distributed team) discussed possible solutions in our team meeting today. Our favourite idea was to introduce a new option that would let you preserve the aliases of an existing index rather than overwriting them or clearing them as we do today. The reasoning was that when restoring an index like this you're really trying to put its data back without changing its place in the cluster, so the aliases of the existing index are likely more useful than the aliases in the snapshot.
We discussed changing the default behaviour but decided it'd be surprising for the API to behave differently from today by default. Instead we would expect tooling that restores indices like this to use this new option explicitly.
We also discussed whether to preserve any other metadata (mappings, settings, ...) rather than overwriting them from those in the snapshot but decided that there are too many ways that such a mechanism might lead to operational surprises.
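As a rough sketch of how such an option could look in a restore request — note that the `preserve_aliases` flag here is purely hypothetical (it is not an existing API parameter), and the repository, snapshot, and index names are placeholders:

```
POST /_snapshot/my_repo/my_snapshot/_restore
{
  "indices": "my-index-000001",
  "preserve_aliases": true
}
```

With such a flag set, the restored index would keep the aliases of the existing index being overwritten, rather than the aliases recorded in the snapshot.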
How does that sound @matschaffer?
Hard to say without a little more detail.
My expectation would be that you have some ability to restore `matschaffer-filebeat-7.7.1-2021.03.21-000095` with only the read alias, leaving the write alias pointed to `matschaffer-filebeat-7.7.1-2021.03.21-000096`. In contrast to today, where you get either read+write or nothing (via `include_aliases: false`).
If the new option would do this, then that's probably fine. It'd be good if we make this the default in Kibana's restore UI, or maybe even in elasticsearch itself.
We see this with some frequency when orchestrating snapshot restore after VM failure on non-HA indices.
On closer inspection it seems that `include_aliases: false` already does what we propose, preserving the aliases of the existing closed index over the top of which we're doing the restore, but the orchestration tooling isn't setting this option, so its restores will often fail as described. I believe we should always use `include_aliases: false` when restoring an index to recover it from some misadventure that left it in `red` health.
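For reference, a restore request with that option set might look like this (the repository, snapshot, and index names are placeholders):

```
POST /_snapshot/my_repo/my_snapshot/_restore
{
  "indices": "my-index-000001",
  "include_aliases": false
}
```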
cc @elastic/cloud-orchestration for comment/prioritization
I don't have a strong understanding of all the implications here, but if the recommendation from ES is to just set `include_aliases: false` on all snapshot restores (no conditional logic) then we can do that very easily. cc @anyasabo
Yep, +1 here, though Dave your wording has me a little concerned:

> I believe we should always use `include_aliases: false` when restoring an index to recover it from some misadventure that left it in `red` health.

Should we just always be setting `include_aliases: false`?
One additional thing that happens to us after snapshot restore: by default the restore brings back the ILM policy, which means ILM usually kicks in and removes the restored index shortly after the restore has completed, which is very annoying.
We opened a support case on this and pretty much arrived at the conclusion that the snapshot web interface can't be used; we have since used Dev Tools for this, which is kinda sad.
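One possible way to sidestep this — assuming the aim is to detach the restored index from its ILM policy at restore time — is the restore API's `ignore_index_settings` parameter (the repository, snapshot, and index names below are placeholders):

```
POST /_snapshot/my_repo/my_snapshot/_restore
{
  "indices": "my-index-000001",
  "ignore_index_settings": ["index.lifecycle.name"]
}
```

This restores the index without its `index.lifecycle.name` setting, so ILM doesn't pick the restored index up again and delete it.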
I've seen some cases where a snapshot restore has failed due to a write alias conflict.
The sequence of events is roughly:
1. ILM creates `matschaffer-filebeat-7.7.1-2021.03.21-000095` via the `matschaffer-filebeat-7.7.1` write alias
2. A snapshot captures `matschaffer-filebeat-7.7.1-2021.03.21-000095` with the alias information
3. ILM rolls `matschaffer-filebeat-7.7.1-2021.03.21-000095` over to `matschaffer-filebeat-7.7.1-2021.03.21-000096` and updates the write alias
4. `matschaffer-filebeat-7.7.1-2021.03.21-000095` is lost
5. Restoring `matschaffer-filebeat-7.7.1-2021.03.21-000095` fails because it attempts to also use the `matschaffer-filebeat-7.7.1` write index, currently backed by `matschaffer-filebeat-7.7.1-2021.03.21-000096`
To work around this I had to perform the restore manually without aliases:
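Roughly, using `include_aliases: false` (the repository and snapshot names here are placeholders):

```
POST /_snapshot/my_repo/my_snapshot/_restore
{
  "indices": "matschaffer-filebeat-7.7.1-2021.03.21-000095",
  "include_aliases": false
}
```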
Then replace the read alias so the restored data would be available via normal query load:
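A sketch of that step via the aliases API, re-adding the restored index under the rollover alias as a non-write index (not necessarily the exact request used):

```
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "matschaffer-filebeat-7.7.1-2021.03.21-000095",
        "alias": "matschaffer-filebeat-7.7.1",
        "is_write_index": false
      }
    }
  ]
}
```

Setting `is_write_index: false` keeps `matschaffer-filebeat-7.7.1-2021.03.21-000096` as the write index while making the restored data queryable through the alias.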
It'd be great if restore could be more ILM-aware such that it won't try to re-claim write indices already backed by a more-current index.