pheyos closed this issue 1 year ago
Pinging @elastic/es-distributed (Team:Distributed)
Things have definitely changed in this area in 8.6.0 in ways that might make this more common, but I think this could have happened in earlier versions too: we might restore a shard onto `instance-0000000001` but while that's happening we have always been free to decide to rebalance it onto `instance-0000000000` and `instance-0000000002`, removing the copy on `instance-0000000001` and leaving only the two copies with recovery type `peer` in the recovery API output.
You cannot rely on seeing a recovery with type `snapshot` in these situations, although in practice it might well have been present much of the time in earlier versions. Can you help us understand why you would need to see this? If you're waiting for the recovery to complete, it should be enough to wait for the index health to be `green`.
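As a rough illustration of waiting for `green` instead of inspecting recovery types, here is a minimal Python sketch built on the cluster health API's `wait_for_status` and `timeout` parameters. The function names, the base URL and the 60s default are assumptions made for this example, and authentication and TLS are left out.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def health_url(base_url: str, index: str, timeout: str = "60s") -> str:
    """Build a cluster health URL that waits for the index to turn green."""
    query = urllib.parse.urlencode({"wait_for_status": "green", "timeout": timeout})
    target = urllib.parse.quote(index, safe="")
    return f"{base_url.rstrip('/')}/_cluster/health/{target}?{query}"


def restore_finished(health: dict) -> bool:
    """True if the health response reports green and the wait did not time out."""
    return health.get("status") == "green" and not health.get("timed_out", False)


def wait_for_green(base_url: str, index: str, timeout: str = "60s") -> bool:
    """Send the request; needs a reachable cluster (hypothetical helper, no auth)."""
    try:
        with urllib.request.urlopen(health_url(base_url, index, timeout)) as resp:
            return restore_finished(json.load(resp))
    except urllib.error.HTTPError as err:
        # Assumption: some versions may signal a wait timeout with HTTP 408.
        if err.code == 408:
            return False
        raise
```

A caller would use something like `wait_for_green("http://localhost:9200", "INDEX_NAME")` after triggering the restore and treat `False` as "not recovered yet".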
Thanks for providing the details about how it comes to that situation @DaveCTurner!
> in practice it might well have been present much of the time in earlier versions
Indeed. We have used this approach for more than two years now, with multiple runs per day, and it worked fine.
> You cannot rely on seeing a recovery with type `snapshot` in these situations
With your explanation, I understand how this is happening. But from a UX perspective I think it's not ideal and actually looks like a bug.
The `_recovery` docs say "Use the index recovery API to get information about ongoing and completed shard recoveries." and for the response body `type`:
> `SNAPSHOT`: A snapshot. Indicates recovery is related to a snapshot restore operation.
So a user would expect to see this type for snapshot restore operations. But with the process you described and the result of no `snapshot` entry left, the information that this was coming from a snapshot restore is completely lost.
> Can you help us understand why you would need to see this?
It's true that we don't necessarily need this and there are other ways to do it. It's currently implemented that way because
so there was no reason to doubt the approach.
If you're saying that this is behaving as intended and you don't plan to change that, I'd suggest to update the documentation to make it clear that a snapshot restore operation doesn't necessarily leave a `snapshot` recovery entry.
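To make the point concrete, here is a small Python sketch that counts recovery types in an index recovery API response. It assumes the JSON response is keyed by index name with a `shards` array whose entries carry a `type` field. The sample data is made up for illustration and is not output from a real cluster.

```python
from collections import Counter


def recovery_types(recovery_response: dict, index: str) -> Counter:
    """Count shard recoveries per type (e.g. SNAPSHOT, PEER) for one index."""
    shards = recovery_response.get(index, {}).get("shards", [])
    return Counter(str(shard.get("type", "UNKNOWN")).upper() for shard in shards)


def restore_visible(recovery_response: dict, index: str) -> bool:
    """True only if at least one recovery entry of type SNAPSHOT is present."""
    return recovery_types(recovery_response, index)["SNAPSHOT"] > 0


# Hypothetical response after the snapshot-restored copy was rebalanced away:
# only the two peer recoveries are left.
sample = {
    "INDEX_NAME": {
        "shards": [
            {"id": 0, "type": "PEER", "stage": "DONE"},
            {"id": 0, "type": "PEER", "stage": "DONE"},
        ]
    }
}

print(recovery_types(sample, "INDEX_NAME"))
print(restore_visible(sample, "INDEX_NAME"))
```

With data shaped like this, a check that waits for a `SNAPSHOT` entry would never succeed even though the restore itself completed.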
> If you're saying that this is behaving as intended and you don't plan to change that, I'd suggest to update the documentation to make it clear that a snapshot restore operation doesn't necessarily leave a `snapshot` recovery entry.
Yes I think these docs are lacking and can be improved as you suggest. See https://github.com/elastic/elasticsearch/pull/91861.
See also https://github.com/elastic/elasticsearch/issues/60747 which would let you see older recoveries too.
With the incoming docs update and a potential future ability to see older recoveries, I'm closing this issue. Thanks for your quick responses @DaveCTurner!
Found in version
Steps to reproduce
"number_of_shards": "2"
`GET INDEX_NAME/_recovery` or `GET _cat/recovery/INDEX_NAME?v`
Expected result
`snapshot` and one of type `peer`, e.g. like this:
Actual result
`peer` and no `snapshot` entry:
Additional information
`snapshot` recovery entry (e.g. happens with the ML QA framework)