Open DaveCTurner opened 1 year ago
Pinging @elastic/es-data-management (Team:Data Management)
This is a bit of a tricky one: we might be closing an index with unassigned shards because we want to recover it from a snapshot, so we can't just reject close requests until the index is healthy.
Henning and I discussed this further. We identified three specific uses for the close index API today, and think we see better alternatives in all cases:
IndexMetadata updates
Users will close an index in order to adjust the index metadata in a way that we do not (or cannot) support on an open index. Notably this includes changing non-dynamic index settings, but possibly there are other reasons for an index metadata update to require us to restart the whole IndexService and all the shards.
In this case we think there's no particularly good reason to close the index as a separate step before adjusting the metadata. We believe it should work to apply the metadata update and mark all the shards as UNASSIGNED in the routing table in a single cluster state update. We think we could add a ?reopen=true option to the update-index-settings API (and any other relevant APIs) to allow users to indicate that they want the metadata update to also restart the index, much as if closing it and reopening it, except without the extra API calls or tricky intermediate states.
Note that the close-adjust-reopen pattern today allows a sequence of multiple adjustments to happen in between the close and reopen steps, and that would not be possible with a one-shot call to the update-index-settings API with ?reopen=true. For index settings this should be ok, but if there are other metadata adjustments to support then we would need an API that allows combining them all into a single transaction.
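To make the difference concrete, here is a minimal sketch that only builds the request sequences (it sends nothing). The three-call sequence uses the existing close, update-settings and open endpoints; the `?reopen=true` query parameter is the option proposed above and does not exist today. The function names and the example setting are illustrative assumptions.

```python
from urllib.parse import quote


def close_adjust_reopen(index, settings):
    """Today's pattern: three separate API calls with a closed intermediate state."""
    name = quote(index, safe="")
    return [
        ("POST", f"/{name}/_close", None),
        ("PUT", f"/{name}/_settings", settings),
        ("POST", f"/{name}/_open", None),
    ]


def update_with_reopen(index, settings):
    """Proposed pattern: one call. `reopen` is the hypothetical option from this issue."""
    name = quote(index, safe="")
    return [("PUT", f"/{name}/_settings?reopen=true", settings)]


# index.codec is a static (non-dynamic) setting, so today it needs a closed index.
static_settings = {"index": {"codec": "best_compression"}}
today = close_adjust_reopen("my-index", static_settings)
proposed = update_with_reopen("my-index", static_settings)
```

With the one-shot form, several settings can still go into a single request body, but there is no window in which to issue further, separate adjustments before the index restarts.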
Users will also close an index in order to overwrite it with a copy from a snapshot. The restore process is triggered by a metadata update which marks the closed index as open and simultaneously marks all its shards as UNASSIGNED in the routing table as a single cluster state update, adjusting the recovery source of all the primaries to refer to the snapshot.
In this case we see a little UX value in closing the index first: it prevents users from accidentally overwriting existing indices with an overly-inclusive wildcard in the restore pattern. However there's no fundamental reason to require the index to be closed first, and we think it would work to prevent accidental overwrites by requiring an explicit list of indices/patterns to overwrite in the restore request instead of specifying them in earlier close-index requests.
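A rough sketch of how such an explicit overwrite list could guard against an overly-inclusive wildcard. Everything here is hypothetical: `check_restore` and its `overwrite_patterns` argument are invented for illustration and are not part of the restore API, and `fnmatchcase` only approximates Elasticsearch's wildcard matching.

```python
from fnmatch import fnmatchcase


def check_restore(existing, snapshot_indices, restore_patterns, overwrite_patterns):
    """Return the indices to restore.

    Raises ValueError if a selected index already exists in the cluster and
    was not explicitly listed in overwrite_patterns (a hypothetical parameter).
    """
    def matches(name, patterns):
        return any(fnmatchcase(name, p) for p in patterns)

    selected = [i for i in snapshot_indices if matches(i, restore_patterns)]
    blocked = [
        i for i in selected
        if i in existing and not matches(i, overwrite_patterns)
    ]
    if blocked:
        raise ValueError(f"restore would overwrite existing indices: {sorted(blocked)}")
    return selected
```

For example, restoring `logs-*` into a cluster that already holds `logs-1` would be rejected unless `logs-1` (or a pattern matching it) appears in the overwrite list, which plays the role that the earlier close-index request plays today.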
We must reinitialize everything from cold when converting a CCR follower index into a regular read-write index via the /_ccr/unfollow API. Today we achieve this by requiring the index to be closed when calling this API, but effectively this is a special case of the non-dynamic IndexMetadata update mentioned above and again we think we could add a ?reopen=true option to the API to do the update and reinitialization in a single step.
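For comparison, a sketch of the unfollow sequence today versus with the proposed option. The endpoints in `unfollow_today` are the existing ones; `?reopen=true` on the unfollow API is the hypothetical option from this issue, and whether the pause step would still be needed is an assumption made here.

```python
def unfollow_today(index):
    """Existing flow: the follower must be paused and closed before unfollowing."""
    return [
        ("POST", f"/{index}/_ccr/pause_follow"),
        ("POST", f"/{index}/_close"),
        ("POST", f"/{index}/_ccr/unfollow"),
        ("POST", f"/{index}/_open"),
    ]


def unfollow_proposed(index):
    """Proposed flow: unfollow and reinitialize in one step (hypothetical option)."""
    return [
        ("POST", f"/{index}/_ccr/pause_follow"),  # assumed to remain necessary
        ("POST", f"/{index}/_ccr/unfollow?reopen=true"),
    ]
```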
Once we have implemented alternatives to all the known use-cases for the close index API, we would be able to deprecate it and remove it in some future major release.
[^1]: Edited to add this case 2024-01-09.
Copied from this comment: if an index has unassigned shards then we cannot complete the coordinated flush and block that happens before closing the index, leaving these shards with translog to replay if/when they recover:
This reproduces fairly readily as follows: