Open henningandersen opened 5 years ago
Pinging @elastic/es-distributed
I made some updates to the meta issues under coordinator node.
Should we consider defaulting wait_for_completion to false as a breaking change?
It's a bit trappy in the sense that a client could disconnect and leave the reindex in an unknown state.
TCP disconnection could potentially also be considered a way to cancel the reindex operation when wait_for_completion=true.
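For reference, a minimal sketch of what the non-blocking flow looks like from a client's point of view, assuming the existing behaviour where POST _reindex?wait_for_completion=false returns a body containing a task id that can then be polled via GET _tasks/<task_id>. The helper names, the base URL and the index names are hypothetical, and the requests are only built, not sent:

```python
import json
import urllib.parse
import urllib.request


def build_reindex_request(base_url, source_index, dest_index, wait_for_completion=False):
    """Build (but do not send) a POST _reindex request.

    base_url, source_index and dest_index are illustrative placeholders.
    """
    # Elasticsearch expects the literal strings "true" / "false" in the query string.
    query = urllib.parse.urlencode(
        {"wait_for_completion": str(wait_for_completion).lower()}
    )
    body = json.dumps(
        {"source": {"index": source_index}, "dest": {"index": dest_index}}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_reindex?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def task_status_url(base_url, reindex_response):
    """Return the Tasks API URL for the task id found in an async reindex response."""
    task_id = reindex_response["task"]  # looks like "<node-id>:<number>"
    return f"{base_url}/_tasks/{urllib.parse.quote(task_id, safe=':')}"
```

With wait_for_completion=false the client holds only the task id, so a dropped connection after the initial response no longer leaves it guessing about the state of the reindex.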
@Mpdreamz we discussed this in our weekly sync today. It seems both defaults have benefits. The current default gives the easiest OOTB experience for someone new to ES when playing around with it.
Also, we think we need a strong argument for changing the default, to ensure that we only make breaking changes when necessary. Do you have a good case to present on this?
Note that part of this project intends to introduce reindex as jobs, so a disconnected client would leave the job in a healthy state, though finding the job again will require looking it up through the new reindex job API (probably something like GET _reindex/ or GET _reindex/<job-id>).
For search, cancelling the job on TCP disconnect makes sense, since the result is going nowhere anyway and a search has no side effects. For reindex, the job has side effects as its primary purpose, and whether or not the user wants to cancel is less obvious. We think being explicit is better, also to ensure that a network issue does not result in stopping the job.
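To illustrate what "explicit" means here, a rough sketch of cancelling through the existing Tasks API (POST _tasks/<task_id>/_cancel) rather than by dropping the connection. The helper name and base URL are hypothetical, the task id is whatever the reindex call returned, and the request is only built, not sent:

```python
import urllib.parse
import urllib.request


def build_cancel_request(base_url, task_id):
    """Build (but do not send) an explicit cancel request for a running task.

    base_url is an illustrative placeholder; task_id has the form "<node-id>:<number>".
    """
    return urllib.request.Request(
        url=f"{base_url}/_tasks/{urllib.parse.quote(task_id, safe=':')}/_cancel",
        method="POST",
    )
```

Cancellation then only happens when a client deliberately asks for it, so a flaky network between client and cluster cannot stop a reindex on its own.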
We want to make reindex resilient to node restarts and failures, such that reindex can continue to run across such events.
There are two primary problems to solve:
Search resiliency
Coordinator node resiliency:
indices:data/write/start_reindex
indices:admin/reindex/start_reindex
cluster:admin/reindex/start_reindex
indices:data/reindex/start_reindex
Slicing:
Benchmarking:
Misc:
Docs