Open BenB196 opened 2 years ago
@BenB196 thanks a lot for reporting this and providing very detailed instructions. We need to think about how we can make this whole process much simpler.
I have PVs and had a situation where the volumes were not able to be re-attached. The error message from describing the pod was "Multi-attach error for volume "pvc-####-####" Volume is already exclusively attached to one node". Would having an NFS share be a workaround? I know that having local disk is recommended, but I wanted to ask about the possibility.
@mdf-ido how are your PVs provisioned? If you're using something like Longhorn, look into https://longhorn.io/kb/troubleshooting-volume-with-multipath/. I use something similar for Dev/Stage clusters for easier maintenance, and have run into issues in the past with multipath locking volumes.
Edit:
Btw, I don't think that issue is directly related to ECK; it sounds more like a Kubernetes/host issue than an ECK one.
Hi Ben! Thanks for the quick reply. I am using AKS, and the PVs are provisioned dynamically with the Azure built-in storage classes.
@BenB196 @sebgl Anybody know if there were ever improvements made to this process? We are running into a similar issue that will likely put an end to the possibility of us upgrading to an Enterprise license.
We allowed the Operator to perform a rolling restart following our upgrade of the Operator to version 2.6.1. For a deployment of roughly 80 data nodes with around 1.5 TB of disk usage per node, the restart took over 40 hours and significantly impacted user latencies.
This isn't something that causes issues in smaller environments or clusters, but in clusters with a non-negligible data size we saw terabytes of I/O from shards (both primary and replica) being moved all over the cluster. Ideally the cluster should simply promote a replica to primary and wait for the restarted node to come back online, since the data is still available on its PVC. We tried to manually set both the persistent and transient allocation/rebalance settings, but the Operator immediately overrides them with a transient allocation setting of 'all'.
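For reference, the kind of manual override we attempted looked roughly like the sketch below; the endpoint, credentials, and exact values are placeholders rather than a definitive recipe.

```sh
# Rough sketch of the manual override attempted; ES_URL and ELASTIC_PASSWORD
# are placeholders for the cluster endpoint and its auth.
curl -k -u "elastic:${ELASTIC_PASSWORD}" -X PUT "${ES_URL}/_cluster/settings" \
  -H 'Content-Type: application/json' -d'
{
  "persistent": { "cluster.routing.allocation.enable": "primaries" },
  "transient":  { "cluster.routing.allocation.enable": "primaries" }
}'
# Shortly afterwards the operator reconciles and the transient value is back to "all".
```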
Would love to get more information about how to improve this process.
Any updates on finalizing this documentation?
Proposal
Currently, if you manage an ECK cluster that uses persistent volumes, the process for performing a rolling restart of the underlying host nodes is neither straightforward nor well documented.
There are several major gaps in the docs, and users who skip the missing steps can have a very bad time.
To best explain the issue, I'll write out the steps actually required to do something like this.
Assumptions:
Steps (note: these should be done for every host restart; rough command sketches for the individual steps follow the list):
1. Disable shard allocation.
2. Exclude the Elasticsearch cluster from ECK Operator management.
3. Remove the transient cluster setting "transient.cluster.routing.allocation.exclude._name" : "none_excluded"; otherwise "persistent.cluster.routing.allocation.enable": "primaries" does not work as intended and shards will reallocate.
4. You can follow the (Optional) steps in the Elasticsearch rolling upgrade guide.
5. Drain your host node.
6. Restart the host node.
7. Wait for the host node to come back online.
8. Uncordon the node.
9. Wait for the Elasticsearch nodes to recover.
10. Remove "persistent.cluster.routing.allocation.enable": "primaries".
11. Reinclude the Elasticsearch cluster into ECK Operator management.
12. Wait for the cluster to recover.
13. Repeat these steps for each host node.
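To make the steps above more concrete, here are rough command sketches for the manual parts. They assume an Elasticsearch endpoint reachable at $ES_URL with basic auth via $ELASTIC_PASSWORD, an Elasticsearch resource named quickstart in namespace default, and the host node name in $NODE; all of these are illustrative placeholders, not values from this issue. For step 1 (disable shard allocation), something along these lines:

```sh
# Step 1: only allow primary shard allocation while hosts are restarting,
# so replicas from the drained node are not rebuilt elsewhere.
curl -k -u "elastic:${ELASTIC_PASSWORD}" -X PUT "${ES_URL}/_cluster/settings" \
  -H 'Content-Type: application/json' -d'
{
  "persistent": { "cluster.routing.allocation.enable": "primaries" }
}'
```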
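For step 2, ECK has an annotation to exclude a resource from reconciliation; the resource name and namespace below are placeholders:

```sh
# Step 2: stop the operator from reconciling (and reverting) manual changes.
kubectl annotate elasticsearch quickstart -n default \
  --overwrite eck.k8s.elastic.co/managed=false
```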
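For step 3, a transient cluster setting is removed by writing null for its key; this sketch uses the same placeholders as above:

```sh
# Step 3: clear the transient node-name exclusion that the operator manages,
# otherwise the "primaries" override above may not behave as intended.
curl -k -u "elastic:${ELASTIC_PASSWORD}" -X PUT "${ES_URL}/_cluster/settings" \
  -H 'Content-Type: application/json' -d'
{
  "transient": { "cluster.routing.allocation.exclude._name": null }
}'
```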
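For steps 5 through 8, standard kubectl node maintenance applies; the flags below are a common starting point and may need adjusting for your workloads:

```sh
# Steps 5-8: drain the Kubernetes node, restart the host out of band,
# then allow scheduling on it again once it is back.
kubectl drain "${NODE}" --ignore-daemonsets --delete-emptydir-data
# ...reboot/patch the host and wait for it to rejoin the cluster...
kubectl uncordon "${NODE}"
```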
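For steps 9 and 12, cluster health can be polled until it reports green; the timeout value is an arbitrary example:

```sh
# Steps 9/12: block until the cluster reports green (or the timeout expires).
curl -k -u "elastic:${ELASTIC_PASSWORD}" \
  "${ES_URL}/_cluster/health?wait_for_status=green&timeout=30m&pretty"
```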
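For step 10, the persistent override is removed the same way, by writing null:

```sh
# Step 10: re-enable full shard allocation.
curl -k -u "elastic:${ELASTIC_PASSWORD}" -X PUT "${ES_URL}/_cluster/settings" \
  -H 'Content-Type: application/json' -d'
{
  "persistent": { "cluster.routing.allocation.enable": null }
}'
```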
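And for step 11, management is handed back to the operator by flipping (or removing) the same annotation:

```sh
# Step 11: let the operator reconcile the cluster again.
kubectl annotate elasticsearch quickstart -n default \
  --overwrite eck.k8s.elastic.co/managed=true
```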