Closed (arm4b closed 3 years ago)
:(
Further to this, the etcd-operator project has just been archived and is now in maintenance mode. It's not good to have unfinished components slowly rot over time. Can we drop the etcd-operator dependency somehow?
I wrote something here earlier https://forum.stackstorm.com/t/etcd-operator-project-archived/1140
@danielburrell Thanks for letting us know that etcd-operator has been archived.
Yes, the next step would be to identify another coordination backend that works well and has good Helm charts. The best bet for now is Redis, but it could also be Memcached or one of the other alternatives from https://docs.openstack.org/tooz/latest/user/drivers.html
I saw the following in the logs when the cluster couldn't start itself, or even start clean, after all etcd pods were killed:
This situation is not recovered by etcd-operator. https://github.com/coreos/etcd-operator/blob/8347d27afa18b6c76d4a8bb85ad56a2e60927018/pkg/cluster/cluster.go#L248-L252
Researching further, it looks like there are quite a lot of cases when etcd-operator can't recover itself:
Because this backend is needed just for short-lived coordination locks, could we consider switching to Redis, or even to a single-instance etcd like it was before (https://github.com/StackStorm/stackstorm-ha/pull/52)?
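For context on what "short-lived coordination locks" means in practice, here is a rough sketch of taking such a lock through tooz with a Redis backend. It is only an illustration, not StackStorm's actual code: the helper name coordination_lock, the Redis URL, the member id, and the lock name are all made up for the example, and it assumes the tooz and redis Python packages are installed and a Redis server is reachable.

```python
from contextlib import contextmanager


@contextmanager
def coordination_lock(backend_url, member_id, lock_name):
    """Hold a tooz lock for the duration of a with-block (hypothetical helper)."""
    # Imported lazily so this module still loads where tooz is not installed.
    from tooz import coordination

    # member_id and lock_name are expected to be bytes by tooz.
    coordinator = coordination.get_coordinator(backend_url, member_id)
    coordinator.start()
    try:
        lock = coordinator.get_lock(lock_name)
        # tooz locks are context managers: acquire on enter, release on exit.
        with lock:
            yield lock
    finally:
        # Always disconnect from the backend, even if the body raised.
        coordinator.stop()


if __name__ == "__main__":
    # Assumed local Redis instance; swap the URL scheme to try another tooz driver.
    with coordination_lock("redis://localhost:6379", b"worker-1", b"example-lock"):
        print("holding the lock briefly")
```

Since the backend is selected only by the URL, moving from etcd to Redis (or another tooz driver) should not need changes in code like this, only in the configured URL.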