What happened:
I'm trying to restore etcd from an S3 backup made by backup-operator. I'm deploying a fresh, empty etcd cluster using the etcd-operator Helm chart: https://github.com/helm/charts/tree/master/stable/etcd-operator
After that I created an EtcdRestore CR and restore-operator began the restore operation. During that operation, restore-operator tries to clean up the pods/services from the initial etcd cluster, and etcd-operator itself should recreate them, but it fails to recreate the ClientService named "clusterName-client", because that service still exists at that moment. The etcd-operator code tries to create the service only once and ignores the "IsAlreadyExists" error, so it silently passes over it:
https://github.com/coreos/etcd-operator/blob/master/pkg/util/k8sutil/k8sutil.go#L189
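For illustration, here is a minimal sketch of that create-once pattern (my simplification, not the exact etcd-operator code), written against the pre-0.18 client-go API that a v0.9.4-era operator would use:

```go
package k8sutil

import (
	v1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/client-go/kubernetes"
)

// createService swallows the "already exists" error: if a stale service
// from the pre-restore cluster is still present, it is silently kept and
// never recreated or updated.
func createService(kubecli kubernetes.Interface, ns string, svc *v1.Service) error {
	_, err := kubecli.CoreV1().Services(ns).Create(svc)
	if err != nil && !apierrors.IsAlreadyExists(err) {
		return err
	}
	return nil
}
```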
https://github.com/coreos/etcd-operator/commit/be0c3acb50a902acd73960bd61221e80f50bdcb6#diff-46acd69e36758f5f5c27664b895b2bc3
After that change, there is no longer a DeleteCollection operation on services, and we have two of them, "clusterName" and "clusterName-client", but the code tries to delete only the one named "clusterName". For pods this works fine, because pods have unique names and new pods can be created with the same prefix. This works on Kubernetes 1.13 and fails on Kubernetes 1.14; I couldn't find what changed in delete operations in Kubernetes between those releases.
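A similarly hedged sketch of the cleanup gap, with a hypothetical helper name and the same pre-0.18 client-go signatures; the commented line shows what would also be needed for the client service:

```go
package k8sutil

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// cleanupServices deletes the cluster's services by name. Only the peer
// service "clusterName" is removed here; with DeleteCollection gone, the
// client service "clusterName-client" survives and later blocks recreation.
func cleanupServices(kubecli kubernetes.Interface, clusterName, ns string) error {
	if err := kubecli.CoreV1().Services(ns).Delete(clusterName, &metav1.DeleteOptions{}); err != nil {
		return err
	}
	// A fix would also delete the client service, for example:
	// return kubecli.CoreV1().Services(ns).Delete(clusterName+"-client", &metav1.DeleteOptions{})
	return nil
}
```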
What you expected to happen:
Both etcd Kubernetes services ("clusterName" and "clusterName-client") are deployed after the restore operation.
How to reproduce it (as minimally and precisely as possible):
Deploy etcd-operator from the official Helm chart on Kubernetes 1.14.2, make a backup to S3 using backup-operator, redeploy etcd-operator, and create an EtcdRestore CR pointing to the backup on S3.
Anything else we need to know?:
etcd-operator version: v0.9.4
The restore operation works fine on Kubernetes 1.13.6.
Environment:
Kubernetes version (use kubectl version): 1.14.2
Cloud provider or hardware configuration: AWS
OS (e.g: cat /etc/os-release): CoreOS 2079.4.0
Kernel (e.g. uname -a): 4.19.43
Install tools:
Network plugin and version (if this is a network-related bug): flannel 0.11.0