kubectl delete redisenterprisedatabases stuck

soroshsabz commented 2 years ago

ITNOA

My cluster is in a bad state, as you can see below:

ssoroosh@master:~$ kubectl get pods
NAME                                              READY   STATUS    RESTARTS        AGE
harbor-cluster-0                                  1/2     Running   26 (22s ago)    150m
harbor-cluster-services-rigger-6dcc59d7d8-p6hvn   1/1     Running   4 (137m ago)    24h
redis-enterprise-operator-7f8d8548c5-bj447        2/2     Running   26 (144m ago)   6d20h

I have one database, shown below:

ssoroosh@master:~$ kubectl get redisenterprisedatabases.app.redislabs.com
NAME       VERSION   PORT    CLUSTER          SHARDS   STATUS   SPEC STATUS   AGE
harbordb   6.0.13    14095   harbor-cluster   1        active   Valid         23h

So I want to delete my database and cluster and recreate them to fix this state. The problem is that when I run the command to delete the database, it hangs and never completes:

ssoroosh@master:~$ kubectl delete redisenterprisedatabases.app.redislabs.com harbordb
redisenterprisedatabase.app.redislabs.com "harbordb" deleted

I checked the logs of my Redis operator and see the following output:

{"level":"info","ts":"2022-02-10T20:21:04.303Z","logger":"controller_redisenterprisecluster","msg":"Cannot delete REC:","Request.Namespace":"default","Request.Name":"harbor-cluster","finalizers":["redbfinalizer.redisenterpriseclusters.app.redislabs.com","stsfinalizer.redisenterprisecluster.app.redislabs.com","recpodfinalizer.redisenterprisecluster.app.redislabs.com"]}
{"level":"error","ts":"2022-02-10T20:21:08.571Z","logger":"controller_redisenterprisedatabase","msg":"failed to observe database state","Namespace":"default","Name":"harbordb","error":"could not get cluster object from RedisEnterpriseCluster: Get \"https://harbor-cluster:9443/v1/cluster\": dial tcp: lookup harbor-cluster on 10.96.0.10:53: no such host","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.3/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.3/pkg/internal/controller/controller.go:227"}
{"level":"info","ts":"2022-02-10T20:21:09.232Z","logger":"controller_redisenterprisecluster","msg":"Cannot delete REC:","Request.Namespace":"default","Request.Name":"harbor-cluster","finalizers":["redbfinalizer.redisenterpriseclusters.app.redislabs.com","stsfinalizer.redisenterprisecluster.app.redislabs.com","recpodfinalizer.redisenterprisecluster.app.redislabs.com"]}
{"level":"error","ts":"2022-02-10T20:21:13.390Z","logger":"controller_redisenterprisedatabase","msg":"failed to observe database state","Namespace":"default","Name":"harbordb","error":"could not get cluster object from RedisEnterpriseCluster: Get \"https://harbor-cluster:9443/v1/cluster\": dial tcp: lookup harbor-cluster on 10.96.0.10:53: no such host","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.3/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.3/pkg/internal/controller/controller.go:227"}
{"level":"info","ts":"2022-02-10T20:21:14.162Z","logger":"controller_redisenterprisecluster","msg":"Cannot delete REC:","Request.Namespace":"default","Request.Name":"harbor-cluster","finalizers":["redbfinalizer.redisenterpriseclusters.app.redislabs.com","stsfinalizer.redisenterprisecluster.app.redislabs.com","recpodfinalizer.redisenterprisecluster.app.redislabs.com"]}

How can I forcibly remove the database and cluster and get out of this stuck state? (I tried --force, but that did not resolve the problem.)

thanks

laurentdroin commented 2 years ago

Hi @soroshsabz

The REDB probably still has finalizers that are not being removed as everything is in a bad state. You can remove the finalizers on the REDB manually. See https://github.com/RedisLabs/redis-enterprise-k8s-docs/blob/master/topics.md#rec-deletion for more details.
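As a rough illustration of what removing the finalizers manually can look like, here is a minimal sketch that uses a standard `kubectl patch` merge patch to clear `metadata.finalizers`. The `remove_finalizers` helper and the `KUBECTL` variable are hypothetical names introduced for this example, and the resource name and namespace are taken from the output posted above. Treat it as an assumption-laden sketch and follow the linked document for the supported procedure. Clearing finalizers skips the operator's own cleanup, so it is only appropriate when the cluster cannot be brought back.

```shell
# Sketch only: clear all finalizers on a resource so a pending delete can finish.
# KUBECTL can be overridden (for example with "echo") to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

remove_finalizers() {
  # $1 = resource kind, $2 = resource name, $3 = namespace (defaults to "default")
  local kind="$1" name="$2" namespace="${3:-default}"
  # A JSON merge patch that sets metadata.finalizers to null removes the list.
  "$KUBECTL" patch "$kind" "$name" \
    --namespace "$namespace" \
    --type=merge \
    --patch '{"metadata":{"finalizers":null}}'
}

# Example call for the database in this issue (uncomment to run):
# remove_finalizers redisenterprisedatabases.app.redislabs.com harbordb default
```

Running with `KUBECTL=echo` prints the command instead of executing it, which is a cheap way to check the arguments before touching the cluster.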

Regarding the "bad state", it is important to understand that at any given time, you need to have a majority of the nodes up and running. In a 3-node Redis Enterprise cluster, you cannot have more than one node down at any given time. From what I can see here, I think it is likely that all 3 of your pods were down at the same time, and the first pod that came back up was not able to find any cluster to join. When you are in this situation, you need to "recover" the cluster, following this procedure: https://docs.redis.com/latest/kubernetes/re-clusters/cluster-recovery/
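To make the majority rule concrete, here is a small sketch of the arithmetic. The `max_nodes_down` function name is made up for this example and is not part of any Redis tooling. It assumes a strict majority, that is, more than half of the nodes must stay up.

```shell
# Sketch only: how many nodes may be down while a strict majority stays up.
max_nodes_down() {
  # $1 = total number of nodes in the cluster
  local nodes="$1"
  # A strict majority is floor(n/2) + 1 nodes, so at most (n - 1) / 2 may be down.
  echo $(( (nodes - 1) / 2 ))
}

max_nodes_down 3   # prints 1
max_nodes_down 5   # prints 2
```

With 3 nodes the result is 1, which matches the statement above: once two or more nodes are down at the same time, the majority is lost and the cluster needs the recovery procedure.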

I also recommend working with Redis and opening a Support ticket with us. In most cases, we will request the logs archive generated by a run of the log_collector.py script (https://docs.redis.com/latest/kubernetes/logs/collect-logs/).

I hope this is helpful.

Laurent.

soroshsabz commented 2 years ago

Your explanation is very helpful,

thanks for replying :)

RedisLabs / redis-enterprise-k8s-docs

kubectl delete redisenterprisedatabases stuck #214