apecloud / kubeblocks

KubeBlocks is an open-source control plane software that runs and manages databases, message queues and other stateful applications on K8s.
https://kubeblocks.io
GNU Affero General Public License v3.0

[BUG] Patroni PostgreSQL cluster fails to start after stop #2572

JashBook closed this issue 1 year ago

JashBook commented 1 year ago

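**Describe the bug**
Stopping and then starting a Patroni PostgreSQL cluster fails: the Start OpsRequest ends in `Failed`, and both PostgreSQL pods stay at 3/4 ready.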

    kbcli version
    Kubernetes: v1.25.6-eks-48e63af
    KubeBlocks: 0.5.0-alpha.7
    kbcli: 0.5.0-alpha.7

**To Reproduce**
Steps to reproduce the behavior:

  1. Install KubeBlocks.
  2. Create a PostgreSQL cluster:
    kbcli cluster create test-cluster --termination-policy=WipeOut --cluster-definition=postgresql --set cpu=100m,memory=500Mi,replicas=2,storage=1Gi --namespace default
  3. Stop the cluster:
    kbcli cluster stop test-cluster
  4. Start the cluster:
    kbcli cluster start test-cluster
  5. Check the pods, OpsRequests and StatefulSets. The Start OpsRequest has failed and neither StatefulSet is ready:

    kubectl get pod,ops,sts -l app.kubernetes.io/instance=test-cluster
    NAME                              READY   STATUS    RESTARTS   AGE
    pod/test-cluster-postgresql-0     3/4     Running   0          14m
    pod/test-cluster-postgresql-1-0   3/4     Running   0          14m

    NAME                                                     TYPE    CLUSTER        STATUS    PROGRESS   AGE
    opsrequest.apps.kubeblocks.io/test-cluster-start-rrc6p   Start   test-cluster   Failed    2/2        14m
    opsrequest.apps.kubeblocks.io/test-cluster-stop-zlzhv    Stop    test-cluster   Succeed   2/2        14m

    NAME                                         READY   AGE
    statefulset.apps/test-cluster-postgresql     0/1     14m
    statefulset.apps/test-cluster-postgresql-1   0/1     14m

  6. Describe the cluster:

    kbcli cluster describe test-cluster
    Name: test-cluster    Created Time: Apr 13,2023 18:33 UTC+0800
    NAMESPACE   CLUSTER-DEFINITION   VERSION             STATUS   TERMINATION-POLICY
    default     postgresql           postgresql-15.2.0   Failed   WipeOut

    Endpoints:
    COMPONENT    MODE        INTERNAL                                                 EXTERNAL
    postgresql   ReadWrite   test-cluster-postgresql.default.svc.cluster.local:5432
                             test-cluster-postgresql.default.svc.cluster.local:9187

    Topology:
    COMPONENT    INSTANCE                      ROLE        STATUS    AZ                NODE                                                           CREATED-TIME
    postgresql   test-cluster-postgresql-0     primary     Running   cn-northwest-1a   ip-172-31-13-48.cn-northwest-1.compute.internal/172.31.13.48   Apr 13,2023 18:36 UTC+0800
    postgresql   test-cluster-postgresql-1-0   secondary   Running   cn-northwest-1c   ip-172-31-44-8.cn-northwest-1.compute.internal/172.31.44.8     Apr 13,2023 18:37 UTC+0800

    Resources Allocation:
    COMPONENT    DEDICATED   CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE-SIZE   STORAGE-CLASS
    postgresql   false       100m / 100m          500Mi / 500Mi           data:1Gi       ebs-sc

    Images:
    COMPONENT    TYPE         IMAGE
    postgresql   postgresql   registry.cn-hangzhou.aliyuncs.com/apecloud/spilo:15.2.0

    Events(last 5 warnings, see more:kbcli cluster list-events -n default test-cluster):
    TIME                         TYPE      REASON                 OBJECT                                 MESSAGE
    Apr 13,2023 18:36 UTC+0800   Warning   ApplyResourcesFailed   Cluster/test-cluster                   Operation cannot be fulfilled on statefulsets.apps "test-cluster-postgresql-1": StorageError: invalid object, Code: 4, Key: /registry/statefulsets/default/test-cluster-postgresql-1, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 138d0e95-794e-44a6-b9d6-8c8fd01e9b1c, UID in object meta:
    Apr 13,2023 18:37 UTC+0800   Warning   Unhealthy              Cluster/test-cluster                   Pod test-cluster-postgresql-0: Readiness probe failed: 127.0.0.1:5432 - no response
    Apr 13,2023 18:38 UTC+0800   Warning   Unhealthy              Cluster/test-cluster                   Pod test-cluster-postgresql-1-0: Readiness probe failed: 127.0.0.1:5432 - no response
    Apr 13,2023 18:47 UTC+0800   Warning   Unhealthy              Instance/test-cluster-postgresql-0     Readiness probe failed: 127.0.0.1:5432 - no response
    Apr 13,2023 18:47 UTC+0800   Warning   Unhealthy              Instance/test-cluster-postgresql-1-0   Readiness probe failed: 127.0.0.1:5432 - no response

  7. Check the pod logs:

    ➜ ~ kubectl logs test-cluster-postgresql-0
    Defaulted container "postgresql" out of: postgresql, metrics, kb-checkrole, config-manager, pg-init-container (init)

    ➜ ~ kubectl logs test-cluster-postgresql-1-0
    Defaulted container "postgresql" out of: postgresql, metrics, kb-checkrole, config-manager, pg-init-container (init)


Cluster YAML (excerpt):

    ➜ ~ kubectl get cluster test-cluster -oyaml
    apiVersion: apps.kubeblocks.io/v1alpha1
    kind: Cluster
    metadata:
      annotations:
        cluster.kubeblocks.io/component-class: '{}'
      creationTimestamp: "2023-04-13T10:33:50Z"
      finalizers:


**Expected behavior**
The Patroni PostgreSQL cluster stops and then starts successfully.

Y-Rookie commented 1 year ago

When the PostgreSQL cluster is stopped, all of its pods are deleted, but the Patroni ConfigMap is not cleaned up, which causes the cluster to fail to start.
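Until the stop operation removes this state itself, the cleanup described above could be approximated by hand. The helper below is a minimal sketch under stated assumptions: it assumes Patroni runs with its ConfigMap-based Kubernetes DCS, that the Patroni scope is `test-cluster-postgresql`, and that the DCS objects are named `<scope>-config`, `<scope>-leader`, `<scope>-failover` and `<scope>-sync`. These names are assumptions, not confirmed by this issue, and the function name `stale_patroni_configmaps` is hypothetical. It only filters names; it deletes nothing.

```shell
# Hypothetical helper: reads ConfigMap names (one per line) on stdin and
# prints only those that look like Patroni DCS objects for the given scope.
# ASSUMPTION: Patroni's ConfigMap-based DCS names its objects
# <scope>-config, <scope>-leader, <scope>-failover and <scope>-sync.
stale_patroni_configmaps() {
  scope="$1"
  # -F matches fixed strings and -x requires a whole-line match, so other
  # ConfigMaps that merely share the prefix are never selected.
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -Fx \
    -e "${scope}-config" \
    -e "${scope}-leader" \
    -e "${scope}-failover" \
    -e "${scope}-sync" || true
}
```

After `kbcli cluster stop test-cluster` has succeeded and before `kbcli cluster start test-cluster`, the helper could be fed from `kubectl get configmap -n default -o name | sed 's#^configmap/##'`; review its output against the actual ConfigMaps in the namespace before passing the names to `kubectl delete configmap -n default`.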