apecloud / kubeblocks

KubeBlocks is an open-source control plane software that runs and manages databases, message queues and other stateful applications on K8s.
https://kubeblocks.io
GNU Affero General Public License v3.0

[BUG] postgres cluster is Running but the svc will be unable to connect for a few minutes #5035

Closed: JashBook closed this issue 8 months ago

JashBook commented 1 year ago

Describe the bug
The postgres cluster reports Running, but connections through the service (svc) are refused for a few minutes afterwards.

kbcli version

    Kubernetes: v1.26.7-eks-2d98532
    KubeBlocks: 0.7.0-alpha.5
    kbcli: 0.7.0-alpha.5

To Reproduce
Steps to reproduce the behavior:

  1. create cluster
    kbcli cluster create postgres-test --termination-policy=WipeOut --cluster-definition=postgresql --cluster-version=postgresql-12.14.0 --set cpu=100m,memory=0.5Gi,replicas=1,storage=4Gi
  2. see error (from inside the pod, connections through the service are refused until a later attempt succeeds)
    
    root@postgres-test-postgresql-0:/home/postgres# psql -Upostgres -hpostgres-test-postgresql -p5432 -w
    psql: error: connection to server at "postgres-test-postgresql" (10.100.247.250), port 5432 failed: Connection refused
    Is the server running on that host and accepting TCP/IP connections?
    root@postgres-test-postgresql-0:/home/postgres# psql -Upostgres -hpostgres-test-postgresql -p5432 -w
    psql: error: connection to server at "postgres-test-postgresql" (10.100.247.250), port 5432 failed: Connection refused
    Is the server running on that host and accepting TCP/IP connections?
    root@postgres-test-postgresql-0:/home/postgres# psql -Upostgres -hpostgres-test-postgresql -p5432 -w
    psql: error: connection to server at "postgres-test-postgresql" (10.100.247.250), port 5432 failed: Connection refused
    Is the server running on that host and accepting TCP/IP connections?
    root@postgres-test-postgresql-0:/home/postgres# psql -Upostgres -hpostgres-test-postgresql -p5432 -w
    psql: error: connection to server at "postgres-test-postgresql" (10.100.247.250), port 5432 failed: Connection refused
    Is the server running on that host and accepting TCP/IP connections?
    root@postgres-test-postgresql-0:/home/postgres# psql -Upostgres -hpostgres-test-postgresql -p5432 -w
    psql (12.14 (Ubuntu 12.14-1.pgdg22.04+1))
    SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
    Type "help" for help.

    postgres=#
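To put a number on "a few minutes" when re-testing, a minimal sketch like the one below could poll the service address and report how long it took until the first TCP connection was accepted. It assumes Python 3 is available where it runs; `wait_for_tcp` and its `timeout`/`interval` parameters are hypothetical names chosen for illustration. The check is TCP-only, so unlike `psql` it does not authenticate or open a database session.

```python
import socket
import time
from typing import Optional


def wait_for_tcp(host: str, port: int, timeout: float = 300.0,
                 interval: float = 2.0) -> Optional[float]:
    """Poll host:port until a TCP connection succeeds (hypothetical helper).

    Returns the seconds waited before the first successful connection,
    or None if `timeout` elapses first. A refused connection (the error
    psql reports above) is treated as "not ready yet".
    """
    start = time.monotonic()
    deadline = start + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # when nothing accepts connections on the target address.
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            pass
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

For this cluster the call would be `wait_for_tcp("postgres-test-postgresql", 5432)`; a result of `None` means the port never accepted a connection within the timeout.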

    kubectl get cluster
    NAME            CLUSTER-DEFINITION   VERSION              TERMINATION-POLICY   STATUS    AGE
    postgres-test   postgresql           postgresql-12.14.0   WipeOut              Running   9m13s

describe cluster

    kubectl describe cluster postgres-test
    Name:         postgres-test
    Namespace:    default
    Labels:       clusterdefinition.kubeblocks.io/name=postgresql
                  clusterversion.kubeblocks.io/name=postgresql-12.14.0
    Annotations:  kubeblocks.io/reconcile: 2023-09-06T11:08:41.668109108Z
    API Version:  apps.kubeblocks.io/v1alpha1
    Kind:         Cluster
    Metadata:
      Creation Timestamp:  2023-09-06T11:03:28Z
      Finalizers:
        cluster.kubeblocks.io/finalizer
      Generation:        1
      Resource Version:  146407515
      UID:               5f0926ac-eac9-4f97-9f86-9b96132a3d00
    Spec:
      Affinity:
        Node Labels:
        Pod Anti Affinity:  Preferred
        Tenancy:            SharedNode
        Topology Keys:
      Cluster Definition Ref:  postgresql
      Cluster Version Ref:     postgresql-12.14.0
      Component Specs:
        Component Def Ref:  postgresql
        Monitor:            false
        Name:               postgresql
        No Create PDB:      false
        Replicas:           1
        Resources:
          Limits:
            Cpu:     100m
            Memory:  512Mi
          Requests:
            Cpu:     100m
            Memory:  512Mi
        Service Account Name:  kb-postgres-test
        Switch Policy:
          Type:  Noop
        Volume Claim Templates:
          Name:  data
          Spec:
            Access Modes:
              ReadWriteOnce
            Resources:
              Requests:
                Storage:  4Gi
      Termination Policy:  WipeOut
      Tolerations:
    Status:
      Cluster Def Generation:  4
      Components:
        Postgresql:
          Phase:            Running
          Pods Ready:       true
          Pods Ready Time:  2023-09-06T11:04:10Z
          Replication Set Status:
            Primary:
              Pod:  postgres-test-postgresql-0
      Conditions:
        Last Transition Time:  2023-09-06T11:03:28Z
        Message:               The operator has started the provisioning of Cluster: postgres-test
        Observed Generation:   1
        Reason:                PreCheckSucceed
        Status:                True
        Type:                  ProvisioningStarted
        Last Transition Time:  2023-09-06T11:04:08Z
        Message:               Successfully applied for resources
        Observed Generation:   1
        Reason:                ApplyResourcesSucceed
        Status:                True
        Type:                  ApplyResources
        Last Transition Time:  2023-09-06T11:04:10Z
        Message:               all pods of components are ready, waiting for the probe detection successful
        Reason:                AllReplicasReady
        Status:                True
        Type:                  ReplicasReady
        Last Transition Time:  2023-09-06T11:04:10Z
        Message:               Cluster: postgres-test is ready, current phase is Running
        Reason:                ClusterReady
        Status:                True
        Type:                  Ready
      Observed Generation:  1
      Phase:                Running
    Events:
      Type     Reason                    Age                    From                       Message
      Normal   ComponentPhaseTransition  9m41s                  cluster-controller         Create a new component
      Normal   PreCheckSucceed           9m41s                  cluster-controller         The operator has started the provisioning of Cluster: postgres-test
      Warning  ApplyResourcesFailed      9m2s                   cluster-controller         Operation cannot be fulfilled on pods "postgres-test-postgresql-0": the object has been modified; please apply your changes to the latest version and try again
      Normal   ApplyResourcesSucceed     9m1s (x2 over 9m41s)   cluster-controller         Successfully applied for resources
      Normal   ComponentPhaseTransition  9m                     cluster-controller         Running: true, PodsReady: true, PodsTimedout: false
      Normal   AllReplicasReady          9m                     cluster-controller         all pods of components are ready, waiting for the probe detection successful
      Normal   ClusterReady              9m                     cluster-controller         Cluster: postgres-test is ready, current phase is Running
      Normal   Running                   9m                     cluster-controller         Cluster: postgres-test is ready, current phase is Running
      Warning  Unhealthy                 4m29s (x2 over 4m29s)  event-controller           Pod postgres-test-postgresql-0: Readiness probe failed: error: health rpc failed: rpc error: code = Unknown desc = {"event":"Success","originalRole":"primary","role":"primary"}
      Normal   SysAcctCreate             4m24s                  system-account-controller  Created accounts for cluster: postgres-test, component: postgresql, accounts: kbmonitoring
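The event ages above are relative to when `kubectl describe` ran, so the delay between the cluster reporting Running and the readiness-probe warning has to be derived. The sketch below assumes kubectl's compact age format (`9m`, `4m29s`) and uses a hypothetical helper `age_to_seconds`; because kubectl rounds ages, the result is approximate.

```python
import re

# Matches kubectl-style relative ages such as "9m41s", "9m", "4m29s", "2h3m".
_AGE_RE = re.compile(r"^(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$")


def age_to_seconds(age: str) -> int:
    """Convert a kubectl age string (e.g. "4m29s") to seconds (hypothetical helper)."""
    m = _AGE_RE.match(age)
    if not m or not any(m.groups()):
        raise ValueError(f"unrecognised age: {age!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Ages copied from the Events list (relative to when describe ran).
cluster_ready = age_to_seconds("9m")    # ClusterReady / Running
probe_failed = age_to_seconds("4m29s")  # Warning Unhealthy, first seen

gap = cluster_ready - probe_failed
print(f"Unhealthy was reported ~{gap}s after ClusterReady")
# prints: Unhealthy was reported ~271s after ClusterReady
```

That is roughly 4.5 minutes, about the same interval as between `Pods Ready Time` (11:04:10Z) and the `kubeblocks.io/reconcile` annotation (11:08:41Z) in the cluster status.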

describe pod

    kubectl describe pod postgres-test-postgresql-0
    Name:             postgres-test-postgresql-0
    Namespace:        default
    Priority:         0
    Service Account:  kb-postgres-test
    Node:             ip-172-31-26-70.cn-northwest-1.compute.internal/172.31.26.70
    Start Time:       Wed, 06 Sep 2023 19:03:33 +0800
    Labels:           app.kubernetes.io/component=postgresql
                      app.kubernetes.io/instance=postgres-test
                      app.kubernetes.io/managed-by=kubeblocks
                      app.kubernetes.io/name=postgresql
                      app.kubernetes.io/version=postgresql-12.14.0
                      apps.kubeblocks.io/component-name=postgresql
                      apps.kubeblocks.io/workload-type=Replication
                      apps.kubeblocks.postgres.patroni/role=master
                      apps.kubeblocks.postgres.patroni/scope=postgres-test-postgresql-patroni132a3d00
                      controller-revision-hash=postgres-test-postgresql-75d8594468
                      kubeblocks.io/role=primary
                      rsm.workloads.kubeblocks.io/access-mode=ReadWrite
                      statefulset.kubernetes.io/pod-name=postgres-test-postgresql-0
    Annotations:      apps.kubeblocks.io/component-replicas: 1
                      apps.kubeblocks.io/last-role-changed-event-timestamp: 2023-09-06T11:04:08Z
                      rs.apps.kubeblocks.io/primary: postgres-test-postgresql-0
                      status: {"conn_url":"postgres://172.31.27.207:5432/postgres","api_url":"http://172.31.27.207:8008/patroni","state":"running","role":"master","vers...
    Status:           Running
    IP:               172.31.27.207
    IPs:
      IP:           172.31.27.207
    Controlled By:  StatefulSet/postgres-test-postgresql
    Init Containers:
      pg-init-container:
        Container ID:  containerd://30c85ad8645a04c35dea0bbe73222f7daeb5e616791b756885ccfd6d64efec7c
        Image:         registry.cn-hangzhou.aliyuncs.com/apecloud/spilo:12.14.0
        Image ID:      registry.cn-hangzhou.aliyuncs.com/apecloud/spilo@sha256:5e0b1211207b158ed43c109e5ff1be830e1bf5e7aff1f0dd3c90966804c5a143
        Port:
        Host Port:
        Command:
          /kb-scripts/init_container.sh
        State:          Terminated
          Reason:       Completed
          Exit Code:    0
          Started:      Wed, 06 Sep 2023 19:03:38 +0800
          Finished:     Wed, 06 Sep 2023 19:03:38 +0800
        Ready:          True
        Restart Count:  0
        Limits:
          cpu:     0
          memory:  0
        Requests:
          cpu:     0
          memory:  0
        Environment Variables from:
          postgres-test-postgresql-env  ConfigMap  Optional: false
        Environment:
          KB_POD_NAME:               postgres-test-postgresql-0 (v1:metadata.name)
          KB_POD_UID:                 (v1:metadata.uid)
          KB_NAMESPACE:              default (v1:metadata.namespace)
          KB_SA_NAME:                 (v1:spec.serviceAccountName)
          KB_NODENAME:                (v1:spec.nodeName)
          KB_HOST_IP:                 (v1:status.hostIP)
          KB_POD_IP:                  (v1:status.podIP)
          KB_POD_IPS:                 (v1:status.podIPs)
          KB_HOSTIP:                  (v1:status.hostIP)
          KB_PODIP:                   (v1:status.podIP)
          KB_PODIPS:                  (v1:status.podIPs)
          KB_CLUSTER_NAME:           postgres-test
          KB_COMP_NAME:              postgresql
          KB_CLUSTER_COMP_NAME:      postgres-test-postgresql
          KB_CLUSTER_UID_POSTFIX_8:  132a3d00
          KB_POD_FQDN:               $(KB_POD_NAME).$(KB_CLUSTER_COMP_NAME)-headless.$(KB_NAMESPACE).svc
        Mounts:
          /home/postgres/conf from postgresql-config (rw)
          /home/postgres/pgdata from data (rw)
          /kb-podinfo from pod-info (rw)
          /kb-scripts from scripts (rw)
          /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n9x56 (ro)
      role-agent-installer:
        Container ID:  containerd://13b5731500d0099badbcde28cca5a486411ed62c5f9e96ff35a6e495241d3b36
        Image:         msoap/shell2http:1.16.0
        Image ID:      docker.io/msoap/shell2http@sha256:a20bdde2f679de2cba6bf3d9f470489c7836d4d0d28232a2b295450809cd43ef
        Port:
        Host Port:
        Command:
          cp /app/shell2http /role-probe/agent
        State:          Terminated
          Reason:       Completed
          Exit Code:    0
          Started:      Wed, 06 Sep 2023 19:03:39 +0800
          Finished:     Wed, 06 Sep 2023 19:03:39 +0800
        Ready:          True
        Restart Count:  0
        Environment:
        Mounts:
          /role-probe from role-agent (rw)
          /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n9x56 (ro)
    Containers:
      postgresql:
        Container ID:  containerd://10e6583cde66886fc049d30b65396522d7e62e066bd2940a83e6a28405a94719
        Image:         registry.cn-hangzhou.aliyuncs.com/apecloud/spilo:12.14.0
        Image ID:      registry.cn-hangzhou.aliyuncs.com/apecloud/spilo@sha256:5e0b1211207b158ed43c109e5ff1be830e1bf5e7aff1f0dd3c90966804c5a143
        Ports:         5432/TCP, 8008/TCP
        Host Ports:    0/TCP, 0/TCP
        Command:
          /kb-scripts/setup.sh
        State:          Running
          Started:      Wed, 06 Sep 2023 19:03:40 +0800
        Ready:          True
        Restart Count:  0
        Limits:
          cpu:     100m
          memory:  512Mi
        Requests:
          cpu:     100m
          memory:  512Mi
        Readiness:  exec [/bin/sh -c -ee exec pg_isready -U "postgres" -h 127.0.0.1 -p 5432 [ -f /postgresql/tmp/.initialized ] || [ -f /postgresql/.initialized ] ] delay=10s timeout=5s period=30s #success=1 #failure=3
        Environment Variables from:
          postgres-test-postgresql-env      ConfigMap  Optional: false
          postgres-test-postgresql-rsm-env  ConfigMap  Optional: false
        Environment:
          KB_POD_NAME:                postgres-test-postgresql-0 (v1:metadata.name)
          KB_POD_UID:                  (v1:metadata.uid)
          KB_NAMESPACE:               default (v1:metadata.namespace)
          KB_SA_NAME:                  (v1:spec.serviceAccountName)
          KB_NODENAME:                 (v1:spec.nodeName)
          KB_HOST_IP:                  (v1:status.hostIP)
          KB_POD_IP:                   (v1:status.podIP)
          KB_POD_IPS:                  (v1:status.podIPs)
          KB_HOSTIP:                   (v1:status.hostIP)
          KB_PODIP:                    (v1:status.podIP)
          KB_PODIPS:                   (v1:status.podIPs)
          KB_CLUSTER_NAME:            postgres-test
          KB_COMP_NAME:               postgresql
          KB_CLUSTER_COMP_NAME:       postgres-test-postgresql
          KB_CLUSTER_UID_POSTFIX_8:   132a3d00
          KB_POD_FQDN:                $(KB_POD_NAME).$(KB_CLUSTER_COMP_NAME)-headless.$(KB_NAMESPACE).svc
          DCS_ENABLE_KUBERNETES_API:  true
          KUBERNETES_USE_CONFIGMAPS:  true
          SCOPE:                      $(KB_CLUSTER_NAME)-$(KB_COMP_NAME)-patroni$(KB_CLUSTER_UID_POSTFIX_8)
          KUBERNETES_SCOPE_LABEL:     apps.kubeblocks.postgres.patroni/scope
          KUBERNETES_ROLE_LABEL:      apps.kubeblocks.postgres.patroni/role
          KUBERNETES_LABELS:          {"app.kubernetes.io/instance":"$(KB_CLUSTER_NAME)","apps.kubeblocks.io/component-name":"$(KB_COMP_NAME)"}
          RESTORE_DATA_DIR:           /home/postgres/pgdata/kb_restore
          KB_PG_CONFIG_PATH:          /home/postgres/conf/postgresql.conf
          SPILO_CONFIGURATION:        bootstrap:
                                        initdb:

Expected behavior
The svc accepts connections as soon as the postgres cluster is Running.


github-actions[bot] commented 1 year ago

This issue has been marked as stale because it has been open for 30 days with no activity.

JashBook commented 8 months ago

Has been fixed.