apecloud / kubeblocks

KubeBlocks is an open-source control plane software that runs and manages databases, message queues and other stateful applications on K8s.
https://kubeblocks.io
GNU Affero General Public License v3.0

[BUG] PG restored cluster is always in Creating status due to Readiness probe failed #8152

Open tianyue86 opened 2 months ago

tianyue86 commented 2 months ago

Describe the bug

Kubernetes: v1.29.7-gke.1274000
KubeBlocks: 0.9.1-beta.25
kbcli: 0.9.1-beta.10

To Reproduce

Steps to reproduce the behavior:

  1. Create pg cluster

    kbcli cluster create postgres-icpzcz --termination-policy=DoNotTerminate --cluster-definition=postgresql --enable-all-logs=false --cluster-version=postgresql-14.8.0 --set cpu=100m,memory=0.5Gi,replicas=2,storage=3Gi --namespace default

  2. Create backup

    kbcli cluster backup postgres-icpzcz --method wal-g --namespace default

  3. Restore cluster

    kbcli cluster restore postgres-icpzcz-backup --backup backup-default-postgres-icpzcz-20240914124229 --namespace default

  4. Check cluster status

    tianyue@localhost kbcli % k get cluster -A | grep postgres
    default     postgres-icpzcz          postgresql            postgresql-14.8.0     DoNotTerminate       Running    61m
    default     postgres-icpzcz-backup   postgresql            postgresql-14.8.0     DoNotTerminate       **Creating**   38m
  5. See error

    tianyue@localhost kbcli % k describe cluster postgres-icpzcz-backup
    Name:         postgres-icpzcz-backup
    Namespace:    default
    Labels:       clusterdefinition.kubeblocks.io/name=postgresql
              clusterversion.kubeblocks.io/name=postgresql-14.8.0
    Annotations:  kubeblocks.io/ops-request: [{"name":"postgres-icpzcz-backup","type":"Restore"}]
              kubeblocks.io/reconcile: 2024-09-14T05:19:49.986719383Z
              kubeblocks.io/restore-from-backup:
                {"postgresql":{"connectionPassword":"EHhYeZrgFEC+x5rv7D+WRo9kZNvT2sIqM40QfqndWwQQIx94","doReadyRestoreAfterClusterRunning":"false","name":...
    API Version:  apps.kubeblocks.io/v1alpha1
    Kind:         Cluster
    Metadata:
    Creation Timestamp:  2024-09-14T04:44:04Z
    Finalizers:
    cluster.kubeblocks.io/finalizer
    Generation:        1
    Resource Version:  9186555
    UID:               c84b3271-c513-450b-b3ef-55c9d37dd378
    Spec:
    Affinity:
    Pod Anti Affinity:     Preferred
    Tenancy:               SharedNode
    Cluster Definition Ref:  postgresql
    Cluster Version Ref:     postgresql-14.8.0
    Component Specs:
    Component Def Ref:  postgresql
    Disable Exporter:   true
    Enabled Logs:
      running
    Name:      postgresql
    Replicas:  2
    Resources:
      Limits:
        Cpu:     100m
        Memory:  512Mi
      Requests:
        Cpu:               100m
        Memory:            512Mi
    Service Account Name:  kb-postgres-icpzcz
    Switch Policy:
      Type:  Noop
    Volume Claim Templates:
      Name:  data
      Spec:
        Access Modes:
          ReadWriteOnce
        Resources:
          Requests:
            Storage:  3Gi
    Resources:
    Cpu:     0
    Memory:  0
    Storage:
    Size:              0
    Termination Policy:  DoNotTerminate
    Status:
    Cluster Def Generation:  2
    Components:
    Postgresql:
      Phase:       Creating
      Pods Ready:  false
    Conditions:
    Last Transition Time:  2024-09-14T04:44:04Z
    Message:               The operator has started the provisioning of Cluster: postgres-icpzcz-backup
    Observed Generation:   1
    Reason:                PreCheckSucceed
    Status:                True
    Type:                  ProvisioningStarted
    Last Transition Time:  2024-09-14T04:44:04Z
    Message:               Successfully applied for resources
    Observed Generation:   1
    Reason:                ApplyResourcesSucceed
    Status:                True
    Type:                  ApplyResources
    Last Transition Time:  2024-09-14T04:44:04Z
    Message:               pods are not ready in Components: [postgresql], refer to related component message in Cluster.status.components
    Reason:                ReplicasNotReady
    Status:                False
    Type:                  ReplicasReady
    Last Transition Time:  2024-09-14T04:44:04Z
    Message:               pods are unavailable in Components: [postgresql], refer to related component message in Cluster.status.components
    Reason:                ComponentsNotReady
    Status:                False
    Type:                  Ready
    Observed Generation:     1
    Phase:                   Creating
    Events:
    Type     Reason                    Age                   From                  Message
    ----     ------                    ----                  ----                  -------
    Normal   PreCheckSucceed           38m                   cluster-controller    The operator has started the provisioning of Cluster: postgres-icpzcz-backup
    Normal   ApplyResourcesSucceed     38m                   cluster-controller    Successfully applied for resources
    Normal   NeedWaiting               38m (x6 over 38m)     component-controller  waiting for restore "postgres-icpzcz-backup-postgresql-c84b3271-preparedata" successfully
    Normal   ComponentPhaseTransition  38m                   cluster-controller    component is Creating
    Warning  Unhealthy                 2m48s (x14 over 37m)  event-controller      Pod postgres-icpzcz-backup-postgresql-0: **Readiness probe failed**: 127.0.0.1:5432 - no response
    Warning  Unhealthy                 2m41s (x12 over 36m)  event-controller      Pod postgres-icpzcz-backup-postgresql-1: **Readiness probe failed**: 127.0.0.1:5432 - no response
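
To list which pods are failing their readiness probe, the Events section above can be filtered. This is a minimal sketch that assumes the `kubectl describe cluster` event format shown above; the function name `failed_readiness_pods` is illustrative and not part of kbcli or kubectl.

```shell
# Reads `kubectl describe cluster` output on stdin and prints the unique
# names of pods that reported "Readiness probe failed".
# Assumes event messages of the form "Pod <name>: ... Readiness probe failed ...".
failed_readiness_pods() {
  grep 'Readiness probe failed' \
    | sed -n 's/.*Pod \([^:]*\):.*/\1/p' \
    | sort -u
}
```

Usage: `kubectl describe cluster postgres-icpzcz-backup | failed_readiness_pods`.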

shanshanying commented 2 months ago

Hi @tianyue86,

To back up and restore a PG cluster using wal-g, you should:

  1. Configure wal-g

    kbcli cluster backup <clusterName> --method config-wal-g
  2. Update the `archive_command` parameter using an OpsRequest

    apiVersion: apps.kubeblocks.io/v1alpha1
    kind: OpsRequest
    metadata:
      generateName: pg-cluster-reconfiguring-
    spec:
      clusterRef: <clusterName>
      reconfigure:
        componentName: postgresql
        configurations:
          - keys:
              - key: postgresql.conf
                parameters:
                  - key: archive_command
                    value: "'envdir /home/postgres/pgdata/wal-g/env /home/postgres/pgdata/wal-g/wal-g wal-push %p'"
            name: postgresql-configuration
      type: Reconfiguring
  3. Back up the cluster

    kbcli cluster backup <clusterName> --method wal-g
  4. Restore the cluster

    kbcli cluster restore <clusterName> --backup <backupName>
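
The four steps above can be sketched as a dry-run helper that only prints the commands in order, so the sequence can be reviewed before anything is run. The function name `walg_backup_restore_plan` and the file name `ops.yaml` (holding the OpsRequest from step 2) are illustrative assumptions. The sketch also assumes the first argument to `kbcli cluster restore` is the name of the new, restored cluster, as in the reproduction steps of this issue.

```shell
# Dry-run sketch: prints the wal-g backup/restore sequence, runs nothing.
# Arguments: <source cluster> <restored cluster name> <backup name>
walg_backup_restore_plan() {
  cluster="$1"
  restored="$2"
  backup="$3"
  # Step 1: set up wal-g on the source cluster.
  echo "kbcli cluster backup ${cluster} --method config-wal-g"
  # Step 2: submit the OpsRequest. The manifest uses metadata.generateName,
  # so it has to be created with `kubectl create`, not `kubectl apply`.
  echo "kubectl create -f ops.yaml"
  # Step 3: take the wal-g backup.
  echo "kbcli cluster backup ${cluster} --method wal-g"
  # Step 4: restore into a new cluster from that backup.
  echo "kbcli cluster restore ${restored} --backup ${backup}"
}
```

Usage: `walg_backup_restore_plan postgres-icpzcz postgres-icpzcz-backup <backupName>` prints the four commands. Add `--namespace` to each as needed.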

shanshanying commented 2 months ago

(image)