apecloud / kubeblocks

KubeBlocks is an open-source control plane software that runs and manages databases, message queues and other stateful applications on K8s.
https://kubeblocks.io
GNU Affero General Public License v3.0

[BUG]postgresql pod status is CrashLoopBackOff #1404

Closed ahjing99 closed 1 year ago

ahjing99 commented 1 year ago

KubeBlocks version: 0.3.8
Env: Mac + VMware Fusion + CentOS + ARM

It seems the bitnami/postgresql image does not run in an ARM VM, although it can be deployed directly on an ARM Mac. We may need to document this limitation.

[root@wesql2 ~]# kubectl get pod
NAME                                   READY   STATUS             RESTARTS       AGE
kubeblocks-fb7dd6b7d-2rlcw             1/1     Running            0              125m
kubeblocks-grafana-f5646b574-95n4h     3/3     Running            0              125m
kubeblocks-prometheus-alertmanager-0   2/2     Running            0              125m
kubeblocks-prometheus-server-0         2/2     Running            0              125m
mypg-postgresql-0                      0/2     CrashLoopBackOff   21 (62s ago)   34m

[root@wesql2 ~]# kubectl get cluster
NAME       CLUSTER-DEFINITION    VERSION             TERMINATION-POLICY   STATUS     AGE
mypg       apecloud-postgresql   postgresql-14.6.0   WipeOut              Abnormal   37m

[root@wesql2 ~]# kubectl describe pod mypg-postgresql-0
Name:         mypg-postgresql-0
Namespace:    default
Priority:     0
Node:         wesql3/192.168.1.93
Start Time:   Mon, 13 Feb 2023 02:30:39 -0500
Labels:       app.kubernetes.io/component-name=postgresql
              app.kubernetes.io/instance=mypg
              app.kubernetes.io/managed-by=kubeblocks
              app.kubernetes.io/name=state.postgresql-apecloud-postgresql
              controller-revision-hash=mypg-postgresql-677989656c
              statefulset.kubernetes.io/pod-name=mypg-postgresql-0
Annotations:  prometheus.io/path: /metrics
              prometheus.io/port: 9187
              prometheus.io/scheme: http
              prometheus.io/scrape: true
Status:       Running
IP:           10.244.1.5
IPs:
  IP:           10.244.1.5
Controlled By:  StatefulSet/mypg-postgresql
Containers:
  postgresql:
    Container ID:   docker://1997d5e8aea30eee4152315b4aacc9e2343e4058601a4a5082eefedc370c1d44
    Image:          bitnami/postgresql:14.6.0
    Image ID:       docker-pullable://bitnami/postgresql@sha256:8208bf90e6f10ca1445b6a205e90c643e64b017897be33ab4a7b2422d01b2e46
    Port:           5432/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 13 Feb 2023 03:04:02 -0500
      Finished:     Mon, 13 Feb 2023 03:04:02 -0500
    Ready:          False
    Restart Count:  11
    Limits:
      cpu:     250m
      memory:  2Gi
    Requests:
      cpu:      100m
      memory:   2Gi
    Liveness:   exec [/bin/sh -c exec pg_isready -U "postgres" -h 127.0.0.1 -p 5432] delay=30s timeout=5s period=10s #success=1 #failure=6
    Readiness:  exec [/bin/sh -c -ee exec pg_isready -U "postgres" -h 127.0.0.1 -p 5432
[ -f /postgresql/tmp/.initialized ] || [ -f /postgresql/.initialized ]
] delay=5s timeout=5s period=10s #success=1 #failure=6
    Environment Variables from:
      mypg-postgresql-env  ConfigMap  Optional: false
    Environment:
      KB_POD_NAME:                          mypg-postgresql-0 (v1:metadata.name)
      KB_NAMESPACE:                         default (v1:metadata.namespace)
      KB_SA_NAME:                            (v1:spec.serviceAccountName)
      KB_NODENAME:                           (v1:spec.nodeName)
      KB_HOSTIP:                             (v1:status.hostIP)
      KB_PODIP:                              (v1:status.podIP)
      KB_PODIPS:                             (v1:status.podIPs)
      KB_CLUSTER_NAME:                      mypg
      KB_COMP_NAME:                         postgresql
      KB_CLUSTER_COMP_NAME:                 mypg-postgresql
      BITNAMI_DEBUG:                        false
      POSTGRESQL_PORT_NUMBER:               5432
      POSTGRESQL_VOLUME_DIR:                /postgresql
      PGDATA:                               /postgresql/data
      POSTGRES_USER:                        <set to the key 'username' in secret 'mypg-conn-credential'>           Optional: false
      POSTGRES_POSTGRES_PASSWORD:           <set to the key 'postgres-password' in secret 'mypg-conn-credential'>  Optional: false
      POSTGRES_PASSWORD:                    <set to the key 'postgres-password' in secret 'mypg-conn-credential'>  Optional: false
      POSTGRES_DB:                          custom_db
      POSTGRESQL_LOG_HOSTNAME:              false
      POSTGRESQL_LOG_CONNECTIONS:           false
      POSTGRESQL_LOG_DISCONNECTIONS:        false
      POSTGRESQL_PGAUDIT_LOG_CATALOG:       off
      POSTGRESQL_CLIENT_MIN_MESSAGES:       error
      POSTGRESQL_SHARED_PRELOAD_LIBRARIES:  pg_stat_statements, auto_explain
    Mounts:
      /dev/shm from dshm (rw)
      /postgresql from data (rw)
      /postgresql/conf from postgresql-config (rw)
      /scripts from scripts (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lpphp (ro)
  metrics:
    Container ID:  docker://1089747ca7e40b28c48a80b9bc0575d1494996a794751b7ec92e3e6c6e8d9dec
    Image:         bitnami/postgres-exporter:0.11.1-debian-11-r49
    Image ID:      docker-pullable://bitnami/postgres-exporter@sha256:c0687a7095bba017bb4407ab62745b1e2bad12ecee5379d31a0a6d8042912f21
    Port:          9187/TCP
    Host Port:     0/TCP
    Command:
      /opt/bitnami/postgres-exporter/bin/postgres_exporter
      --auto-discover-databases
      --extend.query-path=/opt/conf/custom-metrics.yaml
      --exclude-databases=template0,template1
      --log.level=info
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 13 Feb 2023 03:00:27 -0500
      Finished:     Mon, 13 Feb 2023 03:00:27 -0500
    Ready:          False
    Restart Count:  10
    Liveness:       http-get http://:http-metrics/ delay=5s timeout=5s period=10s #success=1 #failure=6
    Readiness:      http-get http://:http-metrics/ delay=5s timeout=5s period=10s #success=1 #failure=6
    Environment Variables from:
      mypg-postgresql-env  ConfigMap  Optional: false
    Environment:
      KB_POD_NAME:           mypg-postgresql-0 (v1:metadata.name)
      KB_NAMESPACE:          default (v1:metadata.namespace)
      KB_SA_NAME:             (v1:spec.serviceAccountName)
      KB_NODENAME:            (v1:spec.nodeName)
      KB_HOSTIP:              (v1:status.hostIP)
      KB_PODIP:               (v1:status.podIP)
      KB_PODIPS:              (v1:status.podIPs)
      KB_CLUSTER_NAME:       mypg
      KB_COMP_NAME:          postgresql
      KB_CLUSTER_COMP_NAME:  mypg-postgresql
      DATA_SOURCE_URI:       127.0.0.1:5432/postgres?sslmode=disable
      DATA_SOURCE_PASS:      <set to the key 'postgres-password' in secret 'mypg-conn-credential'>  Optional: false
      DATA_SOURCE_USER:      <set to the key 'username' in secret 'mypg-conn-credential'>           Optional: false
    Mounts:
      /opt/conf from postgresql-custom-metrics (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lpphp (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-mypg-postgresql-0
    ReadOnly:   false
  dshm:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  postgresql-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      mypg-postgresql-postgresql-config
    Optional:  false
  postgresql-custom-metrics:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      mypg-postgresql-postgresql-custom-metrics
    Optional:  false
  scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      mypg-postgresql-scripts
    Optional:  false
  kube-api-access-lpphp:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason               Age                    From               Message
  ----     ------               ----                   ----               -------
  Normal   Scheduled            34m                    default-scheduler  Successfully assigned default/mypg-postgresql-0 to wesql3
  Normal   Pulling              34m                    kubelet            Pulling image "bitnami/postgresql:14.6.0"
  Normal   Pulled               34m                    kubelet            Successfully pulled image "bitnami/postgresql:14.6.0" in 27.793211856s
  Normal   Pulling              34m                    kubelet            Pulling image "bitnami/postgres-exporter:0.11.1-debian-11-r49"
  Warning  Failed               33m                    kubelet            Failed to pull image "bitnami/postgres-exporter:0.11.1-debian-11-r49": rpc error: code = Unknown desc = context canceled
  Warning  Failed               33m                    kubelet            Error: ErrImagePull
  Normal   Started              33m (x2 over 34m)      kubelet            Started container postgresql
  Warning  FailedPostStartHook  33m (x2 over 34m)      kubelet            Exec lifecycle hook ([/scripts/post_start.sh]) for Container "postgresql" in Pod "mypg-postgresql-0_default(d9fa1de1-a706-483b-8301-7e43544b5366)" failed - error: command '/scripts/post_start.sh' exited with 126: , message: "cannot exec in a stopped state: unknown\r\n"
  Normal   Killing              33m (x2 over 34m)      kubelet            FailedPostStartHook
  Normal   Created              33m (x2 over 34m)      kubelet            Created container postgresql
  Normal   BackOff              33m (x4 over 33m)      kubelet            Back-off pulling image "bitnami/postgres-exporter:0.11.1-debian-11-r49"
  Warning  Failed               33m (x4 over 33m)      kubelet            Error: ImagePullBackOff
  Normal   Pulled               29m (x5 over 33m)      kubelet            Container image "bitnami/postgresql:14.6.0" already present on machine
  Warning  BackOff              4m40s (x165 over 33m)  kubelet            Back-off restarting failed container
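
The events above show both containers exiting with code 1 within the same second they start, but they do not say why. One way to narrow this down is to read the logs of the previous (crashed) container instance and look for the Linux "exec format error" message, which usually means the binary was built for a different CPU architecture. This is only a sketch: `is_arch_mismatch` is a hypothetical helper, and it is an assumption that this message appears in the logs here, since the issue does not include them.

```shell
# is_arch_mismatch is a hypothetical helper, not part of kubectl or KubeBlocks.
# It reads log text on stdin and succeeds (exit 0) if the text contains
# "exec format error", the usual symptom of a wrong-architecture binary.
is_arch_mismatch() {
  grep -qi 'exec format error'
}

# Example wiring (needs cluster access, so it is left commented out):
#   kubectl logs mypg-postgresql-0 -c postgresql --previous | is_arch_mismatch \
#     && echo "postgresql container: binary architecture does not match the node"
```

If the helper does not match, the crash has some other cause and the full output of `kubectl logs ... --previous` is the place to look next.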

[root@wesql2 ~]# kubectl describe cluster mypg
Name:         mypg
Namespace:    default
Labels:       clusterdefinition.kubeblocks.io/name=apecloud-postgresql
              clusterversion.kubeblocks.io/name=postgresql-14.6.0
Annotations:  kubeblocks.io/storage-class: local-storage
API Version:  dbaas.kubeblocks.io/v1alpha1
Kind:         Cluster
Metadata:
  Creation Timestamp:  2023-02-13T07:30:38Z
  Finalizers:
    cluster.kubeblocks.io/finalizer
  Generation:  1
  Managed Fields:
    API Version:  dbaas.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:affinity:
          .:
          f:podAntiAffinity:
          f:topologyKeys:
        f:clusterDefinitionRef:
        f:clusterVersionRef:
        f:components:
          .:
          k:{"name":"postgresql"}:
            .:
            f:monitor:
            f:name:
            f:replicas:
            f:resources:
              .:
              f:limits:
                .:
                f:cpu:
                f:memory:
              f:requests:
                .:
                f:cpu:
                f:memory:
            f:serviceType:
            f:type:
            f:volumeClaimTemplates:
        f:terminationPolicy:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2023-02-13T07:30:38Z
    API Version:  dbaas.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:kubeblocks.io/storage-class:
        f:finalizers:
          .:
          v:"cluster.kubeblocks.io/finalizer":
        f:labels:
          .:
          f:clusterdefinition.kubeblocks.io/name:
          f:clusterversion.kubeblocks.io/name:
    Manager:      manager
    Operation:    Update
    Time:         2023-02-13T07:30:38Z
    API Version:  dbaas.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:clusterDefGeneration:
        f:components:
          .:
          f:postgresql:
            .:
            f:message:
              .:
              f:Pod/mypg-postgresql-0:
            f:phase:
            f:podsReady:
            f:type:
        f:conditions:
        f:observedGeneration:
        f:operations:
          .:
          f:horizontalScalable:
          f:restartable:
          f:upgradable:
          f:verticalScalable:
        f:phase:
    Manager:         manager
    Operation:       Update
    Subresource:     status
    Time:            2023-02-13T07:32:15Z
  Resource Version:  49137
  UID:               d23cbef9-e00c-49bb-8690-fb731bfcc2e0
Spec:
  Affinity:
    Pod Anti Affinity:  Preferred
    Topology Keys:
      kubernetes.io/hostname
  Cluster Definition Ref:  apecloud-postgresql
  Cluster Version Ref:     postgresql-14.6.0
  Components:
    Monitor:   true
    Name:      postgresql
    Replicas:  1
    Resources:
      Limits:
        Cpu:     250m
        Memory:  2Gi
      Requests:
        Cpu:       100m
        Memory:    2Gi
    Service Type:  ClusterIP
    Type:          postgresql
    Volume Claim Templates:
      Name:  data
      Spec:
        Access Modes:
          ReadWriteOnce
        Resources:
          Requests:
            Storage:         2Gi
        Storage Class Name:  local-storage
  Termination Policy:        WipeOut
Status:
  Cluster Def Generation:  2
  Components:
    Postgresql:
      Message:
        Pod/mypg-postgresql-0:  Exec lifecycle hook ([/scripts/post_start.sh]) for Container "postgresql" in Pod "mypg-postgresql-0_default(d9fa1de1-a706-483b-8301-7e43544b5366)" failed - error: command '/scripts/post_start.sh' exited with 126: , message: "cannot exec in a stopped state: unknown\r\n"Back-off restarting failed container;
      Phase:                    Failed
      Pods Ready:               false
      Type:                     postgresql
  Conditions:
    Last Transition Time:  2023-02-13T07:30:38Z
    Message:               The operator has started the provisioning of Cluster: mypg
    Reason:                PreCheckSucceed
    Status:                True
    Type:                  ProvisioningStarted
    Last Transition Time:  2023-02-13T07:30:38Z
    Message:               Successfully applied for resources
    Reason:                ApplyResourcesSucceed
    Status:                True
    Type:                  ApplyResources
    Last Transition Time:  2023-02-13T07:32:15Z
    Message:               pods are not ready in Components: [postgresql], refer to related component message in Cluster.status.components
    Reason:                ReplicasNotReady
    Status:                False
    Type:                  ReplicasReady
    Last Transition Time:  2023-02-13T07:32:15Z
    Message:               pods are unavailable in Components: [postgresql], refer to related component message in Cluster.status.components
    Reason:                ComponentsNotReady
    Status:                False
    Type:                  Ready
  Observed Generation:     1
  Operations:
    Horizontal Scalable:
      Max:   1
      Name:  postgresql
    Restartable:
      postgresql
    Upgradable:  true
    Vertical Scalable:
      postgresql
  Phase:  Abnormal
Events:
  Type     Reason                 Age                 From                       Message
  ----     ------                 ----                ----                       -------
  Warning  NotFound               36m (x10 over 36m)  system-account-controller  Endpoints "mypg-postgresql" not found
  Normal   Creating               36m                 cluster-controller         Start Creating in Cluster: mypg
  Normal   PreCheckSucceed        36m                 cluster-controller         The operator has started the provisioning of Cluster: mypg
  Normal   ApplyResourcesSucceed  36m                 cluster-controller         Successfully applied for resources
  Warning  FailedPostStartHook    35m                 event-controller           Pod mypg-postgresql-0: Exec lifecycle hook ([/scripts/post_start.sh]) for Container "postgresql" in Pod "mypg-postgresql-0_default(d9fa1de1-a706-483b-8301-7e43544b5366)" failed - error: command '/scripts/post_start.sh' exited with 126: , message: "cannot exec in a stopped state: unknown\r\n"
  Warning  BackOff                26m                 event-controller           Pod mypg-postgresql-0: Back-off restarting failed container

linghan-hub commented 1 year ago

The same problem exists in a local minikube environment on Mac. Creating a cluster works with version 0.3.7, but with version 0.3.8 the cluster status is Abnormal.

nashtsai commented 1 year ago

@shanshanying please investigate the following error:

  Warning  NotFound               36m (x10 over 36m)  system-account-controller  Endpoints "mypg-postgresql" not found

shanshanying commented 1 year ago

The warning has been removed in the latest version. It was used as a notification to indicate that the controller failed to find the endpoint for the cluster, and it should not happen.

nashtsai commented 1 year ago

> The warning has been removed in the latest version. It was used as a notification to indicate that the controller failed to find the endpoint for the cluster, and it should not happen.

Does "the latest version" mean the current "main" branch? Bug fixes are expected to be merged (cherry-picked) into the release-0.3 branch.

nashtsai commented 1 year ago

It's been confirmed that the provided linux/amd64 PG image doesn't run on Apple silicon Macs under minikube.

nashtsai commented 1 year ago

> It's been confirmed that the provided linux/amd64 PG image doesn't run on Apple silicon Macs under minikube.

K3d works, though.
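
A quick way to spot this kind of mismatch before creating a cluster is to compare the architecture each node reports with the architectures the image is published for. The sketch below is an illustration only: `arch_supported` is a hypothetical helper, and the image architecture list has to be supplied by hand (here `amd64`, following the comment above).

```shell
# arch_supported is a hypothetical helper, not part of kubectl or KubeBlocks.
# $1: node architecture, e.g. "arm64"
# $2: space-separated list of architectures the image is built for
# Succeeds (exit 0) if the node architecture is in the list, fails otherwise.
arch_supported() {
  for image_arch in $2; do
    if [ "$image_arch" = "$1" ]; then
      return 0
    fi
  done
  return 1
}

# Example wiring (needs cluster access, so it is left commented out):
#   for node_arch in $(kubectl get nodes \
#       -o jsonpath='{.items[*].status.nodeInfo.architecture}'); do
#     arch_supported "$node_arch" "amd64" \
#       || echo "no $node_arch build of the image for this node"
#   done
```

On an ARM node, the loop would print a message for an amd64-only image. Whether such an image still runs then depends on whether the container runtime provides emulation, which may explain the different results between minikube and K3d reported here.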

ahjing99 commented 1 year ago

With KubeBlocks 0.4.0-alpha.1, the "Endpoints not found" warning no longer appears:

 Events:
  Type     Reason                 Age                From                Message
  ----     ------                 ----               ----                -------
  Normal   Creating               25m                cluster-controller  Start Creating in Cluster: mypg
  Normal   PreCheckSucceed        25m                cluster-controller  The operator has started the provisioning of Cluster: mypg
  Normal   ApplyResourcesSucceed  25m                cluster-controller  Successfully applied for resources

ahjing99 commented 1 year ago

> > The warning has been removed in the latest version. It was used as a notification to indicate that the controller failed to find the endpoint for the cluster, and it should not happen.
>
> Does "the latest version" mean the current "main" branch? Bug fixes are expected to be merged (cherry-picked) into the release-0.3 branch.

Actually, it is no longer necessary to cherry-pick to release-0.3, since 0.3 is no longer being iterated on.