khalifaould opened this issue 2 years ago (Open)
There is not enough information here to begin troubleshooting. Information that is helpful for this can be found here: https://github.com/CrunchyData/postgres-operator/issues/new?template=bug_report.md
As part of our resilience tests, we have a cluster with 3 master nodes on which we launched a CrunchyData cluster with 3 replicas. The list of pods:
We simulate the loss of the node on which the pgBackRest repo host pod (dai-prod-db-repo-host-0) is running.
We observe that this pod is not rescheduled onto one of the other available nodes.
In this case, can we no longer take a backup or perform a restore?
Is it possible to have several replicas of this pod, as is the case for the database pods?
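A rough sketch of one way to approximate this node-loss test and watch the repo host pod (the node name is a placeholder, and draining is only an approximation of an actual node failure):

```sh
# Cordon and drain the node currently running the repo host pod
# (an approximation of losing the node; a real crash is more abrupt).
kubectl drain <node-running-repo-host> --ignore-daemonsets

# Watch whether the dedicated pgBackRest repo host pod is rescheduled elsewhere.
kubectl get pods -n namespace \
  -l postgres-operator.crunchydata.com/pgbackrest-dedicated= -o wide -w
```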
Cluster Config
```yaml
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: db-cluster
  namespace: namespace
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-13.4-1
  postgresVersion: 13
  instances:
    - name: instance1
      replicas: 3
      dataVolumeClaimSpec:
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: 3Gi
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/cluster: db-cluster
                  postgres-operator.crunchydata.com/instance-set: instance1
  users:
    - name: user
      databases:
        - db_name
      options: "SUPERUSER"
  databaseInitSQL:
    key: create-schema.sql
    name: create-schema
  backups:
    pgbackrest:
      global:
        repo1-retention-full: "5"
      manual:
        repoName: repo1
        options:
          - --type=full
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.35-0
      repos:
        - name: repo1
          schedules:
            full: "0 0 * * *"
          volume:
            volumeClaimSpec:
              accessModes:
                - "ReadWriteOnce"
              resources:
                requests:
                  storage: 3Gi
```
@khalifaould does the repo Pod that cannot be rescheduled get stuck in a "pending" status? If so, can you provide the output of describing the unschedulable repo Pod via `kubectl describe`?
The pod is not rescheduled onto one of the other nodes, and that's the problem! It looks like there is an affinity with the node it was first launched on.
Even when Kubernetes is unable to schedule a Pod, it is still possible to describe/view/etc. the Pod (which should be in a "pending" status). Therefore, please provide the output of describing the repo Pod, e.g. `kubectl describe pod dai-prod-db-repo-host-0`.
Additionally, please provide the following details as requested so that we can properly troubleshoot the issue per your specific deployment:
- Platform: (`Kubernetes`, `OpenShift`, `Rancher`, `GKE`, `EKS`, `AKS`, etc.)
- Platform Version: (e.g. `1.20.3`, `4.7.0`)
- PGO Image Tag: (e.g. `ubi8-5.1.0-0`)
- Postgres Version: (e.g. `14`)
- Storage: (e.g. `hostpath`, `nfs`, or the name of your storage class)

My thinking here is that the volume being utilized by the repo is in a specific zone that is only available to the node that is being terminated (i.e. the other two nodes are in other zones, and therefore unable to mount that same PVC).
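If that is the suspicion, one way to check it (a sketch; the PVC/PV names depend on the actual cluster) is to look at the node affinity recorded on the PersistentVolume backing the repo host's PVC:

```sh
# Find the PVC used by the dedicated repo host.
kubectl get pvc -n namespace | grep repo

# Inspect the bound PV; local or zonal volumes carry a "Node Affinity"
# section that restricts which nodes can mount them.
kubectl describe pv <pv-bound-to-the-repo-pvc>
```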
- Platform: Rancher K3S
- Platform Version: v1.19.7+k3s1
- PGO Image Tag: ubi8-5.0.3-0
- Postgres Version: 13
- Storage: Default (local-path)
The dai-prod-db-repo-host-0 pod is not in a pending status but Terminating. Output of `kubectl describe pod dai-prod-db-repo-host-0`:
```
Name:           dai-prod-db-repo-host-0
Namespace:      namespace
Priority:       0
Node:           ip-172-16-60-231.eu-west-1.compute.internal/172.16.60.231
Start Time:     Thu, 27 Jan 2022 16:32:26 +0000
Labels:         controller-revision-hash=dai-prod-db-repo-host-768c6b8559
                postgres-operator.crunchydata.com/cluster=dai-prod-db
                postgres-operator.crunchydata.com/data=pgbackrest
                postgres-operator.crunchydata.com/pgbackrest=
                postgres-operator.crunchydata.com/pgbackrest-dedicated=
                statefulset.kubernetes.io/pod-name=dai-prod-db-repo-host-0
Annotations:
    SizeLimit:   16Mi
  default-token-dswpn:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-dswpn
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:
```
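A StatefulSet pod stuck in Terminating on a node that is no longer responding generally has to be cleared before the controller will recreate it; a hedged sketch, only appropriate once the original node is confirmed gone, since force deletion skips the normal shutdown:

```sh
# Remove the lost node object so Kubernetes stops waiting for its kubelet.
kubectl delete node ip-172-16-60-231.eu-west-1.compute.internal

# Or force-delete the stuck pod so the StatefulSet controller can recreate it.
kubectl delete pod dai-prod-db-repo-host-0 -n namespace --force --grace-period=0
```

Note that with a node-local storage class such as local-path, the replacement pod may still be unschedulable because its PVC remains bound to the lost node, which lines up with the volume-affinity suspicion above.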
My issue might be related. We had a (planned) power outage (executed ahead of schedule) which brought down all nodes. For some reason the repo-host-0 pod remained in a Terminating state once everything came back online.
At that point the WAL kept accumulating until the disk was full...
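One way to confirm that the full disk is caused by failed WAL archiving (a sketch; the `database` container name and the pod name are assumptions based on typical PGO v5 deployments):

```sh
# Check whether WAL archiving is failing (failed_count growing, last_failed_wal set).
kubectl exec -n namespace <postgres-primary-pod> -c database -- \
  psql -c "SELECT archived_count, failed_count, last_failed_wal FROM pg_stat_archiver;"

# Check the pgBackRest repository state (expected to error while the repo host is down).
kubectl exec -n namespace <postgres-primary-pod> -c database -- pgbackrest info
```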
In the event that the node on which the pgBackRest repo host pod is running is lost, the pod is not rescheduled onto one of the other available nodes, and it is no longer possible to take a backup or perform a restore. Is this normal? Is it possible to have several replicas of this pod, as is the case for the database pods?
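Regarding where the single repo host pod may be scheduled, the PostgresCluster spec exposes a `repoHost` section that accepts standard scheduling constraints; a hedged sketch (field names as documented for PGO v5, to be verified against the version in use, with purely illustrative values), not a substitute for running multiple repo host replicas:

```yaml
spec:
  backups:
    pgbackrest:
      repoHost:
        # Illustrative scheduling settings only; with node-local storage the
        # repo PVC itself may still constrain where this pod can run.
        tolerations:
        - key: "node.kubernetes.io/unreachable"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 60
        affinity:
          nodeAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values: ["node-a", "node-b"]
```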