Hi @HenriqueLBorges
Did you perform any upgrade of your PostgreSQL chart, or did it happen after installing the chart without any human intervention at all? Sometimes these errors are related to credentials that change after upgrading the chart.
Hi @juan131
No, I didn't. It happened after installing the chart, without any human intervention.
That's weird...
Do you have any monitoring system in place (e.g. Prometheus + Grafana)? If so, you could try to identify whether there is a correlation between these restarts and high CPU/memory usage on the PostgreSQL container. Do you have any log collection system (e.g. ELK or EFK)? You could also take a look at the container logs just before the restarts to identify what's going on (please share them with us as well so we can take a look).
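For example, something like this (standard kubectl commands; replace POSTGRESQL_POD and NAMESPACE with your actual pod name and namespace):
# Logs of the current container
kubectl logs POSTGRESQL_POD -n NAMESPACE
# Logs of the previous (crashed) container, useful to see what happened right before the restart
kubectl logs POSTGRESQL_POD -n NAMESPACE --previous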
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.
Hi @juan131, I'm also encountering a similar problem where the postgres pod goes into a restart loop.
Chart: postgresql-8.9.4
kubectl describe pod postgresql-pod
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 30m (x162 over 12h) kubelet, gke-node Container image "docker.io/bitnami/postgresql:11.7.0" already present on machine
Warning Unhealthy 20m (x1661 over 12h) kubelet, gke-node Readiness probe failed: 127.0.0.1:5432 - no response
Warning Unhealthy 11m (x996 over 12h) kubelet, gke-node Liveness probe failed: 127.0.0.1:5432 - no response
Warning BackOff 51s (x1990 over 12h) kubelet, gke-node Back-off restarting failed container
kubectl logs postgresql-pod
postgresql 08:48:37.08
postgresql 08:48:38.60 Welcome to the Bitnami postgresql container
postgresql 08:48:39.47 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql
postgresql 08:48:40.27 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql/issues
postgresql 08:48:41.09
postgresql 08:48:49.91 INFO ==> ** Starting PostgreSQL setup **
postgresql 08:48:59.37 INFO ==> Validating settings in POSTGRESQL_* env vars..
postgresql 08:49:03.28 INFO ==> Loading custom pre-init scripts...
postgresql 08:49:05.17 INFO ==> Loading user's custom files from /docker-entrypoint-preinitdb.d ...
postgresql 08:49:09.58 INFO ==> Initializing PostgreSQL database...
postgresql 08:49:11.38 INFO ==> Cleaning stale /bitnami/postgresql/data/postmaster.pid file
postgresql 08:49:19.69 INFO ==> postgresql.conf file not detected. Generating it...
postgresql 08:49:23.26 INFO ==> pg_hba.conf file not detected. Generating it...
postgresql 08:49:25.18 INFO ==> Generating local authentication configuration
postgresql 08:49:29.68 INFO ==> Deploying PostgreSQL with persisted data...
postgresql 08:49:33.17 INFO ==> Configuring replication parameters
postgresql 08:49:41.68 INFO ==> Configuring fsync
postgresql 08:49:45.37 INFO ==> Loading custom scripts...
postgresql 08:49:48.89 INFO ==> Enabling remote connections
postgresql 08:49:51.56 INFO ==> Stopping PostgreSQL...
postgresql 08:49:53.30 INFO ==> ** PostgreSQL setup finished! **
postgresql 08:50:03.49 INFO ==> ** Starting PostgreSQL **
2020-07-08 08:50:05.215 GMT [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2020-07-08 08:50:05.215 GMT [1] LOG: listening on IPv6 address "::", port 5432
2020-07-08 08:50:05.221 GMT [1] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2020-07-08 08:50:05.275 GMT [261] LOG: database system was interrupted; last known up at 2020-07-08 08:43:01 GMT
2020-07-08 08:50:05.490 GMT [261] LOG: database system was not properly shut down; automatic recovery in progress
2020-07-08 08:50:05.495 GMT [261] LOG: redo starts at 0/2F101E0
2020-07-08 08:50:05.499 GMT [261] LOG: invalid record length at 0/2F35A30: wanted 24, got 0
2020-07-08 08:50:05.499 GMT [261] LOG: redo done at 0/2F35A08
2020-07-08 08:50:05.499 GMT [261] LOG: last completed transaction was at log time 2020-07-08 08:43:23.928121+00
2020-07-08 08:50:05.551 GMT [1] LOG: database system is ready to accept connections
2020-07-08 08:50:26.015 GMT [293] FATAL: terminating connection due to unexpected postmaster exit
2020-07-08 08:50:26.015 GMT [294] FATAL: terminating connection due to unexpected postmaster exit
2020-07-08 08:50:26.015 GMT [295] FATAL: terminating connection due to unexpected postmaster exit
2020-07-08 08:50:26.015 GMT [289] FATAL: terminating connection due to unexpected postmaster exit
2020-07-08 08:50:26.015 GMT [292] FATAL: terminating connection due to unexpected postmaster exit
2020-07-08 08:50:26.015 GMT [291] FATAL: terminating connection due to unexpected postmaster exit
2020-07-08 08:50:26.015 GMT [275] FATAL: terminating connection due to unexpected postmaster exit
2020-07-08 08:50:26.015 GMT [303] FATAL: terminating connection due to unexpected postmaster exit
2020-07-08 08:50:26.017 GMT [290] FATAL: terminating connection due to unexpected postmaster exit
Hello @caalberts
Did you perform any upgrade of your PostgreSQL release recently? A very common issue is upgrading the chart without indicating the original passwords that were generated the first time you installed it. When this happens, the password in your secrets (the one used by the readiness/liveness probes) is regenerated and gets out of sync with the one stored in your PostgreSQL data.
This is documented in the link below:
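In short, the idea is to reuse the password generated on the first installation when upgrading, e.g. (a sketch; replace RELEASE_NAME and NAMESPACE with yours, and note the secret name/key may differ in your release):
# Retrieve the password generated on the first install
export POSTGRES_PASSWORD=$(kubectl get secret RELEASE_NAME-postgresql -n NAMESPACE -o jsonpath="{.data.postgresql-password}" | base64 --decode)
# Reuse it when upgrading so the probes and the persisted data stay in sync
helm upgrade RELEASE_NAME bitnami/postgresql --set postgresqlPassword=$POSTGRES_PASSWORD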
Hello @juan131
This is not my case; the readiness and liveness probes usually start to fail at times when I'm not using it heavily. It almost feels random.
Hi @HenriqueLBorges
Could you share the output of running kubectl describe pod POSTGRESQL_POD, where POSTGRESQL_POD is a placeholder for your actual PostgreSQL pod?
I'd like to see if the probes are providing information about the reason why they're failing.
Hi @juan131
Here is the output below:
Name: postgres-postgresql-0
Namespace: prd
Priority: 0
Node: worker-k8s-4/172.20.8.6
Start Time: Thu, 25 Jun 2020 21:29:08 +0000
Labels: app=postgresql
chart=postgresql-8.4.0
controller-revision-hash=postgres-postgresql-7878b964b8
heritage=Tiller
release=postgres
role=master
security.istio.io/tlsMode=istio
statefulset.kubernetes.io/pod-name=postgres-postgresql-0
Annotations: sidecar.istio.io/status:
{"version":"805d5a8f492b8fa20c7d92aac6e0cda9fe0f1fe63e5073b929d39ea721788f25","initContainers":["istio-init"],"containers":["istio-proxy"]...
Status: Running
IP: 10.244.4.252
IPs:
IP: 10.244.4.252
Controlled By: StatefulSet/postgres-postgresql
Init Containers:
init-chmod-data:
Container ID: docker://3f13e1e74ae2d74a908c709416dc636cdac67cfb71f631fd3b476c42d1d7f4c5
Image: docker.io/bitnami/minideb:buster
Image ID: docker-pullable://bitnami/minideb@sha256:b3623f09926bd762482eba7c0cf4dd65801fb0a649af3ca4fa2c6ee3f2866da0
Port: <none>
Host Port: <none>
Command:
/bin/sh
-cx
mkdir -p /bitnami/postgresql/data
chmod 700 /bitnami/postgresql/data
find /bitnami/postgresql -mindepth 1 -maxdepth 1 -not -name ".snapshot" -not -name "lost+found" | \
xargs chown -R 1001:1001
chmod -R 777 /dev/shm
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 25 Jun 2020 21:29:11 +0000
Finished: Thu, 25 Jun 2020 21:29:15 +0000
Ready: True
Restart Count: 0
Requests:
cpu: 999m
memory: 8Gi
Environment: <none>
Mounts:
/bitnami/postgresql from data (rw)
/dev/shm from dshm (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-77twx (ro)
istio-init:
Container ID: docker://fc3f84ef0c3a0800d47e04b09d9aea1da79af2a07ffed8b75332120de2622627
Image: docker.io/istio/proxyv2:1.4.5
Image ID: docker-pullable://istio/proxyv2@sha256:fc09ea0f969147a4843a564c5b677fbf3a6f94b56627d00b313b4c30d5fef094
Port: <none>
Host Port: <none>
Command:
istio-iptables
-p
15001
-z
15006
-u
1337
-m
REDIRECT
-i
*
-x
-b
*
-d
15020
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 25 Jun 2020 21:29:16 +0000
Finished: Thu, 25 Jun 2020 21:29:16 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 10m
memory: 10Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-77twx (ro)
Containers:
postgres-postgresql:
Container ID: docker://01c43a4ed85905c6d9599674118be8039406758dfcf7befc9bad4f6b956e4338
Image: docker.io/bitnami/postgresql:11.7.0-debian-10-r0
Image ID: docker-pullable://bitnami/postgresql@sha256:f946f10bdff3b1bc3617536a342da0f0b684623f5409f9c162be72df4155f384
Port: 5432/TCP
Host Port: 0/TCP
State: Running
Started: Sat, 11 Jul 2020 02:43:45 +0000
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Sat, 11 Jul 2020 01:51:26 +0000
Finished: Sat, 11 Jul 2020 02:43:45 +0000
Ready: True
Restart Count: 534
Requests:
cpu: 900m
memory: 8Gi
Liveness: exec [/bin/sh -c exec pg_isready -U "postgres" -h 127.0.0.1 -p 5432] delay=30s timeout=5s period=10s #success=1 #failure=6
Readiness: exec [/bin/sh -c -e exec pg_isready -U "postgres" -h 127.0.0.1 -p 5432
[ -f /opt/bitnami/postgresql/tmp/.initialized ] || [ -f /bitnami/postgresql/.initialized ]
] delay=5s timeout=5s period=10s #success=1 #failure=6
Environment:
BITNAMI_DEBUG: false
POSTGRESQL_PORT_NUMBER: 5432
POSTGRESQL_VOLUME_DIR: /bitnami/postgresql
PGDATA: /bitnami/postgresql/data
POSTGRES_USER: postgres
POSTGRES_PASSWORD: <set to the key 'postgresql-password' in secret 'postgres-postgresql'> Optional: false
POSTGRESQL_ENABLE_LDAP: no
Mounts:
/bitnami/postgresql from data (rw)
/dev/shm from dshm (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-77twx (ro)
istio-proxy:
Container ID: docker://a2199bde3d640eacaf1269d8417580e6997cf70b01c760f8e8700e7d0dd30e97
Image: docker.io/istio/proxyv2:1.4.5
Image ID: docker-pullable://istio/proxyv2@sha256:fc09ea0f969147a4843a564c5b677fbf3a6f94b56627d00b313b4c30d5fef094
Port: 15090/TCP
Host Port: 0/TCP
Args:
proxy
sidecar
--domain
$(POD_NAMESPACE).svc.cluster.local
--configPath
/etc/istio/proxy
--binaryPath
/usr/local/bin/envoy
--serviceCluster
postgresql.$(POD_NAMESPACE)
--drainDuration
45s
--parentShutdownDuration
1m0s
--discoveryAddress
istio-pilot.istio-system:15010
--zipkinAddress
zipkin.istio-system:9411
--dnsRefreshRate
300s
--connectTimeout
10s
--proxyAdminPort
15000
--concurrency
2
--controlPlaneAuthPolicy
NONE
--statusPort
15020
--applicationPorts
5432
State: Running
Started: Thu, 25 Jun 2020 21:29:17 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 2
memory: 1Gi
Requests:
cpu: 100m
memory: 128Mi
Readiness: http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30
Environment:
POD_NAME: postgres-postgresql-0 (v1:metadata.name)
ISTIO_META_POD_PORTS: [
{"name":"tcp-postgresql","containerPort":5432,"protocol":"TCP"}
]
ISTIO_META_CLUSTER_ID: Kubernetes
POD_NAMESPACE: prd (v1:metadata.namespace)
INSTANCE_IP: (v1:status.podIP)
SERVICE_ACCOUNT: (v1:spec.serviceAccountName)
ISTIO_META_POD_NAME: postgres-postgresql-0 (v1:metadata.name)
ISTIO_META_CONFIG_NAMESPACE: prd (v1:metadata.namespace)
SDS_ENABLED: false
ISTIO_META_INTERCEPTION_MODE: REDIRECT
ISTIO_META_INCLUDE_INBOUND_PORTS: 5432
ISTIO_METAJSON_LABELS: {"app":"postgresql","chart":"postgresql-8.4.0","controller-revision-hash":"postgres-postgresql-7878b964b8","heritage":"Tiller","release":"postgres","role":"master","statefulset.kubernetes.io/pod-name":"postgres-postgresql-0"}
ISTIO_META_WORKLOAD_NAME: postgres-postgresql
ISTIO_META_OWNER: kubernetes://apis/apps/v1/namespaces/prd/statefulsets/postgres-postgresql
Mounts:
/etc/certs/ from istio-certs (ro)
/etc/istio/proxy from istio-envoy (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-77twx (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
dshm:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: 10Gi
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: pvc0004
ReadOnly: false
default-token-77twx:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-77twx
Optional: false
istio-envoy:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
istio-certs:
Type: Secret (a volume populated by a Secret)
SecretName: istio.default
Optional: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 30m (x6601 over 15d) kubelet, worker-k8s-4 Liveness probe failed: 127.0.0.1:5432 - no response
Warning Unhealthy 30m (x7835 over 15d) kubelet, worker-k8s-4 Readiness probe failed: 127.0.0.1:5432 - no response
Hi @HenriqueLBorges
There is not much information about the reasons why the readiness probes are failing there. As you can see, these are the probes:
Liveness: exec [/bin/sh -c exec pg_isready -U "postgres" -h 127.0.0.1 -p 5432] delay=30s timeout=5s period=10s #success=1 #failure=6
Readiness: exec [/bin/sh -c -e exec pg_isready -U "postgres" -h 127.0.0.1 -p 5432
Could you please access your PostgreSQL container and manually run the command below?
pg_isready -U "postgres" -h 127.0.0.1 -p 5432
Hello @juan131
Here is the output:
root@master-k8s:~# kubectl exec -it postgres-postgresql-0 --namespace prd -- /bin/bash
Defaulting container name to postgres-postgresql.
Use 'kubectl describe pod/postgres-postgresql-0 -n prd' to see all of the containers in this pod.
I have no name!@postgres-postgresql-0:/$ pg_isready -U "postgres" -h 127.0.0.1 -p 5432
127.0.0.1:5432 - accepting connections
Hi @HenriqueLBorges
It seems your probes are working (at least at the very moment you tried them), but it's hard to debug since you don't have the logs of the probes when they fail.
The output from the probes is swallowed by the kubelet component on the node. If a probe fails, its output is recorded as an event associated with the pod. However, we didn't obtain any relevant information when you ran the "kubectl describe" command. It only says "127.0.0.1:5432 - no response",
which is not very descriptive.
Maybe you can try editing the probes to use a different command that provides more information about what's going on (instead of pg_isready).
Did you perform any upgrade of your PostgreSQL release recently?
Hi @juan131, no it was not an upgrade to an existing PostgreSQL. It's a new PostgreSQL deployment.
Hi @caalberts
The "no response" answer on pg_isready
means the PostgreSQL server is not responding See https://www.postgresql.org/docs/12/app-pg-isready.html
That said, your logs didn't show any warn/error describing the reasons why it's not responding:
2020-07-08 08:50:05.499 GMT [261] LOG: last completed transaction was at log time 2020-07-08 08:43:23.928121+00
2020-07-08 08:50:05.551 GMT [1] LOG: database system is ready to accept connections
You can try increasing the log verbosity by setting log_error_verbosity in the postgresql.conf configuration file. To do so, install the chart using the postgresqlExtendedConf parameter, e.g. with the values.yaml below:
postgresqlExtendedConf:
  log_error_verbosity: verbose
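If the verbose error output is still not enough, you could also enable connection logging to correlate the probe failures with what the server sees. This is just a sketch using standard PostgreSQL parameters, assuming the chart renders each key as a key = value line in the generated configuration:
postgresqlExtendedConf:
  log_error_verbosity: verbose
  log_connections: "on"       # log every successful connection attempt
  log_disconnections: "on"    # log session end, including duration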
Hi @juan131,
Do you have a different command to recommend in the probes?
Thanks in advance
Hi @HenriqueLBorges
I would first try to increase the log verbosity as I mentioned in my previous comment. That said, you can replace the pg_isready probe with some query against the database (e.g. listing databases or something like that).
One important thing you can do is relax the frequency of the probes. The default "periodSeconds" value for the readiness & liveness probes is 10 seconds. You can relax it to 30 seconds to avoid overloading your PostgreSQL server.
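For example, something like the values below could work as a starting point. This is an untested sketch: it assumes your chart version exposes livenessProbe.enabled/readinessProbe.enabled together with customLivenessProbe/customReadinessProbe (check your chart's values), and it reuses the POSTGRES_PASSWORD env var that is already set in the container:
livenessProbe:
  enabled: false          # disable the default pg_isready-based probe
customLivenessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - PGPASSWORD="$POSTGRES_PASSWORD" psql -U "postgres" -h 127.0.0.1 -p 5432 -c "SELECT 1"
  initialDelaySeconds: 30
  periodSeconds: 30       # relaxed from the default 10s
  timeoutSeconds: 5
  failureThreshold: 6
readinessProbe:
  enabled: false
customReadinessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - PGPASSWORD="$POSTGRES_PASSWORD" psql -U "postgres" -h 127.0.0.1 -p 5432 -c "SELECT 1"
  initialDelaySeconds: 5
  periodSeconds: 30
  timeoutSeconds: 5
  failureThreshold: 6
With an exec probe like this, a failure should record the actual psql error message in the pod events instead of the generic "no response".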
@HenriqueLBorges have you found out what the issue was?
I am also interested to know.
A possibility (although it's not the best alternative) is to use "tcpSocket" to simply ensure Pgpool is listening on the expected port. You can try it using the values below:
pgpool:
  livenessProbe:
    enabled: false
  customLivenessProbe:
    tcpSocket:
      port: postgresql
  readinessProbe:
    enabled: false
  customReadinessProbe:
    tcpSocket:
      port: postgresql
@juan131 please elaborate on why it's not the best alternative and what the drawbacks are.
In some other threads I read that an HTTP request may also work (I haven't tested it so far), something like this (maybe someone can complete it):
readinessProbe:
  httpGet:
    path: /
    port: http
  initialDelaySeconds: 60
  periodSeconds: 15
  timeoutSeconds: 10
livenessProbe:
  httpGet:
    path: /
    port: http
  initialDelaySeconds: 60
  periodSeconds: 30
  timeoutSeconds: 10
Hi @vishrantgupta
IMHO using tcpSocket is not optimal because it simply ensures there's a process listening on a certain port. However, it doesn't check the health of the application nor whether the app is ready to accept connections.
@exocode I don't think the http port is exposed in postgres; does it need any change on the postgres pod side?
Using "httpGet" probes on PostgreSQL won't work since it doesn't expose any web endpoint
ok, didn't know that :-)
Well, what is the solution?
Also looking for what the solution was
Guys, take a look at Patroni.
I would consider VMware SQL with Postgres for Kubernetes https://network.pivotal.io/products/tanzu-sql-postgres/#/releases/1450456/artifact_references
@emoxam what exactly are you proposing with Patroni?
I was able to remedy this issue by downgrading to chart version 14.3.3 from 15.2.2. This was on a fresh install of Postgres.
From @HenriqueLBorges: https://github.com/bitnami/bitnami-docker-postgresql/issues/222
Description
Describe the bug: Hello, I have a Kubernetes cluster running PostgreSQL. There are no resource limitations, but at a random moment the readiness/liveness probes fail and then my container is restarted.
Steps to reproduce the issue:
Describe the results you received:
Additional information you deem important (e.g. issue happens only occasionally):
I ran pg_isready inside my container numerous times and every time I got the same response ("accepting connections"). I tried to execute big SQL statements and exceed the connection limit, but I wasn't able to force a container restart. These restarts all happen when my cluster isn't being heavily used.
Version
docker version:
docker info:
Additional environment details (AWS, VirtualBox, Docker for MAC, physical, etc.):