I just checked for a possible cause of the eviction. The database is not used at all and contains no data, yet all three pods (the primary and both replicas) each use over 3 GB of RAM. Running ps aux in the primary pod shows this:
```
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 1 0.0 0.0 44300 10948 ? Ss Nov30 0:00 /usr/bin/python3 /usr/local/bin/patroni /tmp/postgres-ha-bootstrap.yaml
postgres 120 0.0 0.0 112940 2764 ? Ss Nov30 0:00 /usr/sbin/sshd -f /sshd/sshd_config
postgres 130 0.1 0.2 984888 37136 ? Sl Nov30 4:09 /usr/bin/python3 /usr/local/bin/patroni /tmp/postgres-ha-bootstrap.yaml
postgres 171 0.0 0.1 304420 25052 ? S Nov30 0:03 postgres -D /pgdata/cluster1 --config-file=/pgdata/cluster1/postgresql.conf --listen_addresses=0.0.0.0 --port=5432 --cluster_name=cluster1 --wal_level=logical --
postgres 173 0.0 0.0 155716 6984 ? Ss Nov30 0:00 postgres: cluster1: logger
postgres 186 0.0 0.0 304536 12092 ? Ss Nov30 0:33 postgres: cluster1: checkpointer
postgres 187 0.0 0.0 304540 7960 ? Ss Nov30 0:02 postgres: cluster1: background writer
postgres 188 0.0 0.0 157972 7088 ? Ss Nov30 0:28 postgres: cluster1: stats collector
postgres 196 0.0 0.1 306492 17704 ? Ss Nov30 0:53 postgres: cluster1: postgres postgres [local] idle
postgres 201 4.6 19.7 3568508 3247860 ? Ss Nov30 189:09 postgres: cluster1: ccp_monitoring postgres 127.0.0.1(45302) idle
postgres 317 0.0 0.0 305412 11084 ? Ss Nov30 0:13 postgres: cluster1: walsender primaryuser 10.96.0.8(52924) streaming 9/B60018B0
postgres 319 0.0 0.0 305412 11092 ? Ss Nov30 0:12 postgres: cluster1: walsender primaryuser 10.96.2.14(36666) streaming 9/B60018B0
postgres 321 0.0 0.0 304420 10912 ? Ss Nov30 0:07 postgres: cluster1: walwriter
postgres 322 0.0 0.0 305096 9292 ? Ss Nov30 0:02 postgres: cluster1: autovacuum launcher
postgres 323 0.0 0.0 157840 7020 ? Ss Nov30 0:01 postgres: cluster1: archiver last was 0000000200000009000000B5
postgres 324 0.0 0.0 304956 8232 ? Ss Nov30 0:00 postgres: cluster1: logical replication launcher
postgres 1159700 0.1 0.0 15276 3424 pts/0 Ss 09:34 0:00 /bin/bash
postgres 1159766 0.0 0.0 55208 4016 pts/0 R+ 09:34 0:00 ps axu
```
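For reference, the per-pod totals can be cross-checked from outside the pods; a minimal sketch, assuming the metrics-server is installed in the cluster:

```sh
# Shows current CPU and memory usage per pod in the pgo namespace
# (requires the Kubernetes metrics-server).
kubectl -n pgo top pods
```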
Apparently the postgres: cluster1: ccp_monitoring postgres 127.0.0.1(45302) idle process accounts for nearly all of that RAM (about 3.2 GB RSS in the listing above) and also shows very high CPU usage.
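To pin down which backend that is and when it started, a query against pg_stat_activity helps; a minimal sketch, with the pod name left as a placeholder:

```sh
# Substitute the actual primary pod name from "kubectl -n pgo get pods".
kubectl -n pgo exec -it <primary-pod> -- psql -c \
  "SELECT pid, usename, state, backend_start
     FROM pg_stat_activity
    WHERE usename = 'ccp_monitoring';"
```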
The pgo test listing, as you suggest, is certainly a bug with the Operator and should be fixed. Thanks for reporting.
If you are running PostgreSQL 11 or above with jit enabled (verify with SHOW jit;), there is a known issue where JIT compilation in PostgreSQL causes a memory leak with at least one query in particular; you can read about the details here: https://github.com/CrunchyData/pgmonitor/issues/182#issuecomment-706559384. The mitigation is either to turn JIT off for that query or to disable JIT completely. Information on setting custom configuration can be found here:
https://access.crunchydata.com/documentation/postgres-operator/latest/advanced/custom-configuration/
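As an illustration, a minimal sketch of both mitigations, run with psql inside the primary pod (ccp_monitoring is the monitoring role visible in the ps output above):

```sh
# Option 1: disable JIT only for the monitoring role; takes effect
# for new sessions opened by that role.
psql -c "ALTER ROLE ccp_monitoring SET jit = off;"

# Option 2: disable JIT cluster-wide and reload the configuration.
psql -c "ALTER SYSTEM SET jit = off;"
psql -c "SELECT pg_reload_conf();"
```

Note that on a Patroni-managed cluster like this one, ad hoc changes may be overwritten; the custom configuration mechanism in the documentation linked above is the supported way to make them permanent.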
Describe the bug
I set up the operator in a small Kubernetes cluster for testing and created a Postgres cluster, which worked fine. After a while one of the pods got evicted, and now "pgo test" shows both the old evicted pod and the new pod as primary with state UP.
To Reproduce
Expected behavior
"pgo status" should show the actual state of the cluster.
Screenshots
Output of "pgo test":

Output of "kubectl -n pgo get pods":
Please tell us about your environment:
Kubernetes version: v1.16.15-gke.4300
PGO version: 4.5.1
No custom images are used.