CrunchyData / postgres-operator

Production PostgreSQL for Kubernetes, from high availability Postgres clusters to full-scale database-as-a-service.
https://access.crunchydata.com/documentation/postgres-operator/v5/
Apache License 2.0

Database crashed #2343

Closed: aleksrosz closed this issue 3 years ago.

aleksrosz commented 3 years ago

Problem: After the weekend, I have a problem with my database.

pgo test pgo-dev -n play24-dev

cluster : pgo-dev
    Services
        primary (100.100.221.136:5432): DOWN
        replica (100.100.213.96:5432): UP
    Instances
        replica (pgo-dev-5689d47974-g6fbc): UP
        replica (pgo-dev-qkst-64c9c688d4-qvgdc): UP
        primary (pgo-dev-ydof-7d79775ff9-lrjgf): DOWN

The problem is with the primary instance (pgo-dev-ydof-7d79775ff9-lrjgf), which is DOWN.
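The pgo test summary can also be checked programmatically. The following Python sketch is a hypothetical helper, not part of pgo: it assumes the output format shown in this issue (entries of the form "role (address or pod): UP|DOWN") and lists every entry reported as DOWN. It works on both the flattened one-line form and the indented multi-line form.

```python
import re

# Hypothetical helper: matches entries such as
# "primary (100.100.221.136:5432): DOWN" in pgo test output.
ENTRY = re.compile(r"\b(primary|replica) \(([^)]+)\): (UP|DOWN)\b")

# Sample taken from the pgo test output posted in this issue.
PGO_TEST_SAMPLE = (
    "cluster : pgo-dev Services primary (100.100.221.136:5432): DOWN "
    "replica (100.100.213.96:5432): UP Instances "
    "replica (pgo-dev-5689d47974-g6fbc): UP "
    "replica (pgo-dev-qkst-64c9c688d4-qvgdc): UP "
    "primary (pgo-dev-ydof-7d79775ff9-lrjgf): DOWN"
)


def find_down(pgo_test_output: str) -> list[tuple[str, str]]:
    """Return (role, address or pod name) pairs whose status is DOWN."""
    return [
        (role, target)
        for role, target, status in ENTRY.findall(pgo_test_output)
        if status == "DOWN"
    ]


if __name__ == "__main__":
    for role, target in find_down(PGO_TEST_SAMPLE):
        print(f"{role} {target} is DOWN")
```

Both the primary service and the primary pod show up, which points at the primary's database process rather than at the replicas.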

oc logs pgo-dev-ydof-7d79775ff9-lrjgf

The database logs are in the attached file pgodatabaselogs.txt.

NAME                                            READY   STATUS    RESTARTS   AGE
backrest-backup-pgo-dev-72z97                   1/1     Running   0          4s
pgo-dev-5689d47974-g6fbc                        2/2     Running   0          10d
pgo-dev-backrest-shared-repo-5665d995b8-xbvdl   1/1     Running   0          10d
pgo-dev-qkst-64c9c688d4-qvgdc                   2/2     Running   0          10d
pgo-dev-ydof-7d79775ff9-lrjgf                   1/2     Running   0          10d

After a few seconds:

NAME                                            READY   STATUS    RESTARTS   AGE
backrest-backup-pgo-dev-72z97                   0/1     Error     0          9s
backrest-backup-pgo-dev-b7mqb                   1/1     Running   0          4s
pgo-dev-5689d47974-g6fbc                        2/2     Running   0          10d
pgo-dev-backrest-shared-repo-5665d995b8-xbvdl   1/1     Running   0          10d
pgo-dev-qkst-64c9c688d4-qvgdc                   2/2     Running   0          10d
pgo-dev-ydof-7d79775ff9-lrjgf                   1/2     Running   0          10d

After another few seconds:

NAME                                            READY   STATUS    RESTARTS   AGE
backrest-backup-pgo-dev-9lvqr                   0/1     Error     0          16s
backrest-backup-pgo-dev-kjrw8                   0/1     Error     0          31s
backrest-backup-pgo-dev-w85kz                   0/1     Error     0          26s
pgo-dev-5689d47974-g6fbc                        2/2     Running   0          10d
pgo-dev-backrest-shared-repo-5665d995b8-xbvdl   1/1     Running   0          10d
pgo-dev-qkst-64c9c688d4-qvgdc                   2/2     Running   0          10d
pgo-dev-ydof-7d79775ff9-lrjgf                   1/2     Running   0          10d
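In these listings the READY column is ready containers over total containers, so pgo-dev-ydof-7d79775ff9-lrjgf at 1/2 has one container that is not passing its readiness check, while new backrest-backup pods keep appearing and ending in Error. The Python sketch below is a hypothetical helper (not part of oc or pgo) that flags such pods. It assumes the five default columns NAME READY STATUS RESTARTS AGE and a plain integer RESTARTS value, and it treats Completed pods as healthy because finished job pods normally show 0/1.

```python
import re

# One pod entry in default oc get pods output:
# NAME READY STATUS RESTARTS AGE (assumes RESTARTS is a plain integer).
POD = re.compile(r"(\S+)\s+(\d+)/(\d+)\s+(\S+)\s+(\d+)\s+(\S+)")

# Second snapshot from this issue, flattened onto one line as it was posted.
PODS_SAMPLE = (
    "backrest-backup-pgo-dev-72z97 0/1 Error 0 9s "
    "backrest-backup-pgo-dev-b7mqb 1/1 Running 0 4s "
    "pgo-dev-5689d47974-g6fbc 2/2 Running 0 10d "
    "pgo-dev-backrest-shared-repo-5665d995b8-xbvdl 1/1 Running 0 10d "
    "pgo-dev-qkst-64c9c688d4-qvgdc 2/2 Running 0 10d "
    "pgo-dev-ydof-7d79775ff9-lrjgf 1/2 Running 0 10d"
)


def unhealthy_pods(listing: str) -> list[str]:
    """Return pods that are not Running or have unready containers."""
    problems = []
    for name, ready, total, status, _restarts, _age in POD.findall(listing):
        if status == "Completed":
            continue  # finished job pods are expected to show 0/1
        if status != "Running" or int(ready) < int(total):
            problems.append(f"{name}: {ready}/{total} {status}")
    return problems


if __name__ == "__main__":
    for line in unhealthy_pods(PODS_SAMPLE):
        print(line)
```

The regex scans the whole text rather than individual lines, so it accepts both the flattened form posted in the issue and normal multi-line output with a header row.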

oc logs backrest-backup-pgo-dev-9lvqr

time="2021-03-22T12:54:09Z" level=info msg="pgo-backrest starts"
time="2021-03-22T12:54:09Z" level=info msg="debug flag set to false"
time="2021-03-22T12:54:09Z" level=info msg="backrest backup command requested"
time="2021-03-22T12:54:09Z" level=info msg="command to execute is [pgbackrest backup --db-host=100.100.28.79 --db-path=/pgdata/pgo-dev-ydof]"
time="2021-03-22T12:54:09Z" level=info msg="command is pgbackrest backup --db-host=100.100.28.79 --db-path=/pgdata/pgo-dev-ydof "
time="2021-03-22T12:54:09Z" level=error msg="command terminated with exit code 56"
time="2021-03-22T12:54:09Z" level=info msg="output=[]"
time="2021-03-22T12:54:09Z" level=info msg="stderr=[WARN: option 'repo1-retention-full' is not set for 'repo1-retention-full-type=count', the repository may run out of space\n HINT: to retain full backups indefinitely (without warning), set option 'repo1-retention-full' to the maximum.\nWARN: unable to check pg-1: [DbConnectError] raised from remote-0 protocol on '100.100.28.79': unable to connect to 'dbname='postgres' port=5432 host='/tmp'': could not connect to server: No such file or directory\n \tIs the server running locally and accepting\n \tconnections on Unix domain socket \"/tmp/.s.PGSQL.5432\"?\nERROR: [056]: unable to find primary cluster - cannot proceed\n]"
time="2021-03-22T12:54:09Z" level=error msg="command terminated with exit code 56"
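The stderr entry explains exit code 56: pgBackRest reached the primary's host (100.100.28.79) through its remote protocol, but the local connection there with host='/tmp' and port=5432 failed because the socket file /tmp/.s.PGSQL.5432 did not exist, so it reported "unable to find primary cluster". A missing socket file generally means no PostgreSQL server is listening in that directory, which is consistent with the primary being DOWN. The Python sketch below is a hypothetical diagnostic, not part of pgo or pgBackRest: it builds the socket path in the form shown in the error message and tests whether anything accepts connections on it. It assumes it would be run inside the database container, where Python may not be installed.

```python
import os
import socket
import stat


def pg_socket_path(socket_dir: str, port: int) -> str:
    """libpq looks for the server socket file .s.PGSQL.<port> in socket_dir."""
    return os.path.join(socket_dir, f".s.PGSQL.{port}")


def socket_accepts_connections(path: str, timeout: float = 2.0) -> bool:
    """True if path is a Unix-domain socket that accepts a connection."""
    try:
        if not stat.S_ISSOCK(os.stat(path).st_mode):
            return False
    except FileNotFoundError:
        # The case in this issue: "No such file or directory".
        return False
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(path)
        except OSError:
            # e.g. a stale socket file left behind with no server listening
            return False
    return True


if __name__ == "__main__":
    path = pg_socket_path("/tmp", 5432)
    state = "accepting" if socket_accepts_connections(path) else "not accepting"
    print(path, state)
```

A successful connect only shows that something is listening on the socket; it does not authenticate or run a query the way pgBackRest's check does.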

Available storage space is not the problem. Can I get some help?

jkatz commented 3 years ago

This sounds like an operational issue, and based on the logs, it does not appear to be a problem with the Operator. You can read about the various support options here.