CrunchyData / postgres-operator

Production PostgreSQL for Kubernetes, from high availability Postgres clusters to full-scale database-as-a-service.
https://access.crunchydata.com/documentation/postgres-operator/v5/
Apache License 2.0

ERROR running post_bootstrap "Failed to bootstrap cluster" #3528

Open marziman opened 1 year ago

marziman commented 1 year ago

Overview

When we try to install PostgreSQL with the Crunchy operator, the database does not come up and fails at the post_bootstrap step. Because the database container never starts, the PostgreSQL pod ends up in a restart loop. We also tested on Ubuntu 20.04.4, where the issue does not occur, so the fact that it only happens on Ubuntu 20.04.5 may help narrow it down.

Environment

K8s: v1.24.8
OS Image: Ubuntu 20.04.5 LTS, linux (amd64)
Kernel version: 5.4.0-135-generic
Container runtime: containerd://1.6.8-k3s1
Kubelet version: v1.24.8+rke2r1


Steps to Reproduce

  1. Install the Crunchy Operator as described here: https://access.crunchydata.com/documentation/postgres-operator/5.3.0/installation/helm/
  2. Install the PostgreSQL instance based on the example at https://github.com/CrunchyData/postgres-operator-examples/blob/main/helm/postgres/values.yaml, using this PostgresCluster manifest:
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: my-postgresql
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-15.1-0
  postgresVersion: 15
  instances:
    - replicas: 1
      dataVolumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 8Gi
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/cluster: my-postgresql
                  postgres-operator.crunchydata.com/instance-set: "00"
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.41-2
      repos:
      - name: repo1
        volume:
          volumeClaimSpec:
            accessModes:
            - "ReadWriteOnce"
            resources:
              requests:
                storage: 8Gi
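The podAntiAffinity rule in a manifest like this only spreads the cluster's own pods if its `postgres-operator.crunchydata.com/cluster` label equals the PostgresCluster's `metadata.name`, which is what the selector itself assumes about how the operator labels instance pods. A minimal sketch, assuming the manifest has already been loaded into a Python dict (for example by a YAML parser, not shown), that flags selectors pointing at a different cluster. The helper `anti_affinity_mismatches` and the example values are hypothetical, and this check is unrelated to the bootstrap failure itself.

```python
from typing import Any, Dict, List

CLUSTER_LABEL = "postgres-operator.crunchydata.com/cluster"


def anti_affinity_mismatches(manifest: Dict[str, Any]) -> List[str]:
    """Return cluster names used in podAntiAffinity selectors that differ
    from the PostgresCluster's own metadata.name (hypothetical helper)."""
    name = manifest["metadata"]["name"]
    mismatches: List[str] = []
    for instance_set in manifest.get("spec", {}).get("instances", []):
        anti = instance_set.get("affinity", {}).get("podAntiAffinity", {})
        preferred = anti.get("preferredDuringSchedulingIgnoredDuringExecution", [])
        required = anti.get("requiredDuringSchedulingIgnoredDuringExecution", [])
        # "preferred" entries wrap the term in podAffinityTerm; "required" entries are terms.
        terms = [entry["podAffinityTerm"] for entry in preferred] + list(required)
        for term in terms:
            labels = term.get("labelSelector", {}).get("matchLabels", {})
            selected = labels.get(CLUSTER_LABEL)
            if selected is not None and selected != name:
                mismatches.append(selected)
    return mismatches


# Hypothetical manifest whose selector names a different cluster.
example = {
    "metadata": {"name": "my-postgresql"},
    "spec": {
        "instances": [{
            "affinity": {"podAntiAffinity": {
                "preferredDuringSchedulingIgnoredDuringExecution": [{
                    "weight": 1,
                    "podAffinityTerm": {
                        "topologyKey": "kubernetes.io/hostname",
                        "labelSelector": {"matchLabels": {CLUSTER_LABEL: "other-cluster"}},
                    },
                }],
            }},
        }],
    },
}
print(anti_affinity_mismatches(example))  # ['other-cluster']
```

An empty list means every anti-affinity selector targets the cluster being created.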

EXPECTED

A running PostgreSQL database in the Kubernetes cluster, deployed with the Crunchy operator.

ACTUAL

2023-01-12 20:08:01,395 INFO: No PostgreSQL configuration items changed, nothing to reload.
2023-01-12 20:08:01,398 INFO: Lock owner: None; I am toggid-postgresql-00-x6jq-0
2023-01-12 20:08:01,476 INFO: trying to bootstrap a new cluster
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf-8".
The default text search configuration will be set to "english".

Data page checksums are enabled.

creating directory /pgdata/pg15 ... ok
creating directory /pgdata/pg15_wal ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
2023-01-12 20:08:02,296: 667 INFO instana: Instana host agent available. We're in business. Announced PID: 667 (true pid: 3630036)
2023-01-12 20:08:02,296 INFO: Instana host agent available. We're in business. Announced PID: 667 (true pid: 3630036)
syncing data to disk ... ok

Success. You can now start the database server using:

    /usr/pgsql-15/bin/pg_ctl -D /pgdata/pg15 -l logfile start

2023-01-12 20:04:45.556 UTC [240] LOG:  pgaudit extension initialized
2023-01-12 20:04:45,563 INFO: postmaster pid=240
/tmp/postgres:5432 - no response
2023-01-12 20:04:45.583 UTC [240] LOG:  redirecting log output to logging collector process
2023-01-12 20:04:45.583 UTC [240] HINT:  Future log output will appear in directory "log".
/tmp/postgres:5432 - accepting connections
/tmp/postgres:5432 - accepting connections
2023-01-12 20:04:46,633 INFO: establishing a new patroni connection to the postgres cluster
2023-01-12 20:04:46,690 INFO: running post_bootstrap
2023-01-12 20:04:46,690 ERROR: post_bootstrap
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/bootstrap.py", line 335, in post_bootstrap
    self.create_or_update_role(replication['username'], replication.get('password'), ['REPLICATION'])
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/bootstrap.py", line 314, in create_or_update_role
    END;$$""".format(quote_literal(name), quote_ident(name, self._postgresql.connection()), ' '.join(options))
  File "/usr/local/lib/python3.6/site-packages/patroni/psycopg.py", line 40, in quote_ident
    return _quote_ident(value, conn)
TypeError: argument 2 must be a connection or a cursor
2023-01-12 20:04:46,696 INFO: removing initialize key after failed attempt to bootstrap the cluster
2023-01-12 20:04:46,758 INFO: renaming WAL directory and updating symlink: /pgdata/pg15_wal
2023-01-12 20:04:46,759 INFO: renaming data directory to /pgdata/pg15_2023-01-12-20-04-46
Traceback (most recent call last):
  File "/usr/local/bin/patroni", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/site-packages/patroni/__main__.py", line 143, in main
    return patroni_main()
  File "/usr/local/lib/python3.6/site-packages/patroni/__main__.py", line 135, in patroni_main
    abstract_main(Patroni, schema)
  File "/usr/local/lib/python3.6/site-packages/patroni/daemon.py", line 100, in abstract_main
    controller.run()
  File "/usr/local/lib/python3.6/site-packages/patroni/__main__.py", line 105, in run
    super(Patroni, self).run()
  File "/usr/local/lib/python3.6/site-packages/patroni/daemon.py", line 59, in run
    self._run_cycle()
  File "/usr/local/lib/python3.6/site-packages/patroni/__main__.py", line 108, in _run_cycle
    logger.info(self.ha.run_cycle())
  File "/usr/local/lib/python3.6/site-packages/patroni/ha.py", line 1514, in run_cycle
    info = self._run_cycle()
  File "/usr/local/lib/python3.6/site-packages/patroni/ha.py", line 1388, in _run_cycle
    return self.post_bootstrap()
  File "/usr/local/lib/python3.6/site-packages/patroni/ha.py", line 1280, in post_bootstrap
    self.cancel_initialization()
  File "/usr/local/lib/python3.6/site-packages/patroni/ha.py", line 1273, in cancel_initialization
    raise PatroniFatalException('Failed to bootstrap cluster')
patroni.exceptions.PatroniFatalException: 'Failed to bootstrap cluster'
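The first traceback ends in `TypeError: argument 2 must be a connection or a cursor`, raised when Patroni passes `self._postgresql.connection()` to `quote_ident`. The log also shows the Instana host agent announcing itself inside the Patroni process (PID 667) during a bootstrap attempt. One hypothesis, not confirmed in this thread, is that an instrumentation layer wraps the database connection in a proxy object: such a proxy can satisfy Python's `isinstance` (which also consults `__class__`) yet fail a C-extension type check, which looks at the object's real type. The pure-Python sketch below only illustrates that difference; `Connection`, `ConnectionProxy`, and `strict_quote_ident` are hypothetical stand-ins, not psycopg2, Patroni, or Instana code.

```python
class Connection:
    """Stand-in for a native driver connection type (hypothetical)."""


class ConnectionProxy:
    """Transparent wrapper, similar in spirit to what instrumentation
    libraries put around driver objects (hypothetical)."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    @property
    def __class__(self):
        # Report the wrapped object's class, so isinstance() is satisfied.
        return type(self._wrapped)

    def __getattr__(self, attr):
        # Forward all other attribute lookups to the wrapped object.
        return getattr(self._wrapped, attr)


def strict_quote_ident(name, conn):
    """Quote an SQL identifier after checking the *real* type of conn,
    i.e. type(conn) rather than conn.__class__, as C extensions do."""
    if not issubclass(type(conn), Connection):
        raise TypeError("argument 2 must be a connection or a cursor")
    # SQL identifier quoting: wrap in double quotes, double any embedded quote.
    return '"' + name.replace('"', '""') + '"'


native = Connection()
wrapped = ConnectionProxy(native)

print(isinstance(wrapped, Connection))           # True: isinstance consults __class__
print(strict_quote_ident("replicator", native))  # "replicator"
try:
    strict_quote_ident("replicator", wrapped)
except TypeError as exc:
    print(exc)                                   # argument 2 must be a connection or a cursor
```

Checking `type(conn)` instead of using `isinstance` is the point of the sketch: a wrapper that is transparent at the Python level can still be rejected by code that inspects the concrete object type.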

Additional Information

Not happening on Ubuntu 20.04.4 LTS, but happening on Ubuntu 20.04.5 LTS.

marziman commented 1 year ago

Hello dear PGO team, it's been two weeks since we raised this issue.

No one has commented yet. Could any of the maintainers help, please?

@benjaminjb ?

SomniVertix commented 1 year ago

What did your solution end up being? @marziman