CrunchyData / postgres-operator

Production PostgreSQL for Kubernetes, from high availability Postgres clusters to full-scale database-as-a-service.
https://access.crunchydata.com/documentation/postgres-operator/v5/
Apache License 2.0
3.93k stars, 591 forks

Error: "/pgdata/pg15" has wrong ownership #3760

Closed SomniVertix closed 11 months ago

SomniVertix commented 1 year ago

Overview

The "Instance1" Postgres cluster restarts continuously with the error: data directory "/pgdata/pg15" has wrong ownership

Steps to Reproduce

REPRO

Provide steps to get to the error condition:

  1. Run the quickstart from the Postgres Operator examples for 5.4 (https://access.crunchydata.com/documentation/postgres-operator/latest/quickstart)

EXPECTED

  1. The Postgres pods come up and remain stable

ACTUAL

  1. The database pod fails repeatedly while starting up

Logs

2023-10-23 21:34:54,578 WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f8c56e24a20>: Failed to establish a new connection: [Errno 111] Connection refused',)': /liveness
2023-10-23 21:34:54,579 WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f8c56eba7b8>: Failed to establish a new connection: [Errno 111] Connection refused',)': /liveness
2023-10-23 21:34:54,580 WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f8c56e24fd0>: Failed to establish a new connection: [Errno 111] Connection refused',)': /liveness
2023-10-23 21:34:54,582 INFO: No PostgreSQL configuration items changed, nothing to reload.
2023-10-23 21:34:54,584 INFO: Lock owner: None; I am hippo-my-test-5g8k-0
2023-10-23 21:34:54,695 INFO: trying to bootstrap a new cluster
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf-8".
The default text search configuration will be set to "english".

Data page checksums are enabled.

creating directory /pgdata/pg15 ... ok
creating directory /pgdata/pg15_wal ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 20
selecting default shared_buffers ... 400kB
selecting default time zone ... UTC
creating configuration files ... ok
2023-10-23 21:34:55.285 UTC [769] FATAL:  data directory "/pgdata/pg15" has wrong ownership
2023-10-23 21:34:55.285 UTC [769] HINT:  The server must be started by the user that owns the data directory.
child process exited with exit code 1
initdb: removing data directory "/pgdata/pg15"
initdb: removing WAL directory "/pgdata/pg15_wal"
2023-10-23 21:34:55,512 INFO: removing initialize key after failed attempt to bootstrap the cluster
Traceback (most recent call last):
  File "/usr/local/bin/patroni", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/site-packages/patroni/__main__.py", line 191, in main
    return patroni_main(args.configfile)
  File "/usr/local/lib/python3.6/site-packages/patroni/__main__.py", line 162, in patroni_main
    abstract_main(Patroni, configfile)
  File "/usr/local/lib/python3.6/site-packages/patroni/daemon.py", line 174, in abstract_main
    controller.run()
  File "/usr/local/lib/python3.6/site-packages/patroni/__main__.py", line 133, in run
    super(Patroni, self).run()
  File "/usr/local/lib/python3.6/site-packages/patroni/daemon.py", line 143, in run
    self._run_cycle()
  File "/usr/local/lib/python3.6/site-packages/patroni/__main__.py", line 136, in _run_cycle
    logger.info(self.ha.run_cycle())
  File "/usr/local/lib/python3.6/site-packages/patroni/ha.py", line 1843, in run_cycle
    info = self._run_cycle()
  File "/usr/local/lib/python3.6/site-packages/patroni/ha.py", line 1658, in _run_cycle
    return self.post_bootstrap()
  File "/usr/local/lib/python3.6/site-packages/patroni/ha.py", line 1542, in post_bootstrap
    self.cancel_initialization()
  File "/usr/local/lib/python3.6/site-packages/patroni/ha.py", line 1535, in cancel_initialization
    raise PatroniFatalException('Failed to bootstrap cluster')
patroni.exceptions.PatroniFatalException: Failed to bootstrap cluster
running bootstrap script ... 

andrewlecuyer commented 11 months ago

@SomniVertix, per our conversation on Discord, where we determined that the issue was with AWS EFS and not PGO, I am going to go ahead and close this issue.

Feel free to reach out if there is anything else you need, or if you continue to have trouble with other types of storage, etc.
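
For anyone who hits the same FATAL line: PostgreSQL refuses to start when the owner of the data directory differs from the effective user of the server process, which is what the HINT in the log above describes. Below is a minimal Python sketch of that comparison, meant to be run as the database user against the mounted volume. The function names and the `euid` parameter are hypothetical and exist only for illustration. This thread does not say how the EFS volume was configured, so the sketch only shows how to observe a mismatch, not why the storage produced one.

```python
import os
import tempfile
from typing import Optional


def ownership_matches(data_dir: str, euid: Optional[int] = None) -> bool:
    """Return True if data_dir is owned by the given effective UID.

    When euid is None, the effective UID of the current process is used,
    mirroring the comparison PostgreSQL makes at startup.
    """
    if euid is None:
        euid = os.geteuid()
    return os.stat(data_dir).st_uid == euid


def describe_ownership(data_dir: str, euid: Optional[int] = None) -> str:
    """Build a one-line report comparing directory owner and process user."""
    if euid is None:
        euid = os.geteuid()
    owner = os.stat(data_dir).st_uid
    if owner == euid:
        return f"{data_dir}: owner uid {owner} matches process euid {euid}"
    return (
        f"{data_dir}: owner uid {owner} differs from process euid {euid} "
        "(wrong ownership)"
    )


if __name__ == "__main__":
    # Demonstration on a freshly created local directory. On the affected
    # pod, the path of interest would be a directory created on the
    # mounted volume (an assumption, e.g. somewhere under /pgdata).
    with tempfile.TemporaryDirectory() as scratch:
        print(describe_ownership(scratch))
```

If a directory created by the process on the volume is immediately reported as owned by a different UID, the storage layer is rewriting ownership, which points away from PGO and toward the volume configuration.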