bitnami / containers

Bitnami container images
https://bitnami.com

[bitnami/postgresql] Custom unix_socket_directories causes startup configuration to fail #68027

Closed james-mchugh closed 19 hours ago

james-mchugh commented 1 week ago

Name and Version

bitnami/postgresql:16.3.0

What architecture are you using?

amd64

What steps will reproduce the bug?

  1. In any environment.
  2. Use the Bitnami Postgresql Helm chart
  3. Update Postgresql configuration to change where unix sockets are created:
primary:
  configuration: |-
    unix_socket_directories = "/tmp/foo"
    listen_addresses = "0.0.0.0"
  4. Deploy with the custom configuration.

What is the expected behavior?

Postgresql starts up as expected and uses the custom socket directory of "/tmp/foo" to store Unix sockets.

What do you see instead?

Postgresql starts up and creates a socket in the custom directory, but the Bitnami startup scripts never detect that Postgresql is ready. This causes the container to eventually restart without finishing the startup configuration (such as setting up accounts and passwords). After restarting, it never re-runs the startup configuration, so the accounts and passwords for Postgresql are never properly initialized.

Additional information

This appears to happen in postgresql_start_bg at https://github.com/bitnami/containers/blob/main/bitnami/postgresql/16/debian-12/rootfs/opt/bitnami/scripts/libpostgresql.sh#L778. At line 797, it runs pg_isready without any arguments specifying the host, so the check targets the default Unix socket location, which no longer matches the configured one. This is inconsistent with the probes in the Bitnami Postgresql Helm chart, which pass -h 127.0.0.1 to specify the host. The result is a pod that reports it is healthy and ready, but ultimately restarts because postgresql_start_bg never detects that Postgresql is running and forces the script to exit.

When exiting, the PID file for postgresql is never cleaned up, so the next time the entrypoint runs and postgresql_start_bg runs again, it detects that Postgres is already running and never attempts to re-run the startup configuration. This puts Postgres in a state where it is running happily, but has never actually finalized its startup configuration so accounts and databases have not been initialized.

As for why a user may want to change unix_socket_directories: Iron Bank provides a hardened Bitnami Postgresql image built on top of the RHEL UBI. The RHEL Postgresql port defaults to creating sockets under "/run", to avoid the possibility of a Postgresql instance being mimicked when running on bare metal with unix sockets stored under "/tmp". This poses a problem for hardened containers, where a read-only root filesystem is a typical security control.

This can be worked around by adjusting the container's security context to make the root filesystem read-write, at the expense of security precautions. Instead, it would be useful to be able to update the configuration of Postgresql to store unix sockets under /tmp when running in containers.

It seems that a straightforward solution here might be for the postgresql_start_bg function to pass the host option with the loopback address to the pg_isready function similar to how the Helm chart's probes work. However, there may be some edge cases here that I haven't considered.
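To illustrate the idea, here is a minimal sketch of a readiness wait that probes over TCP loopback instead of the default Unix socket. It is not the actual Bitnami code: the function name wait_for_postgres, the retry and delay arguments, and the use of POSTGRESQL_PORT_NUMBER with a 5432 fallback are assumptions made for illustration. Only the pg_isready flags -q, -h, and -p are taken from the PostgreSQL client itself.

```shell
# Hypothetical helper, not part of libpostgresql.sh.
# Usage: wait_for_postgres RETRIES DELAY_SECONDS [CHECK_COMMAND...]
wait_for_postgres() {
    local retries="$1"
    local delay="$2"
    shift 2

    # Default check: ask pg_isready to connect over TCP loopback, so the
    # result does not depend on where unix_socket_directories points.
    # POSTGRESQL_PORT_NUMBER is assumed to hold the server port.
    if [ "$#" -eq 0 ]; then
        set -- pg_isready -q -h 127.0.0.1 -p "${POSTGRESQL_PORT_NUMBER:-5432}"
    fi

    local attempt=1
    while [ "$attempt" -le "$retries" ]; do
        # Run the check command; success means the server accepts connections.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
        attempt=$((attempt + 1))
    done

    # The server never became ready within the allowed attempts.
    return 1
}
```

A caller might use it as `wait_for_postgres 12 5 || exit 1`, with values chosen here only as an example. One edge case worth keeping in mind: a loopback check only works if the server listens on TCP, so a configuration with an empty listen_addresses would still fail this probe. In that case, passing the socket directory itself to -h (pg_isready treats a host value starting with a slash as a socket directory) may be the safer option.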

carrodher commented 1 week ago

Thank you for bringing this issue to our attention. We appreciate your involvement! If you're interested in contributing a solution, we welcome you to create a pull request. The Bitnami team is excited to review your submission and offer feedback. You can find the contributing guidelines here.

Your contribution will greatly benefit the community. Feel free to reach out if you have any questions or need assistance.

james-mchugh commented 1 week ago

Yep, I can try to put in a fix within the week for it!

james-mchugh commented 4 days ago

The PR is in!

> When exiting, the PID file for postgresql is never cleaned up, so the next time the entrypoint runs and postgresql_start_bg runs again, it detects that Postgres is already running and never attempts to re-run the startup configuration. This puts Postgres in a state where it is running happily, but has never actually finalized its startup configuration so accounts and databases have not been initialized.

I dove into this issue a bit more to see if there was an easy fix that could also be included in the PR. My analysis of why the setup wasn't being completed was wrong. The PID file does get cleaned up when the postgresql_initialize() function runs, which is every time setup runs. The real reason the setup doesn't complete is that it only runs if the $POSTGRESQL_DATA_DIR directory is empty. When setup runs the second time after failing the first, it sees that the data directory is already populated and doesn't attempt to set up users or anything else again. I'm not sure what the right solution is here, so it may be a problem for another day.
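The behavior described above can be reproduced in a simplified form. This is a sketch of the "initialize only when the data directory is empty" pattern, not the actual Bitnami logic: the function names data_dir_is_empty and run_setup are hypothetical, and the PG_VERSION file merely stands in for the files a first, interrupted run would leave behind.

```shell
# Hypothetical helpers illustrating the empty-directory guard.

# Succeeds when the directory is missing or contains no entries.
data_dir_is_empty() {
    [ ! -d "$1" ] || [ -z "$(ls -A "$1")" ]
}

# Runs first-time setup only when the data directory is empty.
run_setup() {
    local data_dir="$1"
    if data_dir_is_empty "$data_dir"; then
        # The first run populates the directory. If the container is
        # restarted after this point but before users and passwords are
        # created, the directory is left populated and half-configured.
        mkdir -p "$data_dir"
        touch "$data_dir/PG_VERSION"
        echo "initializing: create database files, then users and passwords"
    else
        # Later runs see a populated directory and skip all setup,
        # including the steps the first run never reached.
        echo "skipping: data directory already populated"
    fi
}
```

Calling run_setup twice on the same fresh path prints the "initializing" message the first time and the "skipping" message the second time, which mirrors why a restart after a failed first boot never creates the accounts.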

carrodher commented 2 days ago

Thank you for opening this issue and submitting the associated Pull Request. Our team will review and provide feedback. Once the PR is merged, the issue will automatically close.

Your contribution is greatly appreciated!