
[bitnami/postgresql-ha] Enhance pgpool-II Recovery Mechanism by Automatically Discarding Stale Status on Restart #30577

Closed: seatrain closed this issue 21 minutes ago

seatrain commented 5 days ago

Name and Version

bitnami/postgresql-ha/14.2.8

What is the problem this feature will solve?

In high-availability PostgreSQL deployments managed by the Bitnami postgresql-ha Helm chart on Kubernetes, multiple PostgreSQL nodes, including the primary, can go down at the same time. When these nodes recover, especially if a new primary has taken over, pgpool-II struggles to reattach them and to recognize the new primary because of stale status information retained in its pgpool_status file.

Key Issues:

Delayed Recovery: When many nodes go down, including the primary, electing a new primary can take longer than the health check period. As a result, pgpool-II cannot return to a stable state even after the PostgreSQL cluster becomes operational again.

Health Check Failures: The liveness probe triggers the /opt/bitnami/scripts/pgpool/healthcheck.sh script, which attempts to reattach nodes based on the status file and the current repmgr node status. However, if the primary node has changed and the status file is outdated, pcp_attach_node fails because no primary node can be found (a simplified sketch of this flow follows the list below).

Stale Status Retention: Upon pod restarts, pgpool-II reads from the existing pgpool_status file, which may contain outdated information about node statuses. This prevents pgpool-II from recognizing newly promoted primary nodes, leading to failed health checks and continuous pod restarts.
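
To make the failure mode concrete, the following is a minimal sketch of such a reattach loop. It is illustrative only: the PCP port, user, and overall structure are assumptions, not the actual contents of /opt/bitnami/scripts/pgpool/healthcheck.sh.

#!/bin/bash
# Illustrative sketch, not the real healthcheck.sh: query pgpool for each
# backend's status and try to reattach any node still marked as down.
PCP_PORT=9898      # assumed PCP port
PCP_USER=postgres  # assumed PCP user

node_count="$(pcp_node_count -h localhost -p "${PCP_PORT}" -U "${PCP_USER}" -w)"

for ((node_id = 0; node_id < node_count; node_id++)); do
  # Third field of pcp_node_info's plain output is the numeric status (3 = down).
  status="$(pcp_node_info -h localhost -p "${PCP_PORT}" -U "${PCP_USER}" -w "${node_id}" | awk '{print $3}')"
  if [[ "${status}" == "3" ]]; then
    # With a stale pgpool_status file and a changed primary, this call fails
    # because pgpool cannot find a primary node, so the liveness probe keeps
    # failing and the pod keeps restarting.
    pcp_attach_node -h localhost -p "${PCP_PORT}" -U "${PCP_USER}" -w "${node_id}"
  fi
done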

What is the feature you are proposing to solve the problem?

Modify the Startup Script:

Update the /opt/bitnami/scripts/pgpool/run.sh script to include the -D flag in the flags array. This ensures that every time pgpool-II starts, it automatically discards the stale pgpool_status file.

Before Modification:

flags=("-n" "--config-file=${PGPOOL_CONF_FILE}" "--hba-file=${PGPOOL_PGHBA_FILE}") After Modification:

flags=(
  "-n"
  "-D"
  "--config-file=${PGPOOL_CONF_FILE}"
  "--hba-file=${PGPOOL_PGHBA_FILE}"
)

Explanation:

-D Flag: The -D flag instructs pgpool-II to discard the existing pgpool_status file during startup. This action ensures that any stale or outdated node status information is flushed, allowing pgpool-II to perform fresh health checks and accurately recognize the current primary and standby PostgreSQL nodes.
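
One way to check the result after a restart with -D is to ask pgpool for its view of the backends, which should now match the cluster's current topology. This is a hedged example; the service name, port, user, and password handling are placeholders for whatever your release generates:

# The newly promoted node should appear with role "primary" and status "up"
# once the stale status file has been discarded.
# Hostname, port, and credentials below are placeholders for your deployment.
PGPASSWORD="${PGPOOL_PASSWORD}" psql \
  -h my-release-postgresql-ha-pgpool \
  -p 5432 \
  -U postgres \
  -d postgres \
  -c "SHOW pool_nodes;"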

What alternatives have you considered?

Leverage PGPOOL_EXTRA_FLAGS Environment Variable:

Utilize the existing PGPOOL_EXTRA_FLAGS environment variable provided by the pgpool-II Helm chart to pass the -D flag. This approach avoids direct modification of the startup script and leverages Helm's configuration capabilities for greater flexibility.

Update values.yaml in Helm Chart:

pgpool:
  extraEnvVars:
    - name: PGPOOL_EXTRA_FLAGS
      value: "-D"

This configuration ensures that the -D flag is automatically included in the pgpool-II startup command without altering the base run.sh script.
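
For reference, the same setting can be applied from the command line instead of values.yaml. This is a sketch assuming the chart is installed as my-release from the bitnami repository; adjust the names to your own release:

# Equivalent to the values.yaml snippet above: inject PGPOOL_EXTRA_FLAGS=-D
# into the pgpool container via the chart's pgpool.extraEnvVars setting.
# "my-release" and the repository name are placeholders.
helm upgrade my-release bitnami/postgresql-ha \
  --reuse-values \
  --set 'pgpool.extraEnvVars[0].name=PGPOOL_EXTRA_FLAGS' \
  --set 'pgpool.extraEnvVars[0].value=-D'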

javsalgar commented 2 days ago

Hi!

Thank you so much for the feature request! Would you like to submit a PR updating the startup parameters in the pgpool container?

seatrain commented 2 days ago

> Hi!
>
> Thank you so much for the feature request! Would you like to submit a PR updating the startup parameters in the pgpool container?

yes

javsalgar commented 1 day ago

Thank you so much for the PR! The team will take a look