bitnami / charts

Bitnami Helm Charts
https://bitnami.com

[bitnami/postgresql-ha] repmgr passfile error #16900

Open FawenYo opened 1 year ago

FawenYo commented 1 year ago

Name and Version

bitnami/postgresql-ha 9.0.13

What architecture are you using?

amd64

What steps will reproduce the bug?

  1. Create a .pgpass file with the content *:*:*:repmgr:{REPMGR_PASSWORD} and store it in a Kubernetes secret
  2. Install the Helm Chart with values
  3. Postgres pod just keeps restarting with the error log
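
For reference, the entry in step 1 follows the PostgreSQL password file format hostname:port:database:username:password, where * in any of the first four fields matches any value. The sketch below is a simplified Python illustration of how such an entry is matched. The names split_pgpass_line and lookup_password are hypothetical helpers, not part of libpq or the chart, and edge cases (for example unescaped colons inside the password) are not handled the way libpq handles them.

```python
def split_pgpass_line(line):
    """Split one .pgpass line on unescaped colons, honoring backslash escapes."""
    fields, current = [], []
    chars = iter(line.rstrip("\n"))
    for ch in chars:
        if ch == "\\":
            # A backslash escapes the next character (e.g. "\:" or "\\").
            current.append(next(chars, ""))
        elif ch == ":":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def lookup_password(lines, host, port, dbname, user):
    """Return the password of the first matching entry, or None if none matches."""
    wanted = (host, str(port), dbname, user)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = split_pgpass_line(line)
        if len(fields) != 5:
            continue  # simplified: ignore malformed entries
        if all(pattern in ("*", value) for pattern, value in zip(fields[:4], wanted)):
            return fields[4]
    return None


if __name__ == "__main__":
    entries = ["*:*:*:repmgr:example-password"]  # placeholder password
    print(lookup_password(entries, "some-host", 5432, "repmgr", "repmgr"))
```

With the wildcard entry above, any host, port, and database match as long as the user is repmgr, so the entry itself should not be what prevents the lookup.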

Are you using any custom parameters or values?

postgresql:
  extraVolumes:
    - name: repmgr-passfile
      secret:
        secretName: "repgmr-password"
        items:
          - key: ".pgpass"
            path: ".pgpass"
  extraVolumeMounts:
    - name: "repmgr-passfile"
      mountPath: "/opt/bitnami/repmgr/secrets"
  repmgrUsePassfile: true
  repmgrPassfilePath: "/opt/bitnami/repmgr/secrets/.pgpass"

What do you see instead?

[2023-05-25 01:44:34] [NOTICE] repmgrd (repmgrd 5.3.2) starting up
WARNING: password file "/opt/bitnami/repmgr/secrets/.pgpass" has group or world access; permissions should be u=rw (0600) or less
[2023-05-25 01:44:34] [ERROR] connection to database failed
[2023-05-25 01:44:34] [DETAIL] 
fe_sendauth: no password supplied

[2023-05-25 01:44:34] [DETAIL] attempted to connect using:
  user=repmgr passfile=/opt/bitnami/repmgr/secrets/.pgpass connect_timeout=5 dbname=repmgr host={HOST} port=5432 fallback_application_name=repmgr options=-csearch_path=

Additional information

Although the pod log shows "no password supplied", when I entered the container and ran cat /opt/bitnami/repmgr/secrets/.pgpass, the file had the expected content and everything seemed okay. Also, NOTES.txt in the Helm template still requires setting the postgresql.repmgrPassword value, via this code:

{{- $requiredRepmgrPassword := dict "valueKey" "postgresql.repmgrPassword" "secret" $secretName "field" "repmgr-password" "context" $ -}}
{{- $requiredPasswords = append $requiredPasswords $requiredRepmgrPassword -}}

But I think if we set the connection info via passfile, we don't really need this.
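
One point worth noting about the log above: according to the PostgreSQL documentation, libpq ignores a password file whose permissions allow group or world access. That would be consistent with the "no password supplied" error even though the file content is correct. Below is a minimal Python sketch of that check, which could be run against the mounted path. The name passfile_problem is a hypothetical helper, and the check only approximates what libpq does.

```python
import os
import stat


def passfile_problem(path):
    """Return a reason libpq would likely ignore the file, or None if it looks usable."""
    st = os.stat(path)  # follows symlinks, as used by Kubernetes secret volumes
    if not stat.S_ISREG(st.st_mode):
        return "not a regular file"
    if st.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        mode = stat.S_IMODE(st.st_mode)
        return f"mode {mode:04o} grants group or world access"
    return None


if __name__ == "__main__":
    # Path taken from the values in this issue; adjust as needed.
    print(passfile_problem("/opt/bitnami/repmgr/secrets/.pgpass"))
```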

Mauraza commented 1 year ago

Hi @FawenYo,

Did you try to resolve this warning: WARNING: password file "/opt/bitnami/repmgr/secrets/.pgpass" has group or world access; permissions should be u=rw (0600) or less?
If you resolved the warning, is the error still the same?

FawenYo commented 1 year ago

Hi @Mauraza, I tested fixing the file permission warning, but the postgres pod still cannot connect and keeps restarting with the same error. Besides, defaultMode in volume.secret seems to conflict with securityContext.fsGroup, so I had to remove fsGroup first and then set the volume's defaultMode.
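
A possible explanation for the fsGroup interaction: the Kubernetes API description of fsGroup says that, for volumes whose ownership the kubelet manages, the permission bits are OR'd with rw-rw----. The sketch below assumes this applies to the secret volume here. The exact bits the kubelet adds may differ by volume type and version, and effective_mode and acceptable_for_libpq are hypothetical helper names.

```python
GROUP_OR_WORLD = 0o077  # any group or world permission bit
FSGROUP_MASK = 0o660    # "rw-rw----" from the fsGroup description (assumed to apply here)


def effective_mode(default_mode, fs_group_set):
    """Estimate the file mode after the kubelet applies fsGroup ownership."""
    return (default_mode | FSGROUP_MASK) if fs_group_set else default_mode


def acceptable_for_libpq(mode):
    """libpq expects no group or world access on the password file."""
    return (mode & GROUP_OR_WORLD) == 0


if __name__ == "__main__":
    for fs_group_set in (True, False):
        mode = effective_mode(0o600, fs_group_set)
        print(fs_group_set, format(mode, "04o"), acceptable_for_libpq(mode))
```

Under that assumption, a defaultMode of 0600 still ends up with group bits set while fsGroup is active, which would trigger the same warning.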

Mauraza commented 1 year ago

Hi @FawenYo,

Could you share the values you are using to deploy postgresql-ha? I don't understand why you are using an extra volume when this exists: https://github.com/bitnami/charts/blob/66cc6d9e9f2daf8cb3c8cd7dbd420ed0ffe2f907/bitnami/postgresql-ha/templates/postgresql/statefulset.yaml#L334-L338

FawenYo commented 1 year ago

Hi @Mauraza , you can reproduce the error by

  1. kubectl create secret generic repgmr-password --from-literal=.pgpass="*:*:*:repmgr:{PASSWORD}"
  2. Set the Helm values I provided previously; I have copied them here:
postgresql:
  password: {PASSWORD}
  extraVolumes:
    - name: repmgr-passfile
      secret:
        secretName: "repgmr-password"
        items:
          - key: ".pgpass"
            path: ".pgpass"
        defaultMode: 384
  extraVolumeMounts:
    - name: "repmgr-passfile"
      mountPath: "/opt/bitnami/repmgr/secrets"
  repmgrUsePassfile: true
  repmgrPassfilePath: "/opt/bitnami/repmgr/secrets/.pgpass"

pgpool:
  adminPassword: {PASSWORD}

I use extraVolumes and extraVolumeMounts because I first need to expose the Kubernetes secret as a volume with extraVolumes, then mount it with extraVolumeMounts, and then point repmgrPassfilePath at the mounted file. If you have successfully set the repmgr password from a file in your environment, please feel free to share your values. Thanks.
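
A note on defaultMode: 384 in the values above: it is the decimal form of octal 0600, the permission the libpq warning asks for. A small Python sketch of the conversion follows; to_decimal_mode and to_octal_text are hypothetical helper names used only for illustration.

```python
def to_decimal_mode(octal_text):
    """Convert an octal permission string such as "0600" to its decimal value."""
    return int(octal_text, 8)


def to_octal_text(mode):
    """Render a decimal mode as a four-digit octal string."""
    return format(mode, "04o")


if __name__ == "__main__":
    print(to_decimal_mode("0600"))  # decimal value to put in defaultMode
    print(to_octal_text(384))       # octal form of the value used in this issue
```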

Mauraza commented 1 year ago

Hi @FawenYo,

Could you change the location of the file? I think you are overwriting the REPMGR_PASSWORD_FILE

FawenYo commented 1 year ago

Hi @Mauraza

I tried the values with

postgresql:
  extraVolumeMounts:
    - name: "repmgr-passfile"
      mountPath: "/opt/bitnami/repmgr/conf"
  repmgrPassfilePath: "/opt/bitnami/repmgr/conf/.pgpass"

(Here I only show the values that changed from my previously provided values.yaml.) But the pod still logs an error:

postgresql-repmgr 01:23:14.12 INFO  ==> Preparing repmgr configuration...
/opt/bitnami/scripts/librepmgr.sh: line 489: /opt/bitnami/repmgr/conf/repmgr.conf.tmp: Read-only file system

And if I add subPath to extraVolumeMounts with subPath: ".pgpass", the pod cannot even start with the error message

Error: failed to start container "postgresql": Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/var/lib/kubelet/pods/930abc01-d14e-4d4b-9e06-74f93be0b0e2/volume-subpaths/repmgr-passfile/postgresql/3" to rootfs at "/opt/bitnami/repmgr/conf": mount /var/lib/kubelet/pods/930abc01-d14e-4d4b-9e06-74f93be0b0e2/volume-subpaths/repmgr-passfile/postgresql/3:/opt/bitnami/repmgr/conf (via /proc/self/fd/6), flags: 0x5001: not a directory: unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type

I think you can try to reproduce the error in your environment with my previously provided steps (https://github.com/bitnami/charts/issues/16900#issuecomment-1569357205); that would help us address the bug more efficiently. Thanks.

Mauraza commented 1 year ago

Hi @FawenYo

I created a task to try to find a solution. We will update the thread when we have more information.

FawenYo commented 1 year ago

Hi @Mauraza, any updates? Here I also tried with the below values file

postgresql:
  password: {PASSWORD}
  podSecurityContext:
    enabled: false
  extraVolumes:
    - name: repmgr-passfile
      secret:
        secretName: "repgmr-password"
        items:
          - key: ".pgpass"
            path: ".pgpass"
        defaultMode: 384
  extraVolumeMounts:
    - name: "repmgr-passfile"
      mountPath: "/etc/secrets"
  repmgrPassword: {PASSWORD}
  repmgrUsePassfile: true
  repmgrPassfilePath: "/etc/secrets/.pgpass"
  repmgrLogLevel: DEBUG
  pgHbaTrustAll: true

pgpool:
  adminPassword: {PASSWORD}

and I got the following error message

postgresql-repmgr 01:37:37.71 INFO  ==> ** Starting repmgrd **
[2023-06-16 01:37:37] [NOTICE] repmgrd (repmgrd 5.3.3) starting up
[2023-06-16 01:37:37] [INFO] connecting to database "user=repmgr passfile=/etc/secrets/.pgpass host=postgres-postgresql-ha-postgresql-0.postgres-postgresql-ha-postgresql-headless.test.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5"
[2023-06-16 01:37:37] [DEBUG] connecting to: "user=repmgr passfile=/etc/secrets/.pgpass connect_timeout=5 dbname=repmgr host=postgres-postgresql-ha-postgresql-0.postgres-postgresql-ha-postgresql-headless.test.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path="
[2023-06-16 01:37:37] [ERROR] repmgr extension not found on this node
[2023-06-16 01:37:37] [DETAIL] repmgr extension is available but not installed in database "repmgr"
[2023-06-16 01:37:37] [HINT] check that this node is part of a repmgr cluster

Although there are still some errors, I think the connection error is now solved, right?

Mauraza commented 1 year ago

Hi @FawenYo

No error seems to appear, is it working as expected? There is a task to investigate this error, when we have more information we will update the thread.

FawenYo commented 1 year ago

Hi @Mauraza , the log shows

[2023-06-16 01:37:37] [ERROR] repmgr extension not found on this node

so it still has some errors

dgomezleon commented 11 months ago

Hi @FawenYo ,

I was not able to reproduce the issue using your latest values in my cluster:

$ k logs postgresql-ha-postgresql-0
postgresql-repmgr 07:43:17.20 INFO  ==>
postgresql-repmgr 07:43:17.20 INFO  ==> Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 07:43:17.20 INFO  ==> Subscribe to project updates by watching https://github.com/bitnami/containers
postgresql-repmgr 07:43:17.21 INFO  ==> Submit issues and feature requests at https://github.com/bitnami/containers/issues
postgresql-repmgr 07:43:17.21 INFO  ==>
postgresql-repmgr 07:43:17.22 INFO  ==> ** Starting PostgreSQL with Replication Manager setup **
postgresql-repmgr 07:43:17.24 INFO  ==> Validating settings in REPMGR_* env vars...
postgresql-repmgr 07:43:17.24 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql-repmgr 07:43:17.25 INFO  ==> Querying all partner nodes for common upstream node...
postgresql-repmgr 07:43:17.28 INFO  ==> There are no nodes with primary role. Assuming the primary role...
postgresql-repmgr 07:43:17.29 INFO  ==> Preparing PostgreSQL configuration...
postgresql-repmgr 07:43:17.29 INFO  ==> postgresql.conf file not detected. Generating it...
postgresql-repmgr 07:43:17.39 INFO  ==> Preparing repmgr configuration...
postgresql-repmgr 07:43:17.40 INFO  ==> Initializing Repmgr...
postgresql-repmgr 07:43:17.41 INFO  ==> Initializing PostgreSQL database...
postgresql-repmgr 07:43:17.41 INFO  ==> Custom configuration /opt/bitnami/postgresql/conf/postgresql.conf detected
postgresql-repmgr 07:43:17.41 INFO  ==> Custom configuration /opt/bitnami/postgresql/conf/pg_hba.conf detected
postgresql-repmgr 07:43:18.04 INFO  ==> Starting PostgreSQL in background...
postgresql-repmgr 07:43:18.16 INFO  ==> Changing password of postgres
postgresql-repmgr 07:43:18.18 INFO  ==> Creating replication user repmgr
postgresql-repmgr 07:43:18.19 INFO  ==> Stopping PostgreSQL...
waiting for server to shut down.... done
server stopped
postgresql-repmgr 07:43:18.32 INFO  ==> Configuring replication parameters
postgresql-repmgr 07:43:18.34 INFO  ==> Configuring fsync
postgresql-repmgr 07:43:18.35 INFO  ==> Starting PostgreSQL in background...
postgresql-repmgr 07:43:18.47 INFO  ==> Creating repmgr user: repmgr
postgresql-repmgr 07:43:18.50 INFO  ==> Creating repmgr database: repmgr
postgresql-repmgr 07:43:18.54 INFO  ==> Stopping PostgreSQL...
waiting for server to shut down.... done
server stopped
postgresql-repmgr 07:43:18.64 INFO  ==> Starting PostgreSQL in background...
postgresql-repmgr 07:43:18.77 INFO  ==> Registering Primary...
postgresql-repmgr 07:43:18.82 INFO  ==> Loading custom scripts...
postgresql-repmgr 07:43:18.83 INFO  ==> Configuring synchronous_replication
postgresql-repmgr 07:43:18.84 INFO  ==> Stopping PostgreSQL...
waiting for server to shut down.... done
server stopped
postgresql-repmgr 07:43:18.95 INFO  ==> ** PostgreSQL with Replication Manager setup finished! **

postgresql-repmgr 07:43:18.96 INFO  ==> Starting PostgreSQL in background...
waiting for server to start....2023-11-20 07:43:18.985 GMT [290] LOG:  pgaudit extension initialized
2023-11-20 07:43:18.991 GMT [290] LOG:  redirecting log output to logging collector process
2023-11-20 07:43:18.991 GMT [290] HINT:  Future log output will appear in directory "/opt/bitnami/postgresql/logs".
2023-11-20 07:43:18.991 GMT [290] LOG:  starting PostgreSQL 16.1 on aarch64-unknown-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-11-20 07:43:18.991 GMT [290] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-11-20 07:43:18.991 GMT [290] LOG:  listening on IPv6 address "::", port 5432
2023-11-20 07:43:18.992 GMT [290] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-11-20 07:43:18.996 GMT [294] LOG:  database system was shut down at 2023-11-20 07:43:18 GMT
2023-11-20 07:43:18.999 GMT [290] LOG:  database system is ready to accept connections
 done
server started
postgresql-repmgr 07:43:19.09 INFO  ==> ** Starting repmgrd **
[2023-11-20 07:43:19] [NOTICE] repmgrd (repmgrd 5.3.3) starting up
[2023-11-20 07:43:19] [INFO] connecting to database "user=repmgr passfile=/etc/secrets/.pgpass host=postgresql-ha-postgresql-0.postgresql-ha-postgresql-headless.default.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5"
[2023-11-20 07:43:19] [DEBUG] connecting to: "user=repmgr passfile=/etc/secrets/.pgpass connect_timeout=5 dbname=repmgr host=postgresql-ha-postgresql-0.postgresql-ha-postgresql-headless.default.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path="
[2023-11-20 07:43:19] [DEBUG] node id is 1000, upstream node id is -1
INFO:  set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
[2023-11-20 07:43:19] [NOTICE] starting monitoring of node "postgresql-ha-postgresql-0" (ID: 1000)
[2023-11-20 07:43:19] [INFO] "connection_check_type" set to "ping"
[2023-11-20 07:43:19] [INFO] executing notification command for event "repmgrd_start"
[2023-11-20 07:43:19] [DETAIL] command is:
  /opt/bitnami/repmgr/events/router.sh 1000 repmgrd_start 1 "2023-11-20 07:43:19.107857+00" "monitoring cluster primary \"postgresql-ha-postgresql-0\" (ID: 1000)"
[2023-11-20 07:43:19] [NOTICE] monitoring cluster primary "postgresql-ha-postgresql-0" (ID: 1000)
2023-11-20 07:43:49.533 GMT [292] LOG:  checkpoint starting: immediate force wait
2023-11-20 07:43:49.568 GMT [292] LOG:  checkpoint complete: wrote 52 buffers (0.3%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.003 s, sync=0.002 s, total=0.035 s; sync files=17, longest=0.002 s, average=0.001 s; distance=16384 kB, estimate=16384 kB; lsn=0/5000060, redo lsn=0/5000028
2023-11-20 07:43:49.965 GMT [292] LOG:  checkpoint starting: immediate force wait
2023-11-20 07:43:50.003 GMT [292] LOG:  checkpoint complete: wrote 2 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.003 s, sync=0.001 s, total=0.038 s; sync files=2, longest=0.001 s, average=0.001 s; distance=32768 kB, estimate=32768 kB; lsn=0/7000060, redo lsn=0/7000028
[2023-11-20 07:43:55] [DEBUG] child node: 1002; attached: yes
[2023-11-20 07:43:55] [DEBUG] child node: 1001; attached: yes
[2023-11-20 07:43:55] [NOTICE] new standby "postgresql-ha-postgresql-2" (ID: 1002) has connected
[2023-11-20 07:43:55] [INFO] executing notification command for event "child_node_new_connect"
[2023-11-20 07:43:55] [DETAIL] command is:
  /opt/bitnami/repmgr/events/router.sh 1000 child_node_new_connect 1 "2023-11-20 07:43:55.379453+00" "new standby \"postgresql-ha-postgresql-2\" (ID: 1002) has connected"
[2023-11-20 07:43:55] [NOTICE] new standby "postgresql-ha-postgresql-1" (ID: 1001) has connected
[2023-11-20 07:43:55] [INFO] executing notification command for event "child_node_new_connect"
[2023-11-20 07:43:55] [DETAIL] command is:
  /opt/bitnami/repmgr/events/router.sh 1000 child_node_new_connect 1 "2023-11-20 07:43:55.406327+00" "new standby \"postgresql-ha-postgresql-1\" (ID: 1001) has connected"
[2023-11-20 07:44:01] [DEBUG] child node: 1002; attached: yes
[2023-11-20 07:44:01] [DEBUG] child node: 1001; attached: yes

The previous values, with pgHbaTrustAll: true added, also worked for me:

postgresql:
  password: {PASSWORD}
  extraVolumes:
    - name: repmgr-passfile
      secret:
        secretName: "repgmr-password"
        items:
          - key: ".pgpass"
            path: ".pgpass"
        defaultMode: 384
  extraVolumeMounts:
    - name: "repmgr-passfile"
      mountPath: "/opt/bitnami/repmgr/secrets"
  repmgrUsePassfile: true
  repmgrPassfilePath: "/opt/bitnami/repmgr/secrets/.pgpass"
  pgHbaTrustAll: true
pgpool:
  adminPassword: {PASSWORD}

The error is probably related to this previous case.

I hope it helps