BorisPolonsky / dify-helm

Deploy langgenius/dify, an LLM-based app, on Kubernetes with a Helm chart
MIT License

PostgreSQL DB got broken when uninstalling the release #21

Closed: gaolitao closed this issue 8 months ago

gaolitao commented 9 months ago

postgresql 11:00:24.78 INFO ==> Starting PostgreSQL
2024-01-04 11:00:24.800 GMT [1] LOG: pgaudit extension initialized
2024-01-04 11:00:24.805 GMT [1] LOG: starting PostgreSQL 15.3 on aarch64-unknown-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2024-01-04 11:00:24.806 GMT [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2024-01-04 11:00:24.806 GMT [1] LOG: listening on IPv6 address "::", port 5432
2024-01-04 11:00:24.806 GMT [1] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-01-04 11:00:24.808 GMT [96] LOG: database system was interrupted; last known up at 2024-01-04 10:16:33 GMT
2024-01-04 11:00:24.817 GMT [96] LOG: database system was not properly shut down; automatic recovery in progress
2024-01-04 11:00:24.818 GMT [96] LOG: redo starts at 0/3349070
2024-01-04 11:00:24.818 GMT [96] LOG: invalid record length at 0/3349158: wanted 24, got 0
2024-01-04 11:00:24.818 GMT [96] LOG: redo done at 0/3349120 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2024-01-04 11:00:24.819 GMT [94] LOG: checkpoint starting: end-of-recovery immediate wait
2024-01-04 11:00:24.820 GMT [94] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.002 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB
2024-01-04 11:00:24.822 GMT [1] LOG: database system is ready to accept connections
2024-01-04 11:00:36.250 GMT [107] FATAL: password authentication failed for user "repl_user"
2024-01-04 11:00:36.250 GMT [107] DETAIL: Connection matched pg_hba.conf line 6: "host replication all 0.0.0.0/0 md5"
2024-01-04 11:00:41.255 GMT [108] FATAL: password authentication failed for user "repl_user"
2024-01-04 11:00:41.255 GMT [108] DETAIL: Connection matched pg_hba.conf line 6: "host replication all 0.0.0.0/0 md5"

BorisPolonsky commented 9 months ago

Judging from the log, it looks like the password is not configured correctly, and an alternate image for the ARM architecture was deployed.

Please note that if you are using an image other than those released by Bitnami, you may want to look into the differences.

If you are running the PostgreSQL defined in this release, or a separate release (i.e. through the externalPostgres config), please check the authentication settings defined in postgresql.auth.password and postgresql.auth.database, which are documented here.

If you changed the password of repl_user after deploying the database within this release, please take a look at the following note from Bitnami's official PostgreSQL chart repo:

NOTE: Once this chart is deployed, it is not possible to change the application's access credentials, such as usernames or passwords, using Helm. To change these application credentials after deployment, delete any persistent volumes (PVs) used by the chart and re-deploy it, or use the application's built-in administrative tools if available.

Warning: Setting a password will be ignored on new installation in case when previous PostgreSQL release was deleted through the helm command. In that case, old PVC will have an old password, and setting it through helm won't take effect. Deleting persistent volumes (PVs) will solve the issue. Refer to https://github.com/bitnami/charts/issues/2061 for more details.

Without further information on the changes you've made, it's unlikely that we can draw any further conclusions.
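
For anyone triaging a similar log, here is a rough Python sketch of the reading above. It checks whether crash recovery reached "ready to accept connections" and which users then failed password authentication. The `triage` helper and its output keys are hypothetical names made up for illustration; they are not part of the chart or of any PostgreSQL tooling, and the matching assumes the message wording seen in the log posted in this issue.

```python
import re

# Hypothetical helper for illustration only; the message strings are copied
# from the log posted in this issue and may differ in other PostgreSQL versions.
FATAL_AUTH = re.compile(r'FATAL:\s+password authentication failed for user "([^"]+)"')
RECOVERY = "automatic recovery in progress"
READY = "database system is ready to accept connections"


def triage(log_text: str) -> dict:
    """Summarize crash-recovery status and password-auth failures in a log."""
    return {
        # True if the server started crash recovery after an unclean shutdown.
        "recovery_started": RECOVERY in log_text,
        # True if the server came up and started accepting connections.
        "ready": READY in log_text,
        # Distinct users whose password authentication was rejected.
        "auth_failed_users": sorted(set(FATAL_AUTH.findall(log_text))),
    }


# Three lines taken from the log in this issue.
sample = (
    "2024-01-04 11:00:24.817 GMT [96] LOG: database system was not properly "
    "shut down; automatic recovery in progress\n"
    "2024-01-04 11:00:24.822 GMT [1] LOG: database system is ready to accept "
    "connections\n"
    '2024-01-04 11:00:36.250 GMT [107] FATAL: password authentication failed '
    'for user "repl_user"\n'
)

result = triage(sample)
print(result)
```

If `ready` is true and `auth_failed_users` is non-empty, as in this log, the server itself came up after recovery and the failures are credential mismatches, which is consistent with the Bitnami warning about old passwords persisting in a reused PVC.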

BorisPolonsky commented 8 months ago

Any update? @gaolitao

gaolitao commented 8 months ago

Hi @BorisPolonsky,

The issue occurred when I uninstalled the release: the DB storage in the PVC persisted and was reused, but the service could not come up because of the error pasted in this issue. The password should be correctly configured, but it cannot be used. Since the log says "database system was not properly shut down; automatic recovery in progress", my guess is that the DB recovery did not complete successfully for some reason.

All images are the ones specified in the Helm chart's values.yaml. The only difference is the ARM architecture, but that image is also provided by Bitnami.

My workaround was to remove all the PVs and recreate them.