Closed: trungphungduc closed this issue 3 years ago.
Hi @trungphungduc, it is not always required to install with volumePermissions.enabled=false; it depends on your PV type.
Also, note that the flag is only meant to be used when deploying with a new PVC; existing ones should already have the correct permissions, so there is no need to specify it.
If you remove it, do you get any other errors? If not, I assume we can consider this solved, right?
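For context on what that flag does: when volumePermissions.enabled=true, the chart runs an init container before PostgreSQL starts that changes the ownership of the data volume to the non-root user the container runs as. A rough sketch (the path and UID below are based on Bitnami image defaults and may differ between chart versions):

```shell
# Approximation of the volume-permissions init container's job: make the
# mounted volume writable by the non-root PostgreSQL user (UID 1001 in
# Bitnami images). On some PV types (e.g. NFS with root squash) this chown
# can fail or misbehave, which is why the flag is not always needed.
chown -R 1001:1001 /bitnami/postgresql
```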
I have pretty much the same issue, whether or not I use volumePermissions. Data is being populated into the PV using the default security context of 1001. Not sure if this has anything to do with it, but where the log shows "starting monitoring of node... (ID: 1000)", should it instead be 1001?
Hi @marcosbc, when I remove it, the error (shown in the picture above) still appears. Here is the command that I use:
helm install stgcc-pgpool-pers bitnami/postgresql-ha \
--set persistence.existingClaim=stgcc-pgpool-pv-claim
Sorry, I meant the other way around: it is not always required to install with volumePermissions.enabled=false.
According to the issue's description:
The reason that caused the error was "--set volumePermissions.enabled=true"
When i set "--set volumePermissions.enabled=false" => every thing go OK.
When i set "--set volumePermissions.enabled=true" => error appear like the image below.
So if you set --set volumePermissions.enabled=false, it would work, right?
@js02sixty It is probably not; it was just a guess of mine. Could you share how you are populating data into the PV? Also, can you confirm the PostgreSQL data is located at /bitnami/postgresql/data inside the PV?
chart install settings:
helm install pg bitnami/postgresql-ha --version 6.2.0 -f - <<EOF
postgresql:
  password: JnT7n4lJDr8WKF4lyLS2VvsJ
  repmgrPassword: 0Er7CGvleQjPosJY1iEfkwNm
volumePermissions:
  enabled: true
persistence:
  existingClaim: pg-data
service:
  type: LoadBalancer
  loadBalancerIP: 172.24.16.111
EOF
PV Info:
kubectl get pv pvc-d09632ff-8a81-49d9-a5a0-0d0e6d51a1e2
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM             STORAGECLASS   REASON   AGE
pvc-d09632ff-8a81-49d9-a5a0-0d0e6d51a1e2   10Gi       RWX            Delete           Bound    default/pg-data   nfs-fast                24h
nfs-client provisioner based PVC
╭─root@vhost01 /srv/cluster/nfs02
╰─# ls -lha default-pg-data-pvc-d09632ff-8a81-49d9-a5a0-0d0e6d51a1e2
total 4.0K
drwxrwxrwx. 5 1001 1001 42 Dec 8 11:05 .
drwxr-xr-x. 9 root root 4.0K Dec 7 10:53 ..
drwx------. 2 1001 1001 48 Dec 8 11:01 conf
drwx------. 9 1001 root 235 Dec 8 11:06 data
drwxr-xr-x. 2 1001 root 25 Dec 8 11:06 lock
╭─root@vhost01 /srv/cluster/nfs02
╰─# ls -lha default-pg-data-pvc-d09632ff-8a81-49d9-a5a0-0d0e6d51a1e2/data
total 20K
drwx------. 9 1001 root 235 Dec 8 11:06 .
drwxrwxrwx. 5 1001 1001 42 Dec 8 11:05 ..
drwx------. 6 1001 root 54 Dec 8 11:06 base
-rw-------. 1 1001 root 51 Dec 8 11:06 current_logfiles
-rw-------. 1 1001 root 1.6K Dec 8 11:06 pg_ident.conf
drwx------. 4 1001 root 68 Dec 8 11:06 pg_logical
drwx------. 2 1001 root 6 Dec 8 11:06 pg_replslot
drwx------. 2 1001 root 84 Dec 8 11:06 pg_stat
drwx------. 2 1001 root 6 Dec 8 11:06 pg_stat_tmp
drwx------. 2 1001 root 6 Dec 8 11:06 pg_tblspc
-rw-------. 1 1001 root 3 Dec 8 11:06 PG_VERSION
drwx------. 2 1001 root 18 Dec 8 11:06 pg_xact
-rw-------. 1 1001 root 88 Dec 8 11:06 postgresql.auto.conf
-rw-------. 1 1001 root 249 Dec 8 11:06 postmaster.opts
I now know what the problem is. Instead of selecting an existingClaim this time, I chose storageClass for dynamic provisioning. Each pod now gets its own PV; before, they were sharing one, which was causing the conflict. I know that in chart mariadb-9.0.1 you can select an existingClaim for the first pod, and the subsequent replicated pods can be provisioned dynamically with "storageClass".
@marcosbc
As you said:
So if you set --set volumePermissions.enabled=false it would work right?
=> It works OK.
But the data is not saved to the PVC (the one I use for existingClaim).
My goal is to save the data for persistence.
@marcosbc
I just found a hint:
if I add --set persistence.enabled=false => everything is OK.
if --set persistence.enabled=true => error (like the image above).
So the full command I want, which causes the problem => install ERROR:
helm install stgcc-pgpool bitnami/postgresql-ha \
--set persistence.enabled=true \
--set persistence.existingClaim=stgcc-pgpool-pv-claim \
--set volumePermissions.enabled=true
Full command here => installs OK, but cannot mount persistent data:
helm install stgcc-pgpool bitnami/postgresql-ha \
--set persistence.enabled=false \
--set persistence.existingClaim=stgcc-pgpool-pv-claim \
--set volumePermissions.enabled=true
@trungphungduc I see, I thought your issues were specific to the volumePermissions flags, not the persistence ones.
In that case, could you share the full initialization logs from the K8s postgresql-repmgr pod? If you could show them with --set postgresqlImage.debug=true enabled, it would be great.
Also, could you check the file permissions inside your PostgreSQL's volume folder /home/bitnami_postgresql/stage/cochat/persistenceKube/pgpool? Specifically the ones in the bitnami/postgresql and bitnami/postgresql/data subdirectories. You can do that with ls -la /path/to/dir.
@marcosbc Here are the 3 logs.
First 3 pics: logs of pod/stgcc-pgpool-postgresql-ha-postgresql-0
ls -la /home/bitnami_postgresql/stage/cochat/persistenceKube/pgpool
I connect to the postgres container with docker exec -it, then run ls -la bitnami/postgresql and ls -la bitnami/postgresql/data.
Hi @trungphungduc, I'm a bit confused. You were using stgcc-pgpool-pv-claim previously, which had the data mounted at /home/bitnami_postgresql/stage/cochat/persistenceKube/pgpool. But then I see that the directory is empty, while the directory mounted in the container (which should be empty) is not.
I think I'm missing something, maybe you were using a different volume?
In any case, from the container's POV I cannot see any misconfigured file permission. Could you check that your PV has enough space for your DB? Maybe 1GB is not enough for your case:
spec:
  storageClassName: local
  capacity:
    storage: 1Gi
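One way to check the available space from inside the running container (the pod name below is an assumption based on the release name used earlier in this thread; adjust it and the namespace to your deployment):

```shell
# Check free space on the mounted PostgreSQL volume from inside the pod.
kubectl exec -it stgcc-pgpool-postgresql-ha-postgresql-0 -- df -h /bitnami/postgresql
```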
@trungphungduc, I think the whole issue would be avoided if you chose not to use an existing claim, because when you do that, all of the replicas use the same PVC, which is no good: they will constantly write over each other until you get a crash loop. Each replica needs its own PVC, and the only way I can see that working is if you specify a storage class like this...
helm install stgcc-pgpool bitnami/postgresql-ha \
--set persistence.enabled=true \
--set persistence.storageClass=<some storage class> \
--set volumePermissions.enabled=true
The chart will deploy with an existing claim if you set postgresql.replicaCount=1.
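To illustrate why a storage class avoids the conflict: when no existingClaim is given, the StatefulSet's volumeClaimTemplates generate one PVC per replica, named <template>-<pod>-<ordinal>, so each pod gets its own volume. A hypothetical sketch of that naming (the template name "data" and the pod prefix below are assumptions for illustration, not taken from the chart; check kubectl get pvc for the real names):

```shell
# Sketch of StatefulSet PVC naming: one claim per replica ordinal.
template="data"
sts="stgcc-pgpool-postgresql-ha-postgresql"
for ordinal in 0 1; do
  echo "${template}-${sts}-${ordinal}"
done
```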
Hi @marcosbc, let me explain my thought process step by step to install postgresql-ha. My goal is to install postgresql-ha and keep my data safe on my real (physical) Linux server, so: Step 1: I create a PV and PVC to hold the content => it is empty at this time. Step 2: I install postgresql-ha with the intention of mounting data to the PVC (from step 1).
helm install stgcc-pgpool bitnami/postgresql-ha \
--set persistence.enabled=true \
--set persistence.existingClaim=stgcc-pgpool-pv-claim \
--set volumePermissions.enabled=true
The question is:
Hi @js02sixty, thanks for your suggestion, I will try your method and get back to you soon. I just stepped into the Kubernetes world a few days ago, so my skills are not good yet. I need to find out what a storage class is and how to use it... so please wait for me.
@trungphungduc Those steps would be correct, yes. You would also probably need to specify --set volumePermissions.enabled=true (refer to our docs).
However, when you shared the screenshot, that directory was empty. After you deploy the chart (step 2), is it still empty?
If it was indeed empty, the other screenshot you shared (showing the contents of /bitnami/postgresql/data) would not make sense because it should show the contents of the PV (which is empty).
Hi @marcosbc,
/home/bitnami_postgresql/stage/cochat/persistenceKube/pgpool is empty.
=> This dir was indeed empty, because the installed postgresql-ha got an error, so it could not mount the data.
I try to connect as fast as possible to take a picture. After a few seconds, that pod dies and another pod is created.
Hi @trungphungduc,
To avoid that, you can specify the following options in your deployment:
--set 'postgresql.command[0]=sleep,postgresql.command[1]=infinity,postgresql.readinessProbe.enabled=false,postgresql.livenessProbe.enabled=false'
Then, kubectl exec into your postgresql-repmgr container(s) and run the following command:
/opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh /opt/bitnami/scripts/postgresql-repmgr/run.sh
It should fail with the same logs as before, but now you have access to the container, it does not die after a few seconds, and you can perform the proper diagnostics. I.e.:
$ ls -la /bitnami/postgresql
$ ls -la /bitnami/postgresql/data
If you could run those commands in deployments with volumePermissions.enabled set both to true and to false, I think it would be helpful to compare.
Hi @marcosbc, I am a little busy nowadays due to covid19; I will take a picture for you ASAP. Please keep this post active. Thanks.
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
Hi @marcosbc, sorry for keeping you waiting, I just came back. I followed your tutorial. Step 1:
helm install testpgpool bitnami/postgresql-ha \
--set persistence.enabled=true \
--set persistence.existingClaim=stgcc16-pgpool-pv-claim \
--set volumePermissions.enabled=true \
--set postgresql.command[0]=sleep \
--set postgresql.command[1]=infinity \
--set postgresql.readinessProbe.enabled=false \
--set postgresql.livenessProbe.enabled=false \
--set postgresqlImage.debug=true \
--set postgresql.repmgrPassword=123456 \
-n stgcochat
Step 2:
/opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh /opt/bitnami/scripts/postgresql-repmgr/run.sh
=> Result:
Step 3: checking my server (CentOS 7).
=> Logs of pod "postgres master", but no result comes out??? (I don't know why)
=> Logs of pod "pgpool":
Hi @trungphungduc, this time it looks like your data is getting persisted properly.
As for your error, it seems like you stopped (i.e. exited the shell of) the PostgreSQL container before the PgPool container was restarted, so it cannot find any running PostgreSQL.
I would advise letting the PostgreSQL/repmgr service run (without quitting) and giving the PgPool container some time to show the actual error. Note that you should also start the 2nd PostgreSQL node manually in a separate window. Make sure not to quit any of those before PgPool tries to connect to them.
Hi @marcosbc, I am not stopping anything. Now I run the command as:
helm install testpgpool bitnami/postgresql-ha \
--set persistence.enabled=true \
--set persistence.existingClaim=stgcc16-pgpool-pv-claim \
--set volumePermissions.enabled=true \
--set postgresqlImage.debug=true \
--set postgresql.repmgrPassword=123456 \
-n stgcochat
Result:
kubectl exec into the pod and run the script.
Checking my server (CentOS 7). Get all: Log postgres-0: Log postgres-1: => I still get the error.
After a few minutes, I check the logs of postgres-0: the error is the same as the one I always get.
Hi @trungphungduc, note that if you deploy with these options:
--set 'postgresql.command[0]=sleep,postgresql.command[1]=infinity'
Then "kubectl logs" will not show anything. So the first screenshot you sent, showing the error of a service already running on port 5432, would make sense, because you would be trying to start PostgreSQL while the service was already running.
Could you re-deploy in a clean environment with the previous options (postgresql.command/args/readinessProbe/livenessProbe) and run the command? I would also specify those same options for pgpool (pgpool.command/args/...).
Unfortunately you'd need to kubectl exec into all PostgreSQL nodes to execute the specific run.sh command:
/opt/bitnami/scripts/pgpool/entrypoint.sh /opt/bitnami/scripts/pgpool/run.sh
/opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh /opt/bitnami/scripts/postgresql-repmgr/run.sh
Hi @marcosbc, so I re-deploy:
helm install testpgpool bitnami/postgresql-ha \
--set persistence.enabled=true \
--set persistence.existingClaim=stgcc16-pgpool-pv-claim \
--set volumePermissions.enabled=true \
--set postgresql.command[0]=sleep \
--set postgresql.command[1]=infinity \
--set postgresql.readinessProbe.enabled=false \
--set postgresql.livenessProbe.enabled=false \
--set postgresqlImage.debug=true \
--set postgresql.repmgrPassword=123456 \
-n stgcochat
Step 1:
kubectl exec pod/testpgpool-postgresql-ha-postgresql-0 -n stgcochat
Step 2:
kubectl exec pod/testpgpool-postgresql-ha-postgresql-1 -n stgcochat
Step 3:
kubectl exec -it pod/testpgpool-postgresql-ha-pgpool-86854685-t24xn bash -n stgcochat
Step 4: a few minutes later, the pgpool pod gets CrashLoopBackOff => at this point, I cannot "kubectl exec" into the pgpool pod anymore, because the container is not found...
kubectl get all -n stgcochat
Step 5: logs pgpool
kubectl logs pod/testpgpool-postgresql-ha-pgpool-86854685-t24xn -n stgcochat
Hi, it seems like you didn't specify pgpool.command/args/etc., so when PgPool is deployed it fails and ends up in CrashLoopBackOff. Please specify those and try again.
Another thing I can see (in step 2) is that the postgresql-1 node already contains data, so the initialization fails. However, it seems like the postgresql-0 node does not present this.
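A way to confirm that both PostgreSQL pods are mounting the same claim (the pod names and namespace below follow this thread; adjust them to your deployment). With existingClaim set, both pods are expected to print the same claim name:

```shell
# Print the PVC(s) each PostgreSQL pod mounts. With
# persistence.existingClaim set, both pods report the same claim.
for pod in testpgpool-postgresql-ha-postgresql-0 testpgpool-postgresql-ha-postgresql-1; do
  claim=$(kubectl get pod "$pod" -n stgcochat \
    -o jsonpath='{.spec.volumes[*].persistentVolumeClaim.claimName}')
  echo "$pod -> $claim"
done
```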
Hi @marcosbc, I ran the command as below, is it right?
helm install testpgpool bitnami/postgresql-ha \
--set persistence.enabled=true \
--set persistence.existingClaim=stgcc16-pgpool-pv-claim \
--set volumePermissions.enabled=true \
--set postgresql.command[0]=sleep \
--set postgresql.command[1]=infinity \
--set postgresql.readinessProbe.enabled=false \
--set postgresql.livenessProbe.enabled=false \
--set postgresqlImage.debug=true \
--set postgresql.repmgrPassword=123456 \
--set pgpool.command[0]=sleep \
--set pgpool.command[1]=infinity \
--set pgpool.readinessProbe.enabled=false \
--set pgpool.livenessProbe.enabled=false \
-n stgcochat
If yes, then here is more information:
I kubectl exec into the postgres/pgpool pods and run commands like:
kubectl exec -it pod/testpgpool-postgresql-ha-postgresql-0 -n stgcochat bash
/opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh /opt/bitnami/scripts/postgresql-repmgr/run.sh
Step 1: postgres-0
Step 2: postgres-1
Step 3: pgpool
Step 4: Checking my server Centos 7:
What should I do next? Can you show me a more concrete example to run... Thanks in advance.
Hi @trungphungduc, I actually just noticed something that may explain all your issues (sorry for not noticing earlier).
In StatefulSets, using existingClaim should be avoided, because the PVC will be shared by all replicas (in your case, replicas=2). Unfortunately there is no simple fix for this.
That explains why in your case postgresql-1 has data before it is initialized (even in a new deployment): it is actually the data from postgresql-0.
Therefore, the only thing I can recommend is not to rely on existingClaim unless you don't add any replicas (i.e. replicas=1).
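A sketch of that single-replica recommendation, reusing the release, claim, and namespace names from this thread (the postgresql.replicaCount key is the one mentioned earlier; verify it against your chart version's values):

```shell
# Single-node deployment where an existing claim is safe,
# since only one replica ever mounts the PVC.
helm install testpgpool bitnami/postgresql-ha \
  --set postgresql.replicaCount=1 \
  --set persistence.enabled=true \
  --set persistence.existingClaim=stgcc16-pgpool-pv-claim \
  --set volumePermissions.enabled=true \
  -n stgcochat
```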
Hi @marcosbc, thanks for your suggestion. I will test and get back to you soon.
I just ran into the same issue while reading through the options. I suggest to either
If any option is preferred, I'd be happy to contribute a PR for it.
Hi,
Thanks for your input. In our experience, we've seen several special use cases with regard to existingClaim, so in principle I wouldn't force replicas=1. However, I believe that adding a note to the documentation is something that could help several users. We appreciate that you want to contribute a PR, so feel free to open it and we will take a look :D
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.
Which chart: bitnami/postgresql-ha (latest)
Describe the bug:
... could not open relation mapping file "global/pg_filenode.map": No such file or directory
... could not create directory "pg_replslot/repmgr_slot_1001.tmp": No such file or directory
To Reproduce:
helm install pgpool bitnami/postgresql-ha \
--set volumePermissions.enabled=true \
--set persistence.existingClaim=stgcc-pgpool2-pv-claim
Expected behavior: Working.
Version of Helm and Kubernetes:
Additional context: I deploy on CentOS Linux 7 (Core). The reason that caused the error was "--set volumePermissions.enabled=true". When I set "--set volumePermissions.enabled=false" => everything goes OK. When I set "--set volumePermissions.enabled=true" => the error appears like the image below.
stgcc-pgpool2-pv-claim