[bitnami/postgresql] CrashLoopBackoff when installing postgressql (or postgressql-ha) with a pvc

mrrevillo0815 commented 3 years ago

Which chart: postgresql-10.9.1 | 11.12.0

Describe the bug CrashLoopBackoff when installing postgressql(/postgressql-ha) with a pvc

To Reproduce Steps to reproduce the behavior:

Create PV + PVC
Add PVC as existingClaim in default-config.yaml (with persistence set to true)
Helm install using default-config.yaml

Expected behavior Pods starting without crashing - or at least having a more descriptive error-msg

Version of Helm and Kubernetes:

Output of helm version:

version.BuildInfo{Version:"v3.6.3", GitCommit:"d506314abfb5d21419df8c7e7e68012379db2354", GitTreeState:"clean", GoVersion:"go1.16.5"}

Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T21:04:39Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T20:59:07Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}

Additional context

Using Ubuntu 20.04
The PV is a local volume on an NTFS-Drive, where i seemingly cannot change the ownership of the Files (it "allows me" to execute commands like chown without reporting errors, but effectively does nothing.

The Logs read

postgresql 15:20:44.42 
postgresql 15:20:44.43 Welcome to the Bitnami postgresql container
postgresql 15:20:44.43 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql
postgresql 15:20:44.43 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql/issues
postgresql 15:20:44.43 
postgresql 15:20:44.44 INFO  ==> ** Starting PostgreSQL setup **
postgresql 15:20:44.46 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql 15:20:44.47 INFO  ==> Loading custom pre-init scripts...
postgresql 15:20:44.48 INFO  ==> Initializing PostgreSQL database...
postgresql 15:20:44.50 INFO  ==> pg_hba.conf file not detected. Generating it...
postgresql 15:20:44.50 INFO  ==> Generating local authentication configuration

and end there.

A data folder is created on the mounted device before it crashes
When i tried it with the ha-version, there were two config-files present, and the logs stopped at Custom configuration /opt/bitnami/postgresql/conf/pg_hba.conf detected
The error reads CrashLoopBackOff
I tried running it with persistence.enabled=false - it works without problems
I have tried setting volumePermissions.enabled=true, which also did not work

I suspect that there might be problems regarding the fs-ownesrship as ntfs does not seem to support them in a linux-friendly way.

juan131 commented 3 years ago

Hi @mrrevillo0815

Please note you can set the volumePermissions.enabled parameter to true to add a init container that adapts the permissions on the fly, see:

https://github.com/bitnami/charts/tree/master/bitnami/postgresql-ha#volume-permissions-parameters

This is useful for environments in which the K8s version doesn't support SecurityContext or the specific StorageClass is not compatible with it.

On the other hand, please note that using an existing claim (via the persistence.existingClaim parameter) is not compatible with running multiple PostgreSQL backend nodes since every replica will share the same volume. Therefore, you'll be limited to use postgresql.replicaCount=1.

When this option is not set, a different volume and claim is used per replica making use of volumeClaimTemplates and Dynamic volume provisioning, see:

https://github.com/bitnami/charts/blob/master/bitnami/postgresql-ha/templates/postgresql/statefulset.yaml#L571

chauncey-garrett commented 3 years ago

I have this same issue. I tried setting volumePermissions.enabled: true and though the init container will complete, I see no improvement. I have not set persistence.existingClaim.

chauncey-garrett commented 3 years ago

I set persistence.storageClass=gp2 and no longer ran into this issue. The default StorageClass on the cluster was efs-sc.

randradas commented 3 years ago

thank you for your feedback @chauncey-garrett, I'm not pretty sure whether it does not fail using gp2, I will check it.

github-actions[bot] commented 3 years ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions[bot] commented 3 years ago

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.

LoveHRTF commented 3 years ago

I encountered a similar issue. CrashLoopBackoff only happens when using efs-csi, ebs-cis and gp2 works fine. Also volumePermissions.enabled: true does not address this issue. Not using any persistence.existingClaim in this case.

juan131 commented 3 years ago

Hi @LoveHRTF

It's very likely that those storage classes are not compatible with adapting the filesystem owner/group based on the Security Context settings. That's precisely the reason why we added the volumePermissions.* parameters: providing a workaround to adapt permissions on this kind of volumes.

LoveHRTF commented 3 years ago

Hi @juan131

Thank you for the follow-up. Example Pod provided from EFS CSI driver was working fine, it's just for the potgresql-ha image.

When using this image, PVC and PV were provisioned correctly. postgresql Pods were in CrashLoopBackOff, pgpool Pod also keeps restarting. Does that still look to be a permission issue? adding --set volumePermissions.enabled: true when installing the chart does not help.

Thanks!

juan131 commented 3 years ago

Hi @LoveHRTF

Does the example POD performs write actions on the volume? Do the containers available on this POD run with a non-root user?

fdzuJ commented 2 years ago

Stumbled upon this problem using smb.csi.k8s.io as provisioner for StorageClass.

Problem is invisible in logs unless image.debug=true is set. Using volumePermissions.enabled didn't help - it completed correctly but due to how smb.csi.k8s.io works it did nothing.

Resulting problem was too much permissions. In my case solution is to set mountOptions of sc to one of required modes.

running bootstrap script ... 2022-11-28 14:59:04.351 UTC [90] FATAL:  data directory "/bitnami/postgresql/data" has invalid permissions
[90] DETAIL:  Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).
child process exited with exit code 1
initdb: removing contents of data directory "/bitnami/postgresql/data"

Since it is causing a crash maybe this message should be reflected in non debug log?

bodak commented 11 months ago

Issue sits actually with AWS and the EFS CSI driver implementation. Fix is here: https://github.com/kubernetes-sigs/aws-efs-csi-driver/issues/300#issuecomment-1072860587

mehmetaydogduu commented 11 months ago

Hi mayastor users, welcome to: Thread on Mayastor

Tip: Add image.debug=true parameter to your values.yml, than restart the container, if you see Bus error (core dumped). It is related to hugepages. A potential fix is here

A re-opened issue for leaving some notes to who waste time on the bug, sorry.

bitnami / charts

[bitnami/postgresql] CrashLoopBackoff when installing postgressql (or postgressql-ha) with a pvc #7282