bitnami / containers

Bitnami container images
https://bitnami.com

[bitnami/redis] didn't write on aof files and fails when restart any pod #63438

Closed. antonios94 closed this issue 6 months ago.

antonios94 commented 8 months ago

Name and Version

bitnami/redis:18.17.0

What architecture are you using?

amd64

What steps will reproduce the bug?

helm install redis bitnami/redis -n redis --create-namespace  \
   --set global.redis.password=redis-password,volumePermissions.enabled=true  \
   --set replica.replicaCount=3 \
   --set master.persistence.enabled=true,replica.persistence.enabled=true \
    --set sysctl.enabled=true,sysctl.command[0]='/bin/sh',sysctl.enabled=true,sysctl.command[1]='-c' \
    --set sysctl.command[2]='sysctl -w net.core.somaxconn=10000',sysctl.enabled=true \
    --set sysctl.command[3]='echo never > /host-sys/kernel/mm/transparent_hugepage/enabled' 

What is the expected behavior?

Redis should create appendonly.aof.1.base.rdb, appendonly.aof.1.incr.aof and appendonly.aof.manifest under appendonlydir on the NFS PVC, and write data to these files so the dataset can be restored when a pod restarts.
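
As a quick sanity check (illustrative only; the pod name redis-master-0 is assumed, with the namespace and password taken from the install command above), writing a key and then listing the directory should show the incr AOF file growing:

$ kubectl exec -n redis redis-master-0 -- redis-cli -a redis-password SET probe-key probe-value
$ kubectl exec -n redis redis-master-0 -- ls -l /data/appendonlydir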

What do you see instead?

All the files are created with user:root ownership and stay empty, and when any Redis pod restarts it fails with the error "Found an empty AOF manifest".

Additional information

Kubernetes version 1.26.14. I tried to deploy both with and without volumePermissions.enabled=true and got the same result; the setting does not change any permissions on the created directory or files. I can't see any errors related to write permissions inside the Redis pod or in the NFS provisioner, and when I exec into any pod I can read and write in the /data directory, so I can't tell which layer the issue comes from. I tried changing the uid, gid and fsGroup, and also changing the permissions from the worker node, but the error stays the same.
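
For reference, the uid/gid/fsGroup changes can be applied through the chart's security context values; a hedged sketch (parameter names should be confirmed against the chart's values.yaml for the version in use):

$ helm upgrade redis bitnami/redis -n redis --reuse-values \
    --set master.podSecurityContext.fsGroup=1001,master.containerSecurityContext.runAsUser=1001 \
    --set replica.podSecurityContext.fsGroup=1001,replica.containerSecurityContext.runAsUser=1001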

javsalgar commented 8 months ago

Hi!

Which storage provider are you using?

antonios94 commented 8 months ago

Hi!

Which storage provider are you using?

I am using kubernetes-sigs/nfs-subdir-external-provisioner and also tried kubernetes-csi/csi-driver-nfs, with the same result for both. The storage itself is vSAN NFS.

antonios94 commented 7 months ago

@FraPazGal @javsalgar any feedback, please?

FraPazGal commented 7 months ago

Hello @antonios94, I see a couple of problems in the sysctl.command you shared. Since we are only using command (without the args parameter), running more than one shell command requires the following form:

...
--set sysctl.enabled=true,sysctl.command[0]='/bin/sh',sysctl.command[1]='-c' \
    --set sysctl.command[2]='sysctl -w net.core.somaxconn=10000 && echo never > /host-sys/kernel/mm/transparent_hugepage/enabled'

After that, there is one other thing we need to set: the sysctl.mountHostSys param. Otherwise, our changes won't be reflected on the host's /sys path.

Taking the above into account, our testing parameters would be:

$ cat testing_params.yaml
global:
  redis:
    password: redis-password
master:
  persistence:
    enabled: true
replica:
  replicaCount: 3
  persistence:
    enabled: true
volumePermissions:
  enabled: true
sysctl:
  enabled: true
  command: ['sh', '-c', 'sysctl -w net.core.somaxconn=10000 && echo never > /host-sys/kernel/mm/transparent_hugepage/enabled']
  mountHostSys: true

$ helm install redis bitnami/redis -f testing_params.yaml

You can check that both sysctl commands affect the main redis container in any of our pods:

$ kubectl exec -it redis-replicas-0 -- bash
Defaulted container "redis" out of: redis, volume-permissions (init)

I have no name!@redis-replicas-0:/$ sysctl -n net.core.somaxconn && cat sys/kernel/mm/transparent_hugepage/enabled
10000
always madvise [never]

Could you try my suggestions and check if they solve the issue? If the problem persists, please share the pod's error logs as well as the specific permissions of the appendonly files.
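
For example, something along these lines should capture the requested details (pod name and namespace assumed):

$ kubectl logs -n redis redis-master-0 --previous                      # error log from the container that crashed on restart
$ kubectl exec -n redis redis-master-0 -- ls -la /data/appendonlydir   # ownership, permissions and sizes of the AOF files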

github-actions[bot] commented 7 months ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions[bot] commented 6 months ago

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.