
[bitnami/redis] Issues after upgrading on a k8s cluster having two different redis deployments #27918

Closed: Dunge closed this issue 1 month ago

Dunge commented 1 month ago

Name and Version

bitnami/redis 19.6.1

What architecture are you using?

None

What steps will reproduce the bug?

I have a k8s cluster with two Helm deployments of bitnami/redis, each in its own namespace, as part of a multi-tenant setup. I upgraded from chart 17.10.3 to 19.6.1 (so Redis 7.0.11 to 7.2.5), using the master-replica / sentinel architecture (3 pods each).

Previously the two sets of instances worked separately with absolutely no issues.

After upgrading, they all merged together into one big Redis service. I saw one master with 5 replicas connected, and the sentinel logs showed the master switching every few seconds. This corrupted all my data and exposed some sensitive information to the other tenant that should never have been visible. You can see how problematic this episode has been for us.

What could have caused the replicas from one deployment to connect to the master of the other after this update? Did anything change in the service discovery part?

I finally managed to fix my issue by setting a different name under sentinel.masterSet on each deployment. Unfortunately, I had to delete my storage completely because the data was filled with invalid entries, and I lost a lot of information in the process.
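
For anyone else hitting this, the workaround boils down to giving each deployment its own master set name, roughly like this (the tenant-a / tenant-b names below are only illustrative, not the values I actually used):

# Values for the deployment in the first namespace (illustrative name)
sentinel:
  enabled: true
  masterSet: tenant-a-master
---
# Values for the deployment in the second namespace (illustrative name)
sentinel:
  enabled: true
  masterSet: tenant-b-master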

Are you using any custom parameters or values?

sentinel:
  enabled: true
  resourcesPreset: "small"

auth:
  enabled: false
  sentinel: false

master:
  persistence:
    size: {{ .Values.RedisDataSize }}
  resourcesPreset: "medium"
  disableCommands: []

replica:
  persistence:
    size: {{ .Values.RedisDataSize }}
  resourcesPreset: "medium"
  disableCommands: []

metrics:
  enabled: true
  serviceMonitor:
    enabled: true
  resourcesPreset: "small"

useHostnames: false

What is the expected behavior?

Different deployments in different namespaces should be isolated from each other.

What do you see instead?

Different deployments in different namespaces connected to each other.
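
This is roughly how the cross-talk can be observed (the namespace, pod, and container names here are generic examples following the chart's default naming, and mymaster is the chart's default sentinel.masterSet; auth is disabled in my values so no password is needed):

# Replicas currently attached to the master in one namespace
kubectl exec -n tenant-a redis-node-0 -c redis -- redis-cli info replication

# Master address the sentinels in that namespace are tracking
kubectl exec -n tenant-a redis-node-0 -c sentinel -- redis-cli -p 26379 sentinel get-master-addr-by-name mymaster

If replicas or a master address belonging to the other namespace show up here, the two deployments have merged.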

carrodher commented 1 month ago

The issue may not be directly related to the Bitnami container image/Helm chart, but rather to how the application is being utilized, configured in your specific environment, or tied to a specific scenario that is not easy to reproduce on our side.

If you think that's not the case and are interested in contributing a solution, we welcome you to create a pull request. The Bitnami team is excited to review your submission and offer feedback. You can find the contributing guidelines here.

Your contribution will greatly benefit the community. Feel free to reach out if you have any questions or need assistance.

If you have any questions about the application, customizing its content, or technology and infrastructure usage, we highly recommend that you refer to the forums and user guides provided by the project responsible for the application or technology.

With that said, we'll keep this ticket open until the stale bot automatically closes it, in case someone from the community contributes valuable insights.

Dunge commented 1 month ago

I understand this is probably an auto-generated reply. But no, this is not about "how the application is being utilized". I get that not many people will install two Redis clusters in a single Kubernetes cluster, but it's quite easy to reproduce by simply deploying your chart twice in two different namespaces (see the sketch at the end of this comment); it has nothing to do with my environment.

I would gladly help find and fix the problem, or at least add some validation to prevent this from happening to others, and offer a PR. But unfortunately I'm not very knowledgeable about the structure of your project; that's exactly why I opened this ticket. I'm asking the maintainers of this chart where to look and what changed recently that could cause this behavior.
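
To be concrete, a reproduction along these lines should be enough (release and namespace names are arbitrary; the --set flags mirror the values I posted above):

helm repo add bitnami https://charts.bitnami.com/bitnami

# First tenant
helm install redis-a bitnami/redis --version 19.6.1 \
  --namespace tenant-a --create-namespace \
  --set sentinel.enabled=true --set auth.enabled=false --set auth.sentinel=false

# Second tenant: same chart and values, different namespace
helm install redis-b bitnami/redis --version 19.6.1 \
  --namespace tenant-b --create-namespace \
  --set sentinel.enabled=true --set auth.enabled=false --set auth.sentinel=false

Then check the replication topology on either master and watch the sentinel logs for a few minutes, as I described above.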

github-actions[bot] commented 1 month ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions[bot] commented 1 month ago

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.