Description

Redis was configured to run 1 master plus 3 replicas, because that is the default in the upstream Bitnami Redis chart that the Sentry chart uses as a dependency. Newer versions of the Sentry chart than the one we currently run override this default to 1 replica.

In addition, by default no resource requests or limits are applied to the Redis instances.

As a result, we have 4 Redis instances running with no memory limit, each of which gradually consumes as much memory as it can over time. Our cluster is configured to autoscale between 3 and 6 nodes, so the 4 Redis instances would compete for all the memory on the 3 nodes. Eventually GCP's autoscaler would kick in, add a 4th node, and move one of the Redis instances there. The rescheduled Redis instance would start off with low memory usage, the autoscaler would then remove the 4th node and pack everything back onto 3 nodes, and the cycle would repeat.

I suspect we don't need 3 Redis replicas for our Sentry instance, so I'm reducing it to 1 replica to match the Sentry Helm chart's newer default. If we see performance issues with Sentry, we can scale it back up. I'm also adding resource settings to the Redis replicas: a 2Gi memory request with a 3Gi memory limit, and 2 CPUs for both the request and the limit.
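For reference, the change described above could be expressed as a Helm values override along these lines. This is a sketch, not the exact file in this repo: it assumes the Sentry chart exposes its bundled Bitnami Redis subchart under the `redis` key, and that the subchart reads `replica.replicaCount` and `replica.resources` (as the Bitnami Redis 17.x chart does). Verify the key paths against the chart version actually deployed.

```yaml
# Hypothetical values override for the Sentry Helm chart.
# Key paths are assumptions; check them against the deployed chart version.
redis:
  replica:
    # Assumed Bitnami Redis key: run 1 replica instead of the upstream default of 3
    replicaCount: 1
    resources:
      requests:
        cpu: 2
        memory: 2Gi   # what the scheduler reserves when placing the pod
      limits:
        cpu: 2
        memory: 3Gi   # hard cap; the container is OOM-killed above this
```

The memory request is what Kubernetes reserves on a node when scheduling the pod, while the limit is a hard ceiling. Keeping the request below the limit lets the pod fit on a node with 2Gi free while still bounding how far Redis can grow.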

Type of change

[X] Bug fix (non-breaking change which fixes an issue)
[ ] New feature
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[ ] Documentation

How has this been tested?

Opening this pull request generates a diff of the changes that will be applied to the cluster.

Post-merge follow-ups

[ ] Observe the new state of Sentry
[ ] Test if Sentry speed is acceptable

The following changes will be applied to the production Kubernetes cluster upon merge.

BE AWARE: this diff may not reveal manual changes to the cluster that will be undone by this apply. Applying manual changes to the cluster should be avoided.

sentry, sentry-sentry-redis-replicas, StatefulSet (apps) has changed:
...
      helm.sh/chart: redis-17.11.3
      app.kubernetes.io/instance: sentry
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/component: replica
  spec:
-   replicas: 3
+   replicas: 1
    selector:
      matchLabels:
        app.kubernetes.io/name: sentry-redis
        app.kubernetes.io/instance: sentry
        app.kubernetes.io/component: replica
...
                command:
                  - sh
                  - -c
                  - /health/ping_readiness_local_and_master.sh 1
            resources:
-             limits: {}
-             requests: {}
+             limits:
+               cpu: 2
+               memory: 3Gi
+             requests:
+               cpu: 2
+               memory: 2Gi
            volumeMounts:
              - name: start-scripts
                mountPath: /opt/bitnami/scripts/start-scripts
              - name: health
                mountPath: /health
...

cal-itp / data-infra

fix(sentry): limit redis to 1 replica and add resource limits #3462
