dragonflydb / dragonfly-operator

A Kubernetes operator to install and manage Dragonfly instances.
https://www.dragonflydb.io/docs/managing-dragonfly/operator/installation
Apache License 2.0

Statefulset update / recreate #165

Open applike-ss opened 3 months ago

applike-ss commented 3 months ago

Due to config adjustments, the operator tries to patch the sts in a way that would be incompatible.

I then get this error:

StatefulSet.apps "drangonfly-app" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'ordinals', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden

It would be great if dragonfly-operator could delete the sts with cascade=orphan and then re-create it with the current config to ensure the desired state.

I expected that if I removed the sts manually, the operator would re-create it to ensure the desired state. That was not the case either, so I would also like the operator to re-create a missing sts with the desired configuration.
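The error above names the only StatefulSet spec fields that can be changed in place. As a rough illustration of how a controller could detect a forbidden change before sending the patch, here is a minimal sketch. The function name `forbidden_changes` and the sample specs are hypothetical, the allowlist is copied from the error message (it can differ between Kubernetes versions), and none of this is taken from the operator's code.

```python
# Fields the API server allows to change on an existing StatefulSet,
# copied from the error message quoted above (assumed, version dependent).
MUTABLE_STS_FIELDS = frozenset({
    "replicas",
    "ordinals",
    "template",
    "updateStrategy",
    "persistentVolumeClaimRetentionPolicy",
    "minReadySeconds",
})


def forbidden_changes(current_spec: dict, desired_spec: dict) -> list[str]:
    """Return top-level spec fields that differ and cannot be patched in place."""
    changed = {
        key
        for key in current_spec.keys() | desired_spec.keys()
        if current_spec.get(key) != desired_spec.get(key)
    }
    return sorted(changed - MUTABLE_STS_FIELDS)


# Hypothetical, heavily trimmed StatefulSet specs for illustration.
current = {
    "replicas": 3,
    "volumeClaimTemplates": [{"spec": {"resources": {"requests": {"storage": "1Gi"}}}}],
}
desired = {
    "replicas": 3,
    "volumeClaimTemplates": [{"spec": {"resources": {"requests": {"storage": "2Gi"}}}}],
}

print(forbidden_changes(current, desired))
```

A non-empty result means a plain update would be rejected with the "Forbidden" error, so the controller would have to either surface the problem or take a different path such as recreating the object.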

Abhra303 commented 3 months ago

Due to config adjustments, the operator tries to patch the sts in a way that would be incompatible.

Could you share the config that led to this behaviour? Did updating the Dragonfly CRD cause the issue? I am interested in knowing the root cause.

It would be great if dragonfly-operator could delete the sts with cascade=orphan and then re-create it with the current config to ensure the desired state.

Recreating the StatefulSet wouldn't solve the underlying issue (i.e. why the operator is trying to update the StatefulSet like that).

I expected that if I removed the sts manually, the operator would re-create it to ensure the desired state. That was not the case either, so I would also like the operator to re-create a missing sts with the desired configuration.

Yep, it's indeed nice to have.

SoGooDFR commented 3 months ago

I have the same problem. I deploy the CRD with ArgoCD, but the operator does not update anything, and no rollout or replacement is triggered. After that, I can see this in the logs:

/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:235
2024-03-20T16:29:41Z ERROR Reconciler error {"controller": "dragonfly", "controllerGroup": "dragonflydb.io", "controllerKind": "Dragonfly", "Dragonfly": {"name":"dragonfly-test","namespace":"test"}, "namespace": "test", "name": "dragonfly-test", "reconcileID": "4261fe51-f541-437b-9a2e-cf6b64b253db", "error": "StatefulSet.apps \"dragonfly-test\" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'ordinals', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:235

applike-ss commented 3 months ago

Could you share the config that led to this behaviour? Did updating the Dragonfly CRD cause the issue? I am interested in knowing the root cause.

It was indeed updating the CR. I was updating spec.snapshot.persistentVolumeClaimSpec. This leads to an update of the sts' spec.volumeClaimTemplates, which is not one of the fields that can be changed on an existing sts. So my suggestion is to allow this change by deleting the sts with the cascade=orphan option and re-creating it.

Recreating the StatefulSet wouldn't solve the underlying issue (i.e. why the operator is trying to update the StatefulSet like that).

That is true. The underlying issue is updating parts of the CR that map to sts fields which are not supposed to be updated.
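To make the proposal concrete, here is a minimal sketch of the reconcile decision being asked for: create the sts if it is missing, patch it when only mutable fields changed, and otherwise delete it with the `Orphan` propagation policy (so the pods stay) and create it again. `FakeStsClient` and `reconcile_statefulset` are hypothetical names for illustration, the client is an in-memory stand-in rather than a real Kubernetes client, and this is not how the operator is implemented.

```python
# Mutable fields as listed in the error message (assumed, version dependent).
MUTABLE_STS_FIELDS = frozenset({
    "replicas", "ordinals", "template", "updateStrategy",
    "persistentVolumeClaimRetentionPolicy", "minReadySeconds",
})


class FakeStsClient:
    """Hypothetical in-memory stand-in for a Kubernetes client."""

    def __init__(self):
        self.specs = {}   # name -> StatefulSet spec
        self.calls = []   # record of operations, for inspection

    def get(self, name):
        return self.specs.get(name)

    def create(self, name, spec):
        self.calls.append(("create", name))
        self.specs[name] = dict(spec)

    def patch(self, name, spec):
        self.calls.append(("patch", name))
        self.specs[name] = dict(spec)

    def delete(self, name, propagation_policy):
        # "Orphan" leaves the pods running while the StatefulSet is removed.
        self.calls.append(("delete", name, propagation_policy))
        del self.specs[name]


def reconcile_statefulset(client, name, desired_spec):
    """Bring the StatefulSet to the desired spec and report the action taken."""
    current = client.get(name)
    if current is None:
        # Covers the "sts was removed manually" case.
        client.create(name, desired_spec)
        return "created"
    changed = {
        key
        for key in current.keys() | desired_spec.keys()
        if current.get(key) != desired_spec.get(key)
    }
    if not changed:
        return "unchanged"
    if changed <= MUTABLE_STS_FIELDS:
        client.patch(name, desired_spec)
        return "patched"
    # An immutable field changed: orphan the pods and recreate the object.
    client.delete(name, propagation_policy="Orphan")
    client.create(name, desired_spec)
    return "recreated"


client = FakeStsClient()
spec = {"replicas": 3, "volumeClaimTemplates": ["1Gi"]}
print(reconcile_statefulset(client, "dragonfly-app", spec))
print(reconcile_statefulset(client, "dragonfly-app", {**spec, "volumeClaimTemplates": ["2Gi"]}))
```

One caveat with this approach: recreating the sts with a larger claim template does not resize PVCs that already exist. Those would still need to be expanded separately, which in turn requires a StorageClass that allows volume expansion.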

Here's a demo CR:

apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  name: dragonfly-app
spec:
  image: ghcr.io/dragonflydb/dragonfly-weekly:e8650ed2b4ebd550c966751dd33ebb1ac4f82b1f-ubuntu
  args:
    - '--cache_mode'
    - '--primary_port_http_enabled=true'
    - '--cluster_mode=emulated'
  snapshot:
    cron: '*/5 * * * *'
    persistentVolumeClaimSpec:
      resources:
        requests:
          storage: 1Gi
      accessModes:
        - ReadWriteOnce
  resources:
    limits:
      cpu: 100m
      memory: 320Mi
    requests:
      cpu: 100m
      memory: 320Mi
  replicas: 3

Updating this to the following will show the issue:

apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  name: dragonfly-app
spec:
  image: ghcr.io/dragonflydb/dragonfly-weekly:e8650ed2b4ebd550c966751dd33ebb1ac4f82b1f-ubuntu
  args:
    - '--cache_mode'
    - '--primary_port_http_enabled=true'
    - '--cluster_mode=emulated'
  snapshot:
    cron: '*/5 * * * *'
    persistentVolumeClaimSpec:
      resources:
        requests:
          storage: 2Gi
      accessModes:
        - ReadWriteOnce
  resources:
    limits:
      cpu: 100m
      memory: 320Mi
    requests:
      cpu: 100m
      memory: 320Mi
  replicas: 3
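For readers comparing the two manifests by eye, a small sketch that diffs them as nested dictionaries confirms that only the requested storage size changes. The helper names `changed_paths` and `dragonfly_cr` are hypothetical, and the manifests are trimmed to a few fields from the CRs above to keep the example short.

```python
def changed_paths(old, new, prefix=""):
    """Return dotted paths whose values differ between two nested dicts."""
    if isinstance(old, dict) and isinstance(new, dict):
        paths = []
        for key in sorted(old.keys() | new.keys()):
            child = f"{prefix}.{key}" if prefix else key
            if key not in old or key not in new:
                # Key added or removed: report the whole subtree as changed.
                paths.append(child)
            else:
                paths.extend(changed_paths(old[key], new[key], child))
        return paths
    return [] if old == new else [prefix]


def dragonfly_cr(storage):
    """Build a trimmed version of the demo CR with the given storage request."""
    return {
        "spec": {
            "replicas": 3,
            "snapshot": {
                "cron": "*/5 * * * *",
                "persistentVolumeClaimSpec": {
                    "resources": {"requests": {"storage": storage}},
                    "accessModes": ["ReadWriteOnce"],
                },
            },
        }
    }


before = dragonfly_cr("1Gi")
after = dragonfly_cr("2Gi")
print(changed_paths(before, after))
```

The single reported path, under `spec.snapshot.persistentVolumeClaimSpec`, is the change that ends up in the sts volume claim template and triggers the "Forbidden" error.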