cscetbon / casskop

This Kubernetes operator automates Cassandra operations such as deploying rack-aware clusters, scaling up and down, configuring C* and its JVM, upgrading the JVM and C*, backups and restores, and much more.
https://cscetbon.github.io/casskop/
Apache License 2.0

Upgrade of casskop from 2.1.0 to 2.1.16 does not work #108

Closed: ajoskowski closed this issue 1 year ago

ajoskowski commented 1 year ago

Bug Report

Cannot upgrade the casskop operator from version 2.1.0 to 2.1.16.

What did you do? I had an initial cluster with a cassandracluster custom resource managed by casskop operator version 2.1.0, using the Cassandra 3.11.14 image, and it was working correctly.

After upgrading the casskop operator to version 2.1.16, I encountered a problem.

What did you expect to see? I expected the upgrade to complete without problems.

What did you see instead? Under which circumstances?

All Cassandra pods are running and ready from the Kubernetes point of view, but only the first statefulset was updated and the operator got stuck on it:

❯ kubectl -n prod-doaks-cassandra get pods
NAME                                           READY   STATUS    RESTARTS   AGE
cassandra-cassandra-operator-6d7cdd849-rvrbc   1/1     Running   0          19h
cassandra-cluster-dc1-rack1-0                  4/4     Running   0          15m
cassandra-cluster-dc1-rack2-0                  4/4     Running   0          19h
cassandra-cluster-dc1-rack3-0                  4/4     Running   0          19h

Events for the first statefulset do not show any errors:

Events:
  Type    Reason            Age                From                    Message
  ----    ------            ----               ----                    -------
  Normal  SuccessfulDelete  17m (x2 over 25m)  statefulset-controller  delete Pod cassandra-cluster-dc1-rack1-0 in StatefulSet cassandra-cluster-dc1-rack1 successful
  Normal  SuccessfulCreate  16m (x3 over 17h)  statefulset-controller  create Pod cassandra-cluster-dc1-rack1-0 in StatefulSet cassandra-cluster-dc1-rack1 successful
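Pod readiness alone does not say whether the StatefulSet rollout itself finished. A minimal sketch, assuming the StatefulSet has been exported as JSON (for example with `kubectl -n prod-doaks-cassandra get statefulset cassandra-cluster-dc1-rack1 -o json`), that checks the standard apps/v1 fields `metadata.generation`, `status.observedGeneration`, `status.updatedReplicas` and `status.readyReplicas`. The helper name `rollout_complete` is hypothetical and not part of casskop.

```python
def rollout_complete(sts: dict) -> bool:
    """Return True when the StatefulSet controller reports every replica updated and ready.

    Hypothetical helper: it reads only standard apps/v1 StatefulSet fields
    from a parsed `kubectl get statefulset <name> -o json` document.
    """
    replicas = sts.get("spec", {}).get("replicas", 1)  # Kubernetes defaults replicas to 1
    generation = sts.get("metadata", {}).get("generation", 0)
    status = sts.get("status", {})
    return (
        # The controller has observed the latest spec change.
        status.get("observedGeneration", 0) >= generation
        # Every pod was created from the revision named by status.updateRevision.
        and status.get("updatedReplicas", 0) == replicas
        # Every pod passes its readiness probe.
        and status.get("readyReplicas", 0) == replicas
    )
```

Load the exported file with `json.load` and pass the result in. If this returns True for dc1-rack1 while the cassandracluster status still reports the rack as Ongoing, that would suggest the stall is in the operator's own bookkeeping rather than in the Kubernetes rollout.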

In the cassandracluster custom resource, the status field shows that the rack is not ready yet:

status:
  cassandraRackStatus:
    dc1-rack1:
      cassandraLastAction:
        name: UpdateStatefulSet
        startTime: "2023-05-16T05:03:54Z"
        status: Ongoing
      phase: Running
      podLastOperation: {}
    dc1-rack2:
      cassandraLastAction:
        name: UpdateDockerImage
        status: ToDo
      phase: Running
      podLastOperation: {}
    dc1-rack3:
      cassandraLastAction:
        endTime: "2023-05-15T08:34:33Z"
        name: Initializing
        status: Done
      phase: Running
      podLastOperation: {}
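To list which racks the operator has not finished with, a small sketch that walks `status.cassandraRackStatus` and reports every rack whose `cassandraLastAction.status` is not `Done`. It assumes the resource has been exported as JSON (for example with `kubectl -n prod-doaks-cassandra get cassandracluster <name> -o json`, where `<name>` is your cluster's resource name); the function `pending_racks` is hypothetical and not part of casskop.

```python
from __future__ import annotations


def pending_racks(cluster: dict) -> dict[str, tuple[str, str]]:
    """Map rack name -> (last action name, action status) for racks not yet Done.

    Hypothetical helper: reads only status.cassandraRackStatus.<rack>.cassandraLastAction.
    """
    racks = cluster.get("status", {}).get("cassandraRackStatus", {})
    pending = {}
    for rack, rack_status in sorted(racks.items()):
        action = rack_status.get("cassandraLastAction", {})
        state = action.get("status", "")
        if state != "Done":
            pending[rack] = (action.get("name", ""), state)
    return pending


# Example built from the status shown above (only the fields the function reads).
example = {
    "status": {
        "cassandraRackStatus": {
            "dc1-rack1": {"cassandraLastAction": {"name": "UpdateStatefulSet", "status": "Ongoing"}},
            "dc1-rack2": {"cassandraLastAction": {"name": "UpdateDockerImage", "status": "ToDo"}},
            "dc1-rack3": {"cassandraLastAction": {"name": "Initializing", "status": "Done"}},
        }
    }
}

for rack, (name, state) in pending_racks(example).items():
    print(f"{rack}: {name} is {state}")
```

With the status above it reports dc1-rack1 (UpdateStatefulSet, Ongoing) and dc1-rack2 (UpdateDockerImage, ToDo), which matches the symptom: rack2 waits for rack1, and rack1 never leaves Ongoing.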

HINT: It looks like you already had a similar issue: https://github.com/cscetbon/casskop/issues/83

Environment

przysiadZeSztanga commented 1 year ago

I tested the fix and it works. @cscetbon, could you please merge this and create a release?

tomix86 commented 1 year ago

FYI, we discovered yet another regression and opened https://github.com/cscetbon/casskop/pull/112 to fix it.