This Kubernetes operator automates Cassandra operations such as deploying rack aware clusters, scaling up and down, configuring C* and its JVM, upgrading JVM and C*, backup/restores and many more...
Cannot upgrade the casskop operator from version 2.1.0 to 2.1.16.
What did you do?
I had an initial cluster with a cassandracluster custom resource managed by the casskop operator at version 2.1.0, running Cassandra image 3.11.14, and it was working correctly.
After upgrading the casskop operator to version 2.1.16, I ran into a problem.
What did you expect to see?
I expected the upgrade to complete without problems.
What did you see instead? Under which circumstances?
time="2023-05-16T04:48:23Z" level=error msg="Issue when updating CassandraCluster" cluster=cassandra-cluster err="Operation cannot be fulfilled on cassandraclusters.db.orange.com \"cassandra-cluster\": the object has been modified; please apply your changes to the latest version and try again"
2023-05-16T04:48:23Z INFO controller_cassandracluster Reconciling CassandraCluster {"Request.Namespace": "prod-doaks-cassandra", "Request.Name": "cassandra-cluster"}
time="2023-05-16T04:48:23Z" level=info msg="We will request : cassandra-cluster-dc1-rack1-0.cassandra-cluster to catch hostIdMap" cluster=cassandra-cluster err="<nil>"
time="2023-05-16T04:48:23Z" level=info msg="The Operator Waits 20 seconds for the action to start correctly" cluster=cassandra-cluster rack=dc1-rack1
time="2023-05-16T04:48:25Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:26Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:27Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:28Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:29Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:29Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:29Z" level=info msg="Error Waiting for sts change" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
2023-05-16T04:48:29Z INFO controller_cassandracluster Reconciling CassandraCluster {"Request.Namespace": "prod-doaks-cassandra", "Request.Name": "cassandra-cluster"}
time="2023-05-16T04:48:29Z" level=info msg="We will request : cassandra-cluster-dc1-rack1-0.cassandra-cluster to catch hostIdMap" cluster=cassandra-cluster err="<nil>"
time="2023-05-16T04:48:29Z" level=info msg="[cassandra-cluster][dc1-rack1]: Update UpdateStatefulSet is Done"
time="2023-05-16T04:48:31Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:32Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:33Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:34Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:35Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:35Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:35Z" level=info msg="Error Waiting for sts change" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
2023-05-16T04:48:35Z INFO controller_cassandracluster Reconciling CassandraCluster {"Request.Namespace": "prod-doaks-cassandra", "Request.Name": "cassandra-cluster"}
time="2023-05-16T04:48:35Z" level=info msg="We will request : cassandra-cluster-dc1-rack1-0.cassandra-cluster to catch hostIdMap" cluster=cassandra-cluster err="<nil>"
time="2023-05-16T04:48:35Z" level=info msg="The Operator Waits 20 seconds for the action to start correctly" cluster=cassandra-cluster rack=dc1-rack1
time="2023-05-16T04:48:37Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:38Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:39Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:40Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:41Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:41Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:41Z" level=info msg="Error Waiting for sts change" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
2023-05-16T04:48:41Z INFO controller_cassandracluster Reconciling CassandraCluster {"Request.Namespace": "prod-doaks-cassandra", "Request.Name": "cassandra-cluster"}
time="2023-05-16T04:48:41Z" level=info msg="We will request : cassandra-cluster-dc1-rack1-0.cassandra-cluster to catch hostIdMap" cluster=cassandra-cluster err="<nil>"
time="2023-05-16T04:48:41Z" level=info msg="The Operator Waits 20 seconds for the action to start correctly" cluster=cassandra-cluster rack=dc1-rack1
time="2023-05-16T04:48:43Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:44Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:45Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:46Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:47Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:47Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:47Z" level=info msg="Error Waiting for sts change" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:47Z" level=error msg="Issue when updating CassandraCluster" cluster=cassandra-cluster err="Operation cannot be fulfilled on cassandraclusters.db.orange.com \"cassandra-cluster\": the object has been modified; please apply your changes to the latest version and try again"
2023-05-16T04:48:47Z INFO controller_cassandracluster Reconciling CassandraCluster {"Request.Namespace": "prod-doaks-cassandra", "Request.Name": "cassandra-cluster"}
time="2023-05-16T04:48:47Z" level=info msg="We will request : cassandra-cluster-dc1-rack1-0.cassandra-cluster to catch hostIdMap" cluster=cassandra-cluster err="<nil>"
time="2023-05-16T04:48:47Z" level=info msg="The Operator Waits 20 seconds for the action to start correctly" cluster=cassandra-cluster rack=dc1-rack1
time="2023-05-16T04:48:49Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
time="2023-05-16T04:48:50Z" level=info msg="Waiting for new version of statefulset" cluster=cassandra-cluster statefulset=cassandra-cluster-dc1-rack1
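For context on the "Issue when updating CassandraCluster" error in the log above: "the object has been modified; please apply your changes to the latest version and try again" is the message the Kubernetes API server returns (HTTP 409 Conflict) when an update is submitted with a stale resourceVersion. Below is a minimal Python sketch of that optimistic-concurrency check and of the usual re-read-and-retry response. It is a toy in-memory model under my own assumptions, not casskop's code; `Store`, `Conflict`, and `update_with_retry` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


class Conflict(Exception):
    """Stands in for the API server's HTTP 409 'object has been modified' error."""


@dataclass
class Store:
    """Toy in-memory stand-in for the API server holding a single object."""
    spec: dict = field(default_factory=dict)
    resource_version: int = 1

    def get(self):
        # Return a copy of the object plus the version it was read at.
        return dict(self.spec), self.resource_version

    def update(self, spec, read_version):
        # Reject a write that is based on a stale read.
        if read_version != self.resource_version:
            raise Conflict("the object has been modified")
        self.spec = dict(spec)
        self.resource_version += 1


def update_with_retry(store, mutate, attempts=5):
    """Re-read the latest object and re-apply the change after each conflict."""
    for _ in range(attempts):
        spec, version = store.get()
        mutate(spec)
        try:
            store.update(spec, version)
            return True
        except Conflict:
            continue
    return False
```

In this toy model a conflict is recoverable: the writer fetches the latest version, re-applies its change, and tries again.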
cassandra.log
WARN [cassandra-exporter-harvester-defer-0] 2023-05-16 04:47:35,734 Harvester.java:188 - Failed to register collector for MBean org.apache.cassandra.metrics:type=Connection,scope=10.244.8.24,name=LargeMessagePendingTasks
java.lang.IllegalStateException: Object NamedObject{name=org.apache.cassandra.metrics:type=Connection,scope=10.244.8.24,name=LargeMessagePendingTasks, object=org.apache.cassandra.metrics.CassandraMetricsRegistry$JmxGauge@5fe6da57} and NamedObject{name=org.apache.cassandra.metrics:type=Connection,scope=10.244.8.24,name=LargeMessagePendingTasks, object=org.apache.cassandra.metrics.CassandraMetricsRegistry$JmxGauge@3db033b} cannot be merged, yet their labels are the same.
at com.zegelin.cassandra.exporter.collector.dynamic.FunctionalMetricFamilyCollector.lambda$merge$0(FunctionalMetricFamilyCollector.java:73)
at java.util.HashMap.merge(HashMap.java:1255)
at com.zegelin.cassandra.exporter.collector.dynamic.FunctionalMetricFamilyCollector.merge(FunctionalMetricFamilyCollector.java:73)
at java.util.HashMap.merge(HashMap.java:1255)
at java.util.Collections$SynchronizedMap.merge(Collections.java:2689)
at com.zegelin.cassandra.exporter.Harvester.lambda$registerMBean$0(Harvester.java:184)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
WARN [cassandra-exporter-harvester-defer-0] 2023-05-16 04:47:35,735 Harvester.java:188 - Failed to register collector for MBean org.apache.cassandra.metrics:type=Connection,scope=10.244.8.24,name=LargeMessageCompletedTasks
java.lang.IllegalStateException: Object NamedObject{name=org.apache.cassandra.metrics:type=Connection,scope=10.244.8.24,name=LargeMessageCompletedTasks, object=org.apache.cassandra.metrics.CassandraMetricsRegistry$JmxGauge@6e911e2c} and NamedObject{name=org.apache.cassandra.metrics:type=Connection,scope=10.244.8.24,name=LargeMessageCompletedTasks, object=org.apache.cassandra.metrics.CassandraMetricsRegistry$JmxGauge@5ae16c89} cannot be merged, yet their labels are the same.
at com.zegelin.cassandra.exporter.collector.dynamic.FunctionalMetricFamilyCollector.lambda$merge$0(FunctionalMetricFamilyCollector.java:73)
at java.util.HashMap.merge(HashMap.java:1255)
at com.zegelin.cassandra.exporter.collector.dynamic.FunctionalMetricFamilyCollector.merge(FunctionalMetricFamilyCollector.java:73)
at java.util.HashMap.merge(HashMap.java:1255)
at java.util.Collections$SynchronizedMap.merge(Collections.java:2689)
at com.zegelin.cassandra.exporter.Harvester.lambda$registerMBean$0(Harvester.java:184)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
All Cassandra pods are running and ready from the Kubernetes point of view, but only the first StatefulSet was updated and the operator got stuck on it:
❯ kubectl -n prod-doaks-cassandra get pods
NAME                                           READY   STATUS    RESTARTS   AGE
cassandra-cassandra-operator-6d7cdd849-rvrbc   1/1     Running   0          19h
cassandra-cluster-dc1-rack1-0                  4/4     Running   0          15m
cassandra-cluster-dc1-rack2-0                  4/4     Running   0          19h
cassandra-cluster-dc1-rack3-0                  4/4     Running   0          19h
Events for the first StatefulSet do not show any errors:
Events:
  Type    Reason            Age                From                    Message
  ----    ------            ----               ----                    -------
  Normal  SuccessfulDelete  17m (x2 over 25m)  statefulset-controller  delete Pod cassandra-cluster-dc1-rack1-0 in StatefulSet cassandra-cluster-dc1-rack1 successful
  Normal  SuccessfulCreate  16m (x3 over 17h)  statefulset-controller  create Pod cassandra-cluster-dc1-rack1-0 in StatefulSet cassandra-cluster-dc1-rack1 successful
In the cassandracluster custom resource, the status field shows that the rack is not ready yet:
HINT: It looks like you already had a similar issue: https://github.com/cscetbon/casskop/issues/83
Environment
casskop version: v2.1.16
Kubernetes version information: AKS v1.24.10
Cassandra version: 3.11.14 (current). I also tried Cassandra 3.11.10 and the result was the same.