Orange-OpenSource / casskop

This Kubernetes operator automates Cassandra operations such as deploying a new rack-aware cluster, adding/removing nodes, configuring C* and JVM parameters, upgrading JVM and C* versions, and many more...
https://orange-opensource.github.io/casskop/
Apache License 2.0

[BUG] Scaling down the Cassandra cluster fails if some particular events are missed by the operator #357

Closed: srteam2020 closed this issue 3 years ago

srteam2020 commented 3 years ago

Bug Report

The casskop operator takes the following steps to prevent a user from removing more than one DC at a time (a sketch of this guard follows the list):

  1. dump the last CRD status to its annotation when a reconcile finishes
  2. get the DC size from the current CRD status, and the old DC size from the last CRD status
  3. compare the DC size with the old DC size
  4. if the DC size is less than the old DC size minus 1, the user is trying to remove more than one DC at a time; in that case, set the DC configuration back to the old one stored in the annotation
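For illustration, here is a minimal Go sketch of that guard; all names (`cassandraCluster`, `clusterSpec`, `DCSize`, `LastAppliedSpec`, `guardScaleDown`) are hypothetical stand-ins, not casskop's actual types or helpers:

```go
// Illustrative sketch of the scale-down guard described in steps 1-4.
// None of these identifiers come from the operator's code.
package main

import "fmt"

type clusterSpec struct {
	DCSize int // desired DC size
}

type cassandraCluster struct {
	Spec clusterSpec
	// Spec dumped into an annotation at the end of the previous reconcile (step 1).
	LastAppliedSpec clusterSpec
}

// guardScaleDown performs steps 2-4: compare the current DC size with the one
// recorded at the last reconcile, and roll back if more than one is removed at once.
func guardScaleDown(cc *cassandraCluster) {
	oldSize := cc.LastAppliedSpec.DCSize // step 2: old size from the annotation
	newSize := cc.Spec.DCSize            // step 2: current size from the spec
	if newSize < oldSize-1 {             // steps 3-4
		fmt.Printf("refusing to scale from %d to %d in one step; restoring %d\n",
			oldSize, newSize, oldSize)
		cc.Spec = cc.LastAppliedSpec // restore the old configuration
	}
}

func main() {
	// One-step change: 3 -> 2 is allowed, the spec is left alone.
	ok := &cassandraCluster{Spec: clusterSpec{2}, LastAppliedSpec: clusterSpec{3}}
	guardScaleDown(ok)
	fmt.Println("after 3 -> 2:", ok.Spec.DCSize) // 2

	// Two-step jump: 3 -> 1 is refused, the spec is rolled back to 3.
	tooFar := &cassandraCluster{Spec: clusterSpec{1}, LastAppliedSpec: clusterSpec{3}}
	guardScaleDown(tooFar)
	fmt.Println("after 3 -> 1:", tooFar.Spec.DCSize) // 3
}
```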

We ran a workload that changes the DC size from 3 to 2 to 1. Ideally, the DC would eventually be scaled down to 1. However, we observed that the DC is not scaled down at all in the end (it still has a size of 3).

The reason is that the goroutine performing reconciliation (g1) and the goroutine updating the cassandraCluster object (g2) run concurrently, and a certain interleaving of the two goroutines can lead to the unexpected scaling behavior. Ideally, the following interleaving leads to the correct result:

  1. g2: set DC size (from 3) to 2
  2. g1: read the oldCRD state (DC size: 3), and scale the DC from 3 to 2
  3. g2: set DC size (from 2) to 1
  4. g1: read the oldCRD state (DC size: 2), and scale the DC from 2 to 1

And we find that the following interleaving leads to the unexpected scaling behavior mentioned above (a small simulation of both interleavings follows the list):

  1. g2: set DC size (from 3) to 2
  2. g2: set DC size (from 2) to 1
  3. g1: read the oldCRD state (DC size: 3); since the DC size (1) is less than the old DC size minus 1, the operator refuses to scale down and restores the DC back to a size of 3
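To make the two interleavings concrete, here is a small self-contained Go simulation; the reconcile function inlines the guard from the earlier sketch, and all names are again illustrative rather than taken from casskop:

```go
// Simulation of the two interleavings above. Running it ends with size 1 in
// the correct interleaving and size 3 in the problematic one.
package main

import "fmt"

type spec struct{ DCSize int }

type cluster struct {
	Spec            spec
	LastAppliedSpec spec // stands in for the annotation written after each reconcile
}

// reconcile applies the guard (steps 2-4) and then records the applied spec (step 1).
func reconcile(c *cluster) {
	if c.Spec.DCSize < c.LastAppliedSpec.DCSize-1 {
		fmt.Printf("reconcile: refusing %d -> %d, restoring %d\n",
			c.LastAppliedSpec.DCSize, c.Spec.DCSize, c.LastAppliedSpec.DCSize)
		c.Spec = c.LastAppliedSpec
	} else {
		fmt.Printf("reconcile: scaling %d -> %d\n", c.LastAppliedSpec.DCSize, c.Spec.DCSize)
	}
	c.LastAppliedSpec = c.Spec
}

func main() {
	// Correct interleaving: a reconcile observes every intermediate spec.
	good := &cluster{Spec: spec{3}, LastAppliedSpec: spec{3}}
	good.Spec.DCSize = 2 // g2
	reconcile(good)      // g1: scales 3 -> 2
	good.Spec.DCSize = 1 // g2
	reconcile(good)      // g1: scales 2 -> 1
	fmt.Println("correct interleaving ends with size", good.Spec.DCSize) // 1

	// Problematic interleaving: both spec updates land before the next reconcile.
	bad := &cluster{Spec: spec{3}, LastAppliedSpec: spec{3}}
	bad.Spec.DCSize = 2 // g2
	bad.Spec.DCSize = 1 // g2, before g1 gets to run
	reconcile(bad)      // g1: sees 3 -> 1, refuses, restores 3
	fmt.Println("problematic interleaving ends with size", bad.Spec.DCSize) // 3
}
```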

What did you do? We ran a scaledown workload changing DC size from 3 to 2 to 1.

What did you expect to see? The cassandra DC should end up with 1 replica.

What did you see instead? Under which circumstances? The cassandra DC has 3 replicas at the end.

Environment

Additional context: We are willing to send a PR to help fix this issue. One potential fix we are considering: in step 4, when the operator finds that the user is going to remove more than one DC at a time, instead of setting the size back to the old value stored in the annotation (which completely blocks the scale-down), we can set it back to the old value minus 1. That way, even if the operator misses some intermediate events, the scale-down still happens gracefully and smoothly, one step at a time (see the sketch below). Please let us know whether this potential fix is reasonable.
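A minimal sketch of one reading of that proposal, in which the operator scales toward an intermediate target of the old value minus 1 rather than restoring the whole spec; the names `effectiveTarget`, `lastApplied`, and `requested` are ours, not casskop's:

```go
// Sketch of the proposed fix under the assumptions above.
package main

import "fmt"

// effectiveTarget returns lastApplied-1 when the requested size skips more
// than one step, so each reconcile scales down by at most one.
func effectiveTarget(lastApplied, requested int) int {
	if requested < lastApplied-1 {
		return lastApplied - 1
	}
	return requested
}

func main() {
	requested := 1   // the user's spec after the two quick updates (3 -> 2 -> 1)
	lastApplied := 3 // size recorded in the annotation at the last reconcile
	for lastApplied != requested {
		next := effectiveTarget(lastApplied, requested)
		fmt.Printf("reconcile: scaling %d -> %d\n", lastApplied, next)
		lastApplied = next
	}
	// Prints 3 -> 2, then 2 -> 1: the scale-down converges even though the
	// operator never observed the intermediate spec of 2.
}
```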

cscetbon commented 3 years ago

You shouldn't be able to trigger 2 operations. You're saying you changed it from 3 to 2 to 1. You should be able to change it to 2, but not to 1 until that change has been applied. To me it seems like kind of a hack to trigger 2 operations, but yes, it should be prevented if it isn't already.

cscetbon commented 3 years ago

Let me know if the issue still makes sense; otherwise we'll close it.

Thanks

srteam2020 commented 3 years ago

@cscetbon Thanks for the follow-up. You can close this issue, and we will come back if we figure out how to scale the Cassandra cluster rack correctly.