eclipse-ditto / ditto

Eclipse Ditto™: Digital Twin framework of Eclipse IoT - main repository
https://eclipse.dev/ditto/
Eclipse Public License 2.0

Stabilize Ditto on k8s pod rescheduling and rolling updates #1839

Closed thjaeckle closed 10 months ago

thjaeckle commented 12 months ago

We got feedback via our Gitter chat room that after pod rescheduling, Ditto cluster messaging sometimes gets broken: https://matrix.to/#/!NgApgwRGamBVNEqzNU:gitter.im/$u8zsUQZHfuBIrIp-LdnIta8TajCGMoNVrEXSqxFdw14?via=gitter.im&via=matrix.org&via=othermo.ems.host

Ditto clustering uses Apache Pekko, formerly Akka. I found this blog post from the Lightbend team explaining that since k8s 1.22 the default behavior for downscaling pods has changed: https://www.lightbend.com/blog/faster-and-smoother-rolling-updates-for-akka-clusters-in-kubernetes

This is the related Akka Management issue: https://github.com/akka/akka-management/issues/1130

Akka Management fixed this issue in 1.3, which we cannot use, however, because of its license.

So there are a few options IMO to stabilize this in Ditto:

Just to provide some ideas..

A custom operator might also be useful for other reasons, e.g. to do a rolling update of Ditto service by service instead of all services at once.

@kalinkostashki maybe you also have an idea?

kalinkostashki commented 11 months ago

Hey @thjaeckle,

From what I've looked into so far, there are several approaches:

  1. Create per-service Helm charts and one wrapper chart, like it was in Eclipse Packages. Benefits:

    • you enable per-service upgrades via hooks (post-install, pre-upgrade, etc.)
    • this should fundamentally solve the issue you are talking about

    Negatives:

    • you create a more complex chart structure which is harder to maintain and will include code duplication, I assume
    • Helm hooks and the way they behave may be a problem, from the reading that I've done, so it would be better to avoid them -> for example, what happens if a post-install step fails for some reason...
  2. Create per-service Helm charts only. Benefits:

    • you enable per-service upgrades by manually calling helm upgrade for a service

    • this should fundamentally solve the issue you are talking about

      Negatives:

    • you create a more complex chart structure which is harder to maintain and will most likely cause code duplication

  3. Custom operator that registers CRDs for each service (the thing you proposed). Benefits:

    • no code duplication, unlike the previous approaches
    • you enable per-service upgrades by manually calling helm upgrade for a service
    • this should fundamentally solve the issue you are talking about
    • since we don't know exactly when or how the Pekko project will address this, we gain some autonomy and flexibility
    • it is shiny and cool ;)

    Negatives:

    • creation of yet another API that has to be supported -> this is kind of inevitable, I feel, but has to be considered

I personally think contributing to Pekko would be harder (at least given my limited experience). The other approaches that you mention, like sidecars, are also fine, but are just a patch. As for disabling a Kubernetes feature like LogarithmicScaleDown, I'm not sure that is even possible with cloud providers.

Given all that, I think the operator is the way to Go (pun intended :)). This will make Ditto more flexible and pave the way for other features like autoscaling based on specific metrics, etc. -> I'm sure you can think of more cool stuff :)

thjaeckle commented 11 months ago

@kalinkostashki I fully agree. Kubernetes operator is the way to go. Let's put it on the good resolutions list for 2024. 😁

With that we should be able to query the management API of Pekko in order to find out the cluster role leader and then shut that one down last during a rolling update.
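A minimal sketch of the "leader last" ordering, assuming the cluster state has already been fetched as JSON from the Pekko Management cluster HTTP endpoint and contains a `members` list (each with a `node` address) plus a `leader` address. The function name `shutdown_order` and the sample addresses are hypothetical, and this uses the overall cluster leader rather than a per-role leader, which would need additional filtering by role:

```python
from urllib.parse import urlparse


def shutdown_order(cluster_state: dict) -> list[str]:
    """Return member hosts ordered so that the leader's host comes last.

    `cluster_state` is assumed to look like the parsed JSON of a cluster
    members response: {"leader": "<address>", "members": [{"node": "<address>"}]}.
    """
    leader = cluster_state.get("leader")
    nodes = [member["node"] for member in cluster_state.get("members", [])]
    # sorted() is stable and False < True, so only the leader moves to the end.
    ordered = sorted(nodes, key=lambda node: node == leader)
    # Reduce "pekko://system@host:port" to the host, which can be matched to a pod IP.
    return [urlparse(node).hostname for node in ordered]


if __name__ == "__main__":
    # Hypothetical sample state with three members; 10.0.0.2 is the leader.
    state = {
        "leader": "pekko://ditto-cluster@10.0.0.2:2551",
        "members": [
            {"node": "pekko://ditto-cluster@10.0.0.1:2551"},
            {"node": "pekko://ditto-cluster@10.0.0.2:2551"},
            {"node": "pekko://ditto-cluster@10.0.0.3:2551"},
        ],
    }
    print(shutdown_order(state))
```

An operator or hook could walk this list front to back, so the pod hosting the leader is the last one to be replaced.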

thjaeckle commented 11 months ago

We could fork https://github.com/lightbend/akka-cluster-operator to e.g. a new Ditto GitHub repo.

And adjust it to Pekko. Should not be too hard, I hope.

thjaeckle commented 11 months ago

In the meantime, I think we could solve this via a Helm "pre-upgrade" hook. I asked this Stack Overflow question, whose answer provides a good "basis": https://stackoverflow.com/questions/77704278/can-a-helm-pre-upgrade-hook-modify-the-pod-deletion-cost-prior-to-doing-a-rollin/77724976#77724976

The idea is to register a "pre-upgrade" hook to:

This will not yet solve the problem that we want to update one Ditto service after another instead of all at once.
But maybe this can also be solved with another pre-upgrade hook, maybe using a lease to store the currently updated service before proceeding to the next?
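As a rough sketch of the pod-deletion-cost idea from the linked Stack Overflow question: a pre-upgrade hook job could annotate pods with `controller.kubernetes.io/pod-deletion-cost`, since Kubernetes prefers to delete pods with a lower cost first when scaling down a ReplicaSet (on a best-effort basis). The helper name `deletion_cost_patches`, the cost values, and the pod names are assumptions for illustration only; the code just builds the JSON merge patch bodies and does not talk to the cluster:

```python
import json

# Real Kubernetes annotation; pods with a lower value are preferred for deletion.
DELETION_COST_ANNOTATION = "controller.kubernetes.io/pod-deletion-cost"


def deletion_cost_patches(pod_names: list[str], keep_longest: str) -> dict[str, str]:
    """Map each pod name to a JSON merge patch body setting its deletion cost.

    `keep_longest` is the pod that should be removed last (for example the one
    hosting the cluster leader); the cost values 10000 and 0 are arbitrary.
    """
    patches = {}
    for name in pod_names:
        cost = 10000 if name == keep_longest else 0
        # Annotation values must be strings, so the integer cost is stringified.
        body = {"metadata": {"annotations": {DELETION_COST_ANNOTATION: str(cost)}}}
        patches[name] = json.dumps(body)
    return patches


if __name__ == "__main__":
    # Hypothetical pod names; "ditto-things-1" is assumed to host the leader.
    for pod, patch in deletion_cost_patches(
        ["ditto-things-0", "ditto-things-1"], keep_longest="ditto-things-1"
    ).items():
        print(pod, patch)
```

Each body could then be applied by the hook, for instance with `kubectl patch pod <name> --type merge -p '<body>'`.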