akkadotnet / Akka.Management

Akka.NET cluster management, bootstrapping, and more.

Split brain using Akka.Cluster.Sharding, Akka.Management, Akka.Discovery.KubernetesApi #2494

Open · garethjames-imburse opened this issue 4 months ago

garethjames-imburse commented 4 months ago

I'm hoping someone can shed some light on why our Kubernetes deployments are so sensitive to split brains with our Helm chart and Akka configuration as they currently stand.

The documentation is fairly brief and doesn't always explain how the various settings work or when to use them, so it's unclear to us whether we're following best practices for our deployment scenario.

We have five applications (alpha, bravo, charlie, delta, echo) which are deployed from a single Helm chart as stateful sets to Kubernetes. Each stateful set has three replicas, so the pods that are created are as follows:

pod-alpha-0
pod-alpha-1
pod-alpha-2
pod-bravo-0
pod-bravo-1
pod-bravo-2
pod-charlie-0
pod-charlie-1
pod-charlie-2
pod-delta-0
pod-delta-1
pod-delta-2
pod-echo-0
pod-echo-1
pod-echo-2

We are using Akka.Cluster.Sharding and Akka.Management + Akka.Discovery.KubernetesApi to form the cluster. This generally works well, except that in roughly 3% of rolling deployments we end up with a split brain. That seems like an unusually high rate and is causing us some problems.

The HOCON we were using initially was as follows:

akka {
    cluster {
        # use the built-in Split Brain Resolver as the downing provider
        downing-provider-class = "Akka.Cluster.SBR.SplitBrainResolverProvider, Akka.Cluster"
        split-brain-resolver {
            # after a partition, keep the side with the majority of members and down the rest
            active-strategy = keep-majority
        }
    }

    discovery {
        method = "kubernetes-api"
        kubernetes-api {
            class = "Akka.Discovery.KubernetesApi.KubernetesApiServiceDiscovery, Akka.Discovery.KubernetesApi"

            api-ca-path = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
            api-token-path = "/var/run/secrets/kubernetes.io/serviceaccount/token"
            api-service-host-env-name = "KUBERNETES_SERVICE_HOST"
            api-service-port-env-name = "KUBERNETES_SERVICE_PORT"

            pod-namespace-path = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"
            pod-domain = "cluster.local"
            # label selector used to find peer pods via the Kubernetes API
            pod-label-selector = "actorsystem={0}"
            use-raw-ip = false
            container-name = ""
        }
    }

    extensions = ["Akka.Management.Cluster.Bootstrap.ClusterBootstrapProvider, Akka.Management"]

    management {
        http {
            port = 8558
            hostname = "" # <- Overridden in Helm chart template with pod IP address as env var
        }
        cluster.bootstrap {
            # a node may form a brand-new cluster if none exists yet
            new-cluster-enabled = on
            contact-point-discovery {
                service-name = "myactorsystem"
                port-name = "management"
                # minimum number of contact points that must be discovered before a new cluster can form
                required-contact-point-nr = 2
                # discovery results must remain unchanged for this long before they are acted on
                stable-margin = 5s
                contact-with-all-contact-points = true
            }
        }
    }
}

Following the Deployment Considerations section of the Akka.Management repo docs, we made the following changes to the configuration:

akka.cluster.shutdown-after-unsuccessful-join-seed-nodes = 30s
akka.discovery.kubernetes-api.container-name = "..." # <- Overridden in Helm chart template with container name as env var
akka.management.cluster.bootstrap.new-cluster-enabled=off
akka.management.cluster.bootstrap.contact-point-discovery.stable-margin = 15s
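
In merged form, those dotted-path overrides correspond roughly to the following nested HOCON (a sketch built from the values above; the container name remains templated by the Helm chart):

akka {
    cluster {
        # give up and shut the node down if it cannot join the discovered seed nodes within 30s
        shutdown-after-unsuccessful-join-seed-nodes = 30s
    }
    discovery.kubernetes-api {
        container-name = "..." # <- Overridden in Helm chart template with container name as env var
    }
    management.cluster.bootstrap {
        # only join an existing cluster; never allow a node to start a brand-new one
        new-cluster-enabled = off
        contact-point-discovery.stable-margin = 15s
    }
}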

After making these changes, test deployments appeared to work as expected (just as they do most of the time). But when we were more aggressive and randomly killed a handful of pods, we would often end up with none of the nodes in a cluster at all (verified with pbm, the Petabridge.Cmd CLI).

The last adjustments we made were as follows:

akka.cluster.shutdown-after-unsuccessful-join-seed-nodes = 30s
akka.discovery.kubernetes-api.container-name = "..." # <- Overridden in Helm chart template with container name as env var
akka.management.cluster.bootstrap.new-cluster-enabled=on
akka.management.cluster.bootstrap.contact-point-discovery.contact-with-all-contact-points = false
akka.management.cluster.bootstrap.contact-point-discovery.required-contact-point-nr = 5
akka.management.cluster.bootstrap.contact-point-discovery.stable-margin = 15s
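
Merged into block form, those final overrides correspond roughly to the following (a sketch assembled from the values above):

akka.management.cluster.bootstrap {
    # a node may form a brand-new cluster if none is found
    new-cluster-enabled = on
    contact-point-discovery {
        # don't require responses from every discovered contact point before acting
        contact-with-all-contact-points = false
        # at least 5 contact points must be discovered before a new cluster can form
        required-contact-point-nr = 5
        # discovery results must remain unchanged for 15s before they are acted on
        stable-margin = 15s
    }
}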

This seems to have yielded the best results overall, but we're concerned both that new-cluster-enabled=off did not prove workable for us and that we're still vulnerable to split brains during deployment.

Does anyone have any experience and/or advice for similar scenarios using these Akka features?

Aaronontheweb commented 3 months ago

Hi @garethjames-imburse - sorry for the delay on this. Have you run into this problem again since?

garethjames-imburse commented 3 months ago

Hi @Aaronontheweb, we were unable to get anywhere with new-cluster-enabled=off - very often, killing pods left us with no cluster at all (since we required at least 2 contact points to form one). Instead we have tuned a number of the other available settings through trial and error, and deployments now appear to be stable.

Aaronontheweb commented 3 months ago

@garethjames-imburse would you mind sharing some of your configuration settings? I'm very interested in seeing if we can reproduce this issue in our test lab at all, since we rely heavily on K8s service discovery there.

garethjames-imburse commented 2 months ago

@Aaronontheweb, apologies for the delay - thank you for inviting us to share our configuration. I've reached out to you separately to discuss further but I'll paste any useful information back here.