Kuadrant / kuadrant-operator

The Operator to install and manage the lifecycle of the Kuadrant components deployments.
Apache License 2.0

Expose the option for Kuadrant to be installed in HA mode #798

Open maleck13 opened 3 months ago

maleck13 commented 3 months ago

What

When you install Kuadrant, it defaults to a single instance of Authorino and Limitador. To be resilient to failure, some installations may want multiple instances of each deployed, as these components are in the critical path for requests. Since Authorino and Limitador both support running multiple instances on the same cluster, we should expose options for this deployment topology via Kuadrant.

At a high level, the key things to expose would be the number of replicas of each component and how you want them distributed. Kubernetes supports distributing instances of a pod via topology spread constraints: https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/#topology-spread-constraint-examples

Use cases:

- Support a multi-AZ k8s cluster where I want to have a gateway instance per AZ that routes to an Authorino and Limitador in the same AZ.
- I want to spread out instances of Authorino and Limitador across AZs rather than allowing them to be potentially scheduled to the same AZ or even the same node.
- Allow me to have more than one instance of these components per AZ / per cluster for redundancy and to improve resiliency against node failure and AZ failure.

One concept for Kuadrant CRD:

deployment:
   limitador | ratelimiting:
     replicas: 3
     topologyKey: zone
   authorino | auth:
     replicas: 3
     topologyKey: zone

We may also want to consider a simpler level of configuration:

deployment:
   mode: HA
   topologyKey: zone | node

In Authorino and Limitador we would need to add (just an example):

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: limitador
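
To make the shorthand above more concrete, here is a rough sketch of how `topologyKey: zone | node` might translate into a rendered Deployment. It assumes the shorthand values would map to the well-known Kubernetes node labels `topology.kubernetes.io/zone` and `kubernetes.io/hostname`; that mapping, the Deployment name, the `app: limitador` label, and the image reference are illustrative assumptions, not what the operators currently generate.

```yaml
# Sketch only: names, labels, image, and replica count are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: limitador
spec:
  replicas: 3
  selector:
    matchLabels:
      app: limitador
  template:
    metadata:
      labels:
        app: limitador
    spec:
      topologySpreadConstraints:
        # "zone" shorthand: keep pod counts per AZ within 1 of each other.
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: limitador
        # "node" shorthand: also avoid stacking replicas on a single node.
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: limitador
      containers:
        - name: limitador
          image: quay.io/kuadrant/limitador:latest  # assumed image reference
```

With `whenUnsatisfiable: ScheduleAnyway` the scheduler treats the spread as a preference, so pods still start when a zone is unavailable; `DoNotSchedule` would make it a hard requirement instead.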

Done

maleck13 commented 3 months ago

@didierofrivia @guicassolato @alexsnaps FYI interested in thoughts / counter approaches

maleck13 commented 3 months ago

Open Question:

With this configuration, would we also force a redisConfig to be set?

alexsnaps commented 3 months ago

I'm a bit confused by the line of thinking here... Does that mean an "HA deployment" of the gateways as well? Limitador would need to share counters as well; one way would be a shared Redis... but how is that HA then? Or do we consider that "out of scope"? Or would disk persistence be enough? If so, how "HA is HA"? The in-memory CRDT-based counters could be used to share counters across the replicas... but that's out of scope for this release.

Boomatang commented 3 months ago

Is the "simpler level of configuration" the correct way to approach this? It would be more in line with what you suggested in the RFC: Observability API PR, and we want to create a unified API.