Kuadrant / limitador

Redefine authority of `Limit`s #74

Closed: alexsnaps closed this issue 2 years ago

alexsnaps commented 2 years ago

Currently Limits are "owned" by Limitador. It gets them from a file, and the set of Limits can then be altered using the REST endpoints. In a k8s world, that file is a mounted ConfigMap and the endpoint is invoked based on the changes made to the Custom Resources, i.e. RateLimitPolicies. All these Limits are persisted into Redis when Limitador is configured to use it.

The problem with this approach is that we end up with two authorities over Limits, each with a different lifecycle. For example: Limitador is up & running and has loaded an initial set of Limits from the ConfigMap as it was when it started. If that ConfigMap is changed, Limitador will only pick up those changes when restarted. In the meantime, some RateLimitPolicies are being added and the operator changes the config using the REST endpoints on Limitador. For some reason Limitador goes down and gets restarted. Currently, only the Limits from the ConfigMap (in whatever state it is in) are kept, and so are the Counters associated with them. All the other Limits are discarded.

Fixing this

While we could "simply" make the operator smarter and "work around" these shortcomings, some of us believe that clearly defining a single authoritative source for the Limits would make the system simpler to manage. After discussing this with @eguzki, we thought a way forward would be to not persist Limits in Redis anymore but to only keep them in memory. They'd be loaded from a single config file on "disk" at startup, and changes to that file would be picked up by Limitador without the need for a restart. In a k8s world, the control plane would be responsible for updating that file accordingly, based on changes to Custom Resources and other ConfigMaps it'd care about. That way the lifecycles of the different components (the k8s cluster(s), Limitador & Redis) become decoupled & changes are applied consistently.
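
For illustration, here is a minimal sketch of what that single config file could look like; the file name is made up, and the fields are simply borrowed from the limit shape discussed later in this thread:

# limits.yaml (hypothetical name): loaded by Limitador at startup, then watched for changes
- namespace: toystore-app
  max_value: 10
  seconds: 60
  conditions:
  - "get-toy == yes"
  variables: []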

A word on sharding

Interestingly, this comes with an additional advantage. If we need to shard the workload across multiple Limitador instances, how that sharding happens is now entirely controlled by the control plane: it decides how to configure each Limitador instance and how the gateway(s) get to query those.

Todo

rahulanand16nov commented 2 years ago

Just to clarify: if the operator is down and comes back, it will need to reconcile the current state of the RateLimits with the current state of the ConfigMap instead of sending new limits via an HTTP endpoint.

didierofrivia commented 2 years ago

So the state will always be stored in a ConfigMap, giving it authority over the local file in Limitador, right? If that's the case, the "real" state of the Limitador instance is irrelevant when it resides within k8s, which IMHO makes sense in this particular case... the other way we'd need to sync both sides, which might be a bit of a PITA.

eguzki commented 2 years ago

> Just to clarify: if the operator is down and comes back, it will need to reconcile the current state of the RateLimits with the current state of the ConfigMap instead of sending new limits via an HTTP endpoint.

Yes, that would be the idea. The HTTP endpoint would not be used. The limits are always loaded from the local file into Limitador. The operator will manage the content of that local file using a ConfigMap, updating the ConfigMap when needed.

Limits loaded from the local file would not be persisted in the storage where the counters live. The limits would always be in memory, and some background process would monitor the content of the local file for changes. Should the content change, the in-memory limit structures would be updated accordingly.
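
As a rough sketch of the k8s wiring this implies (the ConfigMap name, mount path and image reference below are illustrative assumptions, not the actual operator output), the ConfigMap would simply be mounted as the local limits file that Limitador loads and watches:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: limitador
spec:
  replicas: 1
  selector:
    matchLabels:
      app: limitador
  template:
    metadata:
      labels:
        app: limitador
    spec:
      containers:
      - name: limitador
        image: quay.io/kuadrant/limitador:latest  # illustrative image reference
        volumeMounts:
        - name: limits-file
          mountPath: /etc/limitador    # hypothetical path for the local limits file
          readOnly: true
      volumes:
      - name: limits-file
        configMap:
          name: limitador-limits       # hypothetical ConfigMap managed by the control plane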

eguzki commented 2 years ago

> So the state will always be stored in a ConfigMap, giving it authority over the local file in Limitador, right?

The ConfigMap is the k8s mechanism to provision the local file in Limitador's pod. The authority over the limits belongs to the RateLimitPolicies.

> If that's the case, the "real" state of the Limitador instance is irrelevant when it resides within k8s, which IMHO makes sense in this particular case... the other way we'd need to sync both sides, which might be a bit of a PITA.

I think this is answered. There is no need to sync both sides. The control plane just makes sure that Limitador gets the assigned limits, and the source of truth lives in the RateLimitPolicies. If the RateLimitPolicies change, the control plane will reconcile and update the ConfigMap of Limitador's pod accordingly.

eguzki commented 2 years ago

Let me drop here two ideas related to the authority of the limits:

Source of the config map

Currently the limitador operator reads RateLimit CRs and reconciles with Limitador using the HTTP endpoint. I would get rid of the RateLimit CRs. The limits would be configured in the Limitador CR. For example:

---                                                       
apiVersion: limitador.kuadrant.io/v1alpha1                
kind: Limitador                                           
metadata:                                                 
  name: limitador                                         
spec:                                                     
  replicas: 1                                             
  version: "0.4.0"     
  limits:
  - conditions: ["get-toy == yes"]
    max_value: 2
    namespace: toystore-app
    seconds: 30
    variables: []
  - conditions:
    - "admin == yes"
    max_value: 2
    namespace: toystore-app
    seconds: 30
    variables: []
  - conditions: ["vhaction == yes"]
    max_value: 6
    namespace: toystore-app
    seconds: 30
    variables: []                                                                                        

The limitador operator would be responsible for reconciling the content of spec.limits with the content of the ConfigMap mounted in Limitador's pod.
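
As a sketch of what that reconciliation could produce (the ConfigMap name and data key are hypothetical), the operator would render spec.limits into the ConfigMap that ends up mounted as Limitador's local limits file:

apiVersion: v1
kind: ConfigMap
metadata:
  name: limitador-limits       # hypothetical; mounted into limitador's pod
data:
  limits.yaml: |
    - conditions: ["get-toy == yes"]
      max_value: 2
      namespace: toystore-app
      seconds: 30
      variables: []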

Who sets the Limitador CR's spec.limits? The one reading the RateLimitPolicy CRs: the kuadrant control plane (kuadrant-controller). The kuadrant control plane is the one that assigns limits to limitador deployments. The control plane can assign different limit sets to different limitador deployments. Then each limitador deployment may have N replicas, and those N replicas will get the same limit configuration, as it belongs to the Limitador CR.

Rate limit counter TYPE

Right now all the counters are treated equally: they all live in the configured storage, which can be memory, Redis or Infinispan.

Since we have decoupled limits from counters, we may decide on the counter storage type per limit. As counters for a 1-second time period do not seem worth persisting or even replicating, why not keep them in memory regardless of the other counters? So I propose a new attribute of the limit to decide on the counter's storage type. By default it would be the configured storage (memory, Redis, Infinispan), but you could specify that you want it in-memory. The limit definition would be:

  - conditions: ["vhaction == yes"]
    max_value: 6
    namespace: toystore-app
    seconds: 30
    variables: []  
    type: in-memory      

wdyt? It is ok to say: it does not make any sense :)

alexsnaps commented 2 years ago

> I would get rid of the RateLimit CRs

Well, the end game is to have these CRs though, so that application developers can manage these themselves and have their own decoupled lifecycle.

> Who sets the Limitador CR's spec.limits? The one reading the RateLimitPolicy CRs: the kuadrant control plane (kuadrant-controller). The kuadrant control plane is the one that assigns limits to limitador deployments. The control plane can assign different limit sets to different limitador deployments. Then each limitador deployment may have N replicas, and those N replicas will get the same limit configuration, as it belongs to the Limitador CR.

I like the ConfigMap approach, as it composes nicely up from a world where there is no k8s.


So, as we discussed in the sync today, I think I like the idea of the ConfigMap being the aggregate of all CRs. The operator would monitor the RateLimit CRs and merge them all into the ConfigMap. That decouples us from even needing Limitador up and running when that happens. The ConfigMap would be mounted into the pod, and Limitador would load the Limits from there on startup. It'd monitor for any changes, so as to pick up e.g. new RateLimits being added by some app developer, picked up by the operator and added to the ConfigMap.

In a non-k8s world, Limits would "just" be a single config file on disk. In a "non-managed" k8s world (i.e. no limitador operator present), the same trick of mounting the ConfigMap would be used. And in a managed environment, only the cluster admin would see that ConfigMap (in the same namespace as the limitador deployment?), while app developers would only care about the RateLimit CRs. Looks like this approach composes nicely, wdyt?
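
For illustration, here is a sketch of the developer-facing side under this approach, assuming the RateLimit CR keeps roughly the same limit shape (field names are illustrative): app developers create RateLimit CRs, and the operator merges them all into the single ConfigMap mounted by Limitador.

apiVersion: limitador.kuadrant.io/v1alpha1
kind: RateLimit
metadata:
  name: toystore-get-toy   # hypothetical resource created by an app developer
spec:
  namespace: toystore-app
  conditions:
  - "get-toy == yes"
  max_value: 2
  seconds: 30
  variables: []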

The one downside to this approach is that the ConfigMap should only be mutated by the operator. If the ConfigMap is changed... well, there are multiple routes we could take: the operator…

I don't think we need to decide on any of these right now, but I feel the approach might enable different routes moving forward, even more so if we make Limits (content-)addressable (e.g. a sha of their identity fields) as well; and we should probably find a good status block for the RateLimit CRs to reflect these changes back to the operator.

proposed solution

alexsnaps commented 2 years ago

> Rate limit counter TYPE

I like the proposal of opting out of persistence for certain counters. If we already make it so that Limitador persists only certain counters, letting the user opt out of persistence for any arbitrary ones seems like it'd be easy enough, and possibly a good thing wrt throughput management for the user. Would be a nice addition to #69

maleck13 commented 2 years ago

All the tasks here are ticked; wondering, is this epic complete?