Closed alexsnaps closed 2 years ago
Just to clarify, if the operator is down and comes back then it will need to reconcile the current state of the `RateLimit`s with the current state of the `ConfigMap` instead of sending new limits via an HTTP endpoint. So the state will always be stored in a `ConfigMap`, giving it authority over the local file in Limitador, right? If that's the case, the "real" state of the Limitador instance is irrelevant when it resides within k8s, which IMHO makes sense in this particular case... the other way we'd need to sync both sides, which might be a bit of a PITA.
> Just to clarify, if the operator is down and comes back then it will need to reconcile the current state of the RateLimits with the current state of ConfigMap instead of sending new limits via an HTTP endpoint.
Yes, that would be the idea. The HTTP endpoint will not be used. The limits are always loaded from the local file into Limitador. The operator will manage the content of the local limits file through a ConfigMap, updating the ConfigMap when needed.
Limits loaded from the local file would not be persisted in the storage where the counters live. The limits would always be in memory, and a background process would monitor the local file for changes. Should the content change, the in-memory limit structures would change as well.
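A minimal sketch of that background reload, with hypothetical names (`LimitStore`, `reload_if_changed`) and JSON instead of Limitador's actual YAML limits file: hash the file's content and swap the in-memory set only when the hash changes.

```python
import hashlib
import json
import threading
import time


class LimitStore:
    """Holds the in-memory limits; swapped wholesale when the file changes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last_digest = None
        self.limits = []

    def reload_if_changed(self, path):
        """Re-read `path`; swap the limits only if the content hash changed."""
        with open(path, "rb") as f:
            raw = f.read()
        digest = hashlib.sha256(raw).hexdigest()
        with self._lock:
            if digest == self._last_digest:
                return False  # content unchanged, keep current limits
            # The real file would be YAML; JSON keeps this sketch stdlib-only.
            self.limits = json.loads(raw)
            self._last_digest = digest
            return True


def watch(store, path, interval=1.0):
    """Poll the mounted file; kubelet refreshes ConfigMap volumes in place."""
    while True:
        store.reload_if_changed(path)
        time.sleep(interval)
```

Polling on content hash (rather than mtime) sidesteps the symlink swap kubelet does when it updates a mounted ConfigMap.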
> So the state will be always stored in a ConfigMap, giving it the authority over the local file in Limitador, right?
The ConfigMap is the k8s mechanism to provision the local file in Limitador's pod. The authority over the limits belongs to the RateLimitPolicies.
> If that's the case, the "real" state of the Limitador instance is irrelevant when it resides within k8s which IMHO makes sense in this particular case... the other way we'd need to sync both sides which might be a bit of a PITA.
I think this is answered. There is no need to sync both sides. The control plane just makes sure that Limitador gets the assigned limits, and the source of truth lives in the RateLimitPolicies. If the RateLimitPolicies change, the control plane will reconcile and update the ConfigMap of Limitador's pod accordingly.
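That reconcile step could be sketched as a pure function, assuming an illustrative policy and ConfigMap shape (not the real CRD schema): derive the desired ConfigMap data from the RateLimitPolicies and update only on drift.

```python
import json


def desired_configmap_data(policies):
    """Flatten every policy's limits into the single limits-file payload.

    `policies` mimics a list of RateLimitPolicy specs; the field names
    are illustrative, not the actual CRD schema.
    """
    limits = []
    for policy in policies:
        limits.extend(policy.get("limits", []))
    # Deterministic ordering makes the reconciliation idempotent.
    limits.sort(key=lambda limit: json.dumps(limit, sort_keys=True))
    return {"limits.yaml": json.dumps(limits, sort_keys=True)}


def reconcile(existing_data, policies):
    """Return (needs_update, desired_data); drift only flows one way,
    from the policies down to the ConfigMap, never back."""
    desired = desired_configmap_data(policies)
    return existing_data != desired, desired
```

Because the desired state is recomputed from scratch on every pass, an operator restart simply converges to the same ConfigMap content.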
Let me drop here two ideas related to the authority of the limits:
Currently the limitador operator reads RateLimit CRs and reconciles with Limitador using the HTTP endpoint. I would get rid of the RateLimit CRs. The limits would be configured in the Limitador CR. For example:
```yaml
---
apiVersion: limitador.kuadrant.io/v1alpha1
kind: Limitador
metadata:
  name: limitador
spec:
  replicas: 1
  version: "0.4.0"
  limits:
    - conditions: ["get-toy == yes"]
      max_value: 2
      namespace: toystore-app
      seconds: 30
      variables: []
    - conditions:
        - "admin == yes"
      max_value: 2
      namespace: toystore-app
      seconds: 30
      variables: []
    - conditions: ["vhaction == yes"]
      max_value: 6
      namespace: toystore-app
      seconds: 30
      variables: []
```
The limitador operator would be responsible for reconciling the content of `spec.limits` with the content of the ConfigMap mounted in the limitador's pod.
Who sets the Limitador CR's `spec.limits`? The one reading the RateLimitPolicy CRs: the kuadrant control plane (kuadrant-controller). The kuadrant control plane is the one assigning limits to limitador deployments. The control plane can assign different limit sets to different limitador deployments. Each limitador deployment may then have N replicas, and those N replicas will get the same limit configuration, as it belongs to the Limitador CR.
Currently all the counters are treated equally: they all live in the configured storage, which can be memory, redis or infinispan. Since we have decoupled limits from counters, we may decide on the counter storage type per limit. As counters for a 1-second time period do not seem worth persisting or even replicating, why not keep them in memory regardless of where the other counters live? So I propose a new attribute of the limit to decide on the counter's storage type. By default it would be the configured storage (memory, redis, infinispan), but you could specify that you want it in-memory. The limit definition would be:
```yaml
- conditions: ["vhaction == yes"]
  max_value: 6
  namespace: toystore-app
  seconds: 30
  variables: []
  type: in-memory
```
wdyt? It is ok to say: it does not make any sense :)
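For what it's worth, that per-limit storage choice could be sketched like this (hypothetical class and function names, not Limitador's actual code): dispatch on the limit's `type` field and fall back to the configured storage.

```python
import threading
from collections import defaultdict


class InMemoryCounters:
    """Ephemeral counters: lost on restart, which is fine for short windows."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = defaultdict(int)

    def incr(self, key):
        with self._lock:
            self._counts[key] += 1
            return self._counts[key]


class PersistentCounters(InMemoryCounters):
    """Stand-in for the configured storage (redis/infinispan) in this sketch."""


def counters_for(limit, configured_store, memory_store):
    """Pick the counter store per limit: `type: in-memory` opts out of
    persistence; anything else uses the configured storage."""
    if limit.get("type") == "in-memory":
        return memory_store
    return configured_store
```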
> I would get rid of the RateLimit CRs
Well, the end game is to have these CRs tho, so that application developers can manage these themselves and have their own decoupled lifecycle.
> Who sets the Limitador CR's `spec.limits`? The one reading the RateLimitPolicy CRs: the kuadrant control plane (kuadrant-controller). The kuadrant control plane is the one assigning limits to limitador deployments. The control plane can assign different limit sets to different limitador deployments. Each limitador deployment may then have N replicas, and those N replicas will get the same limit configuration, as it belongs to the Limitador CR.
I like the `ConfigMap` approach as it composes nicely from a world where there is no k8s.
So, as we discussed in the sync today, I think I like the idea of having the `ConfigMap` be the aggregate of all CRs. The operator would monitor the `RateLimit`s and merge them all into the `ConfigMap`. Now we've decoupled the necessity of even having Limitador up and running when that happens. The `ConfigMap` would be mounted into the pod, and Limitador would load the `Limit`s from there on startup. It'd monitor for any changes, so as to pick up e.g. new `RateLimit`s being added by some app developer, picked up by the operator and added to the `ConfigMap`.
In a non-k8s world, the `Limit`s would "just" be a single config file on disk. In a "non-managed" k8s world (i.e. no limitador operator present), the same trick of mounting the `ConfigMap` would be used. And in a managed environment, only the cluster admin would see that `ConfigMap` (in the same namespace as the limitador deployment?), while app developers would only care about the `RateLimit` CRs. Looks like this approach composes nicely, wdyt?
The one downside to this approach is that the `ConfigMap` should only be mutated by the operator. If the `ConfigMap` is changed... well, there are multiple routes we could take: the operator…

- … (e.g. `Limit <foobar>: disabled`), which then overrides the `<foobar>` `RateLimit` CR
- … the `limitador` CR …

I don't think we need to decide on any of these right now, but I feel the approach might enable different routes moving forward, more so if we make `Limit`s (content) addressable (e.g. a sha of their identity fields) as well; and probably find a good `status` block for the `RateLimit` CRs to reflect these changes back to the operator.
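A sketch of what content-addressable `Limit`s could look like, assuming a hypothetical selection of identity fields (which fields count as identity is exactly the open question; mutable tunables like `max_value` are left out here so that tuning doesn't change a limit's address):

```python
import hashlib
import json

# Hypothetical choice: everything that identifies a limit, excluding
# tunables such as max_value.
IDENTITY_FIELDS = ("namespace", "conditions", "variables", "seconds")


def limit_id(limit):
    """Stable content address: sha256 over the canonical identity fields."""
    identity = {field: limit.get(field) for field in IDENTITY_FIELDS}
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()
```

Such an id would let the operator (and a `status` block) refer to a limit unambiguously across the `RateLimit` CR, the `ConfigMap` and the running instance.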
- `Limit`s loaded from the file at start up (and no reconciling with the on-disk state, as there'd be none)
- `RateLimit` CRs merged in a `ConfigMap` mounted for the limitador instance to use

I like the proposal of opting out of persistence for certain counters. If we already make it so that limitador persists only certain counters, letting the user opt out of persistence for any arbitrary ones seems like it'd be easy enough and possibly a good thing wrt throughput management for the user. Would be a nice addition to #69
All the tasks here are ticked, wondering: is this epic complete?
Currently `Limit`s are "owned" by Limitador. It gets them from a file, and then the set of `Limit`s can be altered using the REST end points. In a k8s world, that file is a mounted `ConfigMap` and the end-point is invoked based on the changes made to the Custom Resources, i.e. `RateLimitPolicy`'ies. All these `Limit`s are persisted into Redis, when Limitador is configured to use it.

The problem with this approach is that we end up with two authorities over the `Limit`s, each with a different lifecycle. E.g. Limitador is up & running, having loaded an initial set of `Limit`s from the `ConfigMap` as it was when it started. If that `ConfigMap` is changed, Limitador will only pick up those changes when restarted. In the meantime, some `RateLimitPolicy`'ies are being added and the operator changes the config using the REST endpoints on Limitador. For some reason Limitador is down and gets restarted. Currently, only the `Limit`s from the (whatever state it is in) `ConfigMap` are being kept, and so are the `Counter`s associated with them. All the other `Limit`s are discarded.

Fixing this

While we could "simply" make the operator smarter and "work around" these shortcomings, some of us believe that clearly defining a single authoritative source for the `Limit`s would make the system simpler to manage. After discussing this with @eguzki, we thought a way forward would be to not persist `Limit`s in Redis anymore but to only keep them in-memory. They'd be loaded from a single config file on "disk" at start up, and changes to that file would be picked up by Limitador without the need for a restart. In a k8s world, the control plane would be responsible for updating that file accordingly, based on changes to Custom Resources and other `ConfigMap`s it'd care about. That way the lifecycles of the different components (the k8s cluster(s), Limitador & Redis) become decoupled & changes are applied consistently.

A word on sharding

Interestingly, this comes with an additional advantage. If we need to shard the workload across multiple Limitador instances, how that sharding happens is now entirely controlled by the control plane: how to configure Limitador instances and how gateway(s) get to query those.
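One possible sketch of such control-plane-driven sharding, purely illustrative: partition the limits by namespace with a stable hash, so each Limitador instance gets its own slice of the limits file and gateways can be routed accordingly.

```python
import hashlib


def shard_for(namespace, instances):
    """Deterministically map a limit namespace to one Limitador instance."""
    digest = int(hashlib.sha256(namespace.encode()).hexdigest(), 16)
    return instances[digest % len(instances)]


def assign(limits, instances):
    """Partition the limits per instance; each entry would become that
    instance's mounted limits-file content."""
    assignment = {name: [] for name in instances}
    for limit in limits:
        assignment[shard_for(limit["namespace"], instances)].append(limit)
    return assignment
```

A real control plane would likely want consistent hashing to limit reshuffling when instances come and go; the modulo here just keeps the sketch short.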
Todo