[Investigation] RLP defaults and how we would apply them

maleck13 commented 2 years ago

Use Case

As a gateway administrator, I want to be able to set a default RLP that will apply to any host handled by this gateway so that there is at least some basic RL in place but I want to allow this default to be overriden by service providers lower down the chain.

The focus here is on investigation rather than a long lasting implementation. Our investigation should result in a better understanding of how this can work and whether it is a good fit

The idea of a default is that is can be set at the top but redefined at lower levels. An example of this would be setting RL at the vhost level.

If I apply a default RLP at the Gateway level, it should apply to all virtual hosts handled by that gateway and apply to a the last route (IE a catch all /* type route).

If I apply a RLP against a HTTPRoute and I create a virtual host level rate limiting rule the rule applied at the gateway should be replaced with my rule.

When I create a RLP that is targeting a HTTPRoute, if there is a gateway level default and I haven't set my own value for the default, the default should be reflected into my RLP so that I can clearly see what is being applied

A key use case is being able to provide governance over what service providers can and cannot do when exposing a service via a shared ingress gateway. As well as providing certainty that no service is exposed without my ability as a cluster administrator to protect my infrastructure from unplanned load from badly behaving clients etc.

As a cluster administrator providing a shared ingress gateway, I want to be able to define a sane default rate limit policy that for any service exposed via this gateway so that I can protect the infrastructure behind this gateway from malicious / accidental users and ensure that no one service is consuming all of the resources. As a cluster administrator I also undertand that services may need to override these defaults to meet their own sepcific service needs and I want to enable these service providers to do this while being safe in the knowledge that if they do not define their own rate limting my infrastrucure is still protected.

Outcome

Better understanding of how we want to apply defaults with RLP
Basic experimental implementation
Follow on issues to harden, improve our experimental implementation

example

apiVersion: apim.kuadrant.io/v1alpha1
kind: RateLimitPolicy
metadata:
  name: toystore
spec:
  targetRef:
    Group: gateway.networking.k8s.io/v1alpha2
    Kind: Gateway
    Name: my-gateway
  default:
    rateLimits:
    - stage: PREAUTH
      actions:
        - generic_key:
            descriptor_key: vhaction
            descriptor_value: "yes"
    limits:
    - conditions: ["vhaction == yes"]
      max_value: 6
      namespace: gateway
      seconds: 30
      variables: []

If I create a RLP without setting values for the default, my RLP would look like:

apiVersion: apim.kuadrant.io/v1alpha1
kind: RateLimitPolicy
metadata:
  name: toystore
spec:
  targetRef:
    Group: gateway.networking.k8s.io/v1alpha2
    Kind: HTTPRoute
    Name: my-route
  rules:
    - operations:
        - paths: ["/toy"]
          methods: ["GET"]
      rateLimits:
        - stage: BOTH
          actions:
            - generic_key:
                descriptor_key: get-toy
                descriptor_value: "yes"
  rateLimits:
  - stage: PREAUTH
    actions:
      - generic_key:
          descriptor_key: vhaction
          descriptor_value: "yes"
  limits:
   - conditions: ["get-toy == yes"]
      max_value: 2
      namespace: toystore-app
      seconds: 30
      variables: []
  - conditions: ["vhaction == yes"]
    max_value: 6
    namespace: gateway
    seconds: 30
    variables: []

At the HTTPRoute level I would then be able to specify my own values for this default:

apiVersion: apim.kuadrant.io/v1alpha1
kind: RateLimitPolicy
metadata:
  name: toystore
spec:
  targetRef:
    Group: gateway.networking.k8s.io/v1alpha2
    Kind: HTTPRoute
    Name: my-route
  rateLimits:
  - stage: PREAUTH
    name: global-default // new property to identify the default should be kept the same to update to your own default
    actions:
      - generic_key:
          descriptor_key: vhaction
          descriptor_value: "yes"
  limits:
  - conditions: ["vhaction == yes"]
    name: global-default // new property to identify the default should be kept the same to update to your own default
    max_value: 20
    namespace: gateway
    seconds: 30
    variables: []

There is complexity here. When a default is set, in order to override you need to keep the structure and only change the values. Perhaps we need to consider a name or id

maleck13 commented 2 years ago

@rahulanand16nov @eguzki would love any thoughts or feedback you have on this

eguzki commented 2 years ago

Proposal for basic experimental implementation: https://github.com/Kuadrant/kuadrant-controller/issues/129#issuecomment-1125736176

eguzki commented 2 years ago

After some discussions we have decided to not implement (for now) policy attachment defaults for the rate limiting use case. The semantics are complex to understand (and implement) for both the configuration at the gateway level (actions) and configuration at the external rate limiting service (limitador).

Both configuration aspects of the rate limiting, gateway's actions and external service limits, are based on list of objects. Applying defaults semantics is not straightforward for those lists of objects. Especially trying to define merging strategies.

One use case that shows how confusing and unexpected for the end user would be exposing defaults semantics:

One cluster operator adds rate limit at example.com to 100 requests/sec in the defaults section
One application owner adds rate limit of 1000 request/sec for some traffic at example.com in the defaults section

The cluster operator adds some default rate limit expecting that it can be changed, but always in more restrictive way, not increasing the limits. Otherwise it does not make any sense. Nothing prevents the application owner to increase the limits.

eguzki commented 2 years ago

@maleck13 added use cases from https://hackmd.io/IKEYD6NrSzuGQG1nVhwbcw?both here as that document will be deprecated

Kuadrant / kuadrant-controller

[Investigation] RLP defaults and how we would apply them #124