celestiaorg / knuu

Integration Test Framework
Apache License 2.0

Knuu rate limiter #411

Open MSevey opened 4 weeks ago

MSevey commented 4 weeks ago

Overview

Currently, large knuu tests are limited by the Kubernetes global rate limiter. To enable larger tests, knuu needs to handle rate limiting gracefully.

Options

  1. Add an internal rate limiter to the knuu k8s controller that throttles requests to stay under the global rate limit.
  2. Add retry mechanisms with a timeout for requests that fail due to rate limiting.
  3. Some combination of the two.

Considerations

Option 2 is probably the simplest and lowest-touch. With this option we wouldn't even need to know what the rate limit is; we would just need a k8s request timeout variable and/or a k8s max retry variable applied around the k8s client set calls.

As a POC, this could be applied naively around all calls to verify that it works.

A second step could then be to create some abstractions, either as in option 1 or something similar, that provide a single interface for managing the retry logic.

Testing

If we go with option 1, the rate limiter itself can be thoroughly unit tested.

In all cases, the k8s mocking can be used to mock requests that return the rate limit error, to ensure the error is handled gracefully.

MSevey commented 4 weeks ago

Ah, we should just use this: https://pkg.go.dev/k8s.io/client-go/util/retry#OnError

Example from ChatGPT:

package k8s

import (
    "context"
    "time"

    appv1 "k8s.io/api/apps/v1"
    apierrs "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/util/wait"
    "k8s.io/client-go/util/retry"
)

const (
    // maxRetries is used as wait.Backoff.Steps, which counts total attempts
    // (including the first one).
    maxRetries = 3
    // retryDelay is the pause between attempts.
    retryDelay = 1 * time.Second
)

// rateLimitBackoff sets no Factor or Jitter, so the delay stays constant.
// retry.DefaultBackoff could be passed to retry.OnError instead.
var rateLimitBackoff = wait.Backoff{Steps: maxRetries, Duration: retryDelay}

// retryRatelimitFn reports whether err is a rate limit (HTTP 429) response
// from the API server. Any other error is returned without a retry.
func retryRatelimitFn(err error) bool {
    return apierrs.IsTooManyRequests(err)
}

// GetDaemonSet fetches a DaemonSet, retrying only when the request is rate limited.
func (c *Client) GetDaemonSet(ctx context.Context, name string) (*appv1.DaemonSet, error) {
    var ds *appv1.DaemonSet
    err := retry.OnError(
        rateLimitBackoff,
        retryRatelimitFn,
        func() error {
            var err error
            ds, err = c.clientset.AppsV1().DaemonSets(c.namespace).Get(ctx, name, metav1.GetOptions{})
            return err
        },
    )
    if err != nil {
        return nil, ErrGettingDaemonset.WithParams(name).Wrap(err)
    }
    return ds, nil
}