MSevey opened 4 weeks ago
Ah, we should just use this: https://pkg.go.dev/k8s.io/client-go/util/retry#OnError
Example from ChatGPT:
```go
package k8s

import (
	"context"

	appv1 "k8s.io/api/apps/v1"
	apierrs "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/util/retry"
)

// retryRatelimitFn reports whether err is a Kubernetes API rate-limit
// (429 TooManyRequests) error and should therefore be retried.
var retryRatelimitFn = func(err error) bool {
	apiErr, ok := err.(apierrs.APIStatus)
	if !ok {
		return false // not a Kubernetes API error
	}
	// Retry only on rate-limit-exceeded errors.
	return apiErr.Status().Reason == metav1.StatusReasonTooManyRequests
}

func (c *Client) GetDaemonSet(ctx context.Context, name string) (*appv1.DaemonSet, error) {
	var ds *appv1.DaemonSet
	// retry.DefaultBackoff controls the number of attempts and delays.
	err := retry.OnError(
		retry.DefaultBackoff,
		retryRatelimitFn,
		func() error {
			var err error
			ds, err = c.clientset.AppsV1().DaemonSets(c.namespace).Get(ctx, name, metav1.GetOptions{})
			return err
		},
	)
	if err != nil {
		return nil, ErrGettingDaemonset.WithParams(name).Wrap(err)
	}
	return ds, nil
}
```
Overview
Currently, large Knuu tests are limited by the Kubernetes global rate limiter. To enable larger Knuu tests, Knuu needs to handle rate limiting gracefully.
Options
Considerations
Option 2 is probably the simplest and lowest-touch approach. For this option we wouldn't even need to know what the rate limit is; we would just need a Kubernetes request timeout variable and/or a max-retry variable applied around the clientset calls.
This could then be applied naively around all calls as a POC to verify that it works.
A second step could then be to extract this into an abstraction, either as option 1 or something similar, that provides a single interface for managing the retry logic.
Testing
If we go with option 1, the rate limiter itself can be thoroughly unit tested.
In all cases, Kubernetes client mocking can be used to simulate requests returning the rate-limit error, to ensure the error is handled gracefully.