caas-team / GoKubeDownscaler

A horizontal autoscaler for Kubernetes workloads
GNU General Public License v3.0
9 stars 4 forks source link

broken ScaledObjects scaling #64

Open jonathan-mayer opened 1 month ago

jonathan-mayer commented 1 month ago

Issue

Workloads managed by ScaledObjects are currently also scaled by the Downscaler. This results in the ScaledObject and the downscaler fighting to downscale those workloads to (possibly) different Replicas.

Further details

Bug was found here: https://github.com/caas-team/py-kube-downscaler/issues/96 Bug was fixed in py-kube-downscaler here: https://github.com/caas-team/py-kube-downscaler/pull/100

Proposal

Implement some way to avoid this. A possible way would be to automatically exclude the workloads managed by ScaledObjects from being scaled by the Downscaler.

Who can address the issue

Other links/references

samuel-esp commented 1 month ago

A possible way would be to automatically exclude the workloads managed by ScaledObjects from being scaled by the Downscaler.

My suggestion to implement the approach you said would be something like:

// GetWorkloads gets all workloads of the specified resources for the specified namespaces
func (c client) GetWorkloads(namespaces []string, resourceTypes []string, ctx context.Context) ([]scalable.Workload, error) {
    var results []scalable.Workload
    if namespaces == nil {
        namespaces = append(namespaces, "")
    }
    for _, namespace := range namespaces {
        for _, resourceType := range resourceTypes {
            slog.Debug("getting workloads from resource type", "resourceType", resourceType)
            getWorkloads, ok := scalable.GetWorkloads[strings.ToLower(resourceType)]
            if !ok {
                return nil, errResourceNotSupported
            }
            workloads, err := getWorkloads(namespace, c.clientsets, ctx)
            if err != nil {
                return nil, fmt.Errorf("failed to get workloads: %w", err)
            }
            results = append(results, workloads...)
        }
    }

    return results, nil
}

This approach requires the ScaledObjects to be processed before Deployments/StatefulSets

Things to consider: clarity of code (right now is perfect), performance (should still be 0(N), but we will iterate many times inside lists that could be pretty big, especially the Deployment list)

jonathan-mayer commented 1 month ago

I'm not sure if putting all this in the GetWorkloads function is right. I will have a look on how to implement it next week as I am still out of office for the rest of this week.