Kong / gateway-operator

Kubernetes Operator for Kong Gateways
Apache License 2.0
50 stars 11 forks source link

Quick subsequent reconciliations cause errors caused by using cached client #733

Open pmalek opened 2 days ago

pmalek commented 2 days ago

Problem statement

controller-runtime (use by the operator) uses a cached client.

This causes issues when quick, successive reconciliations of a resource happen and the subsequent iterations do not get an up to date view on an updated/patched object. For example:

Up until now, this has been taken care of by calling one of the Reduce*() helper functions which are aware of the objects' schema and their priorities so that surplus objects can be removed without consequences.

With integration with external APIs (e.g. Konnect API in #370) this does not work as the resource is created in external API and more importantly the create API calls will result in HTTP 409 Conflicts.

Exemplar logs

request:
POST /v2/control-planes HTTP/1.1
Host: eu.api.konghq.tech
Content-Length: 290
Accept: application/json
Authorization: Bearer kpat_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Content-Type: application/json
Accept-Encoding: gzip

{"name":"test1","labels":{"app":"test1","k8s-generation":"1","k8s-group":"konnect.konghq.com","k8s-kind":"KonnectGatewayControlPlane","k8s-name":"test1","k8s-namespace":"default","k8s-uid":"0c2b1653-2839-4a4e-9c42-15ec8bcacc01","k8s-version":"v1alpha1","key1":"test1","session":"eng-demo"}}
response:
HTTP/2.0 201 Created
Connection: close
Access-Control-Allow-Credentials: true
Access-Control-Expose-Headers: Date,x-datadog-trace-id,Konnect-Acting-As,Konnect-Feature-Set
Content-Security-Policy: default-src 'self';base-uri 'self';block-all-mixed-content;font-src 'self' https: data:;form-action 'self';frame-ancestors 'self';img-src 'self' data:;object-src 'none';script-src 'self';script-src-attr 'none';style-src 'self' https: 'unsafe-inline';upgrade-insecure-requests
Content-Type: application/json; charset=utf-8
Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Resource-Policy: same-origin
Date: Thu, 10 Oct 2024 12:22:14 GMT
Etag: W/"2a5-ebpzw2GSE0sps7ves89bnmx3ey4"
Expect-Ct: max-age=0
Origin-Agent-Cluster: ?1
Ratelimit-Limit: 5000
Ratelimit-Remaining: 4920
Ratelimit-Reset: 47
Referrer-Policy: no-referrer
Strict-Transport-Security: max-age=15552000; includeSubDomains
Vary: Origin
Via: kong-enterprise-edition
X-B3-Sampled: 1
X-B3-Spanid: 5c13d5952748e800
X-B3-Traceid: 000000000000000039ca0f476805b600
X-Content-Type-Options: nosniff
X-Datadog-Parent-Id: 6634881512632804896
X-Datadog-Trace-Id: 4164157604819744060
X-Dns-Prefetch-Control: off
X-Download-Options: noopen
X-Envoy-Upstream-Service-Time: 940
X-Frame-Options: SAMEORIGIN
X-Kong-Proxy-Latency: 1
X-Kong-Upstream-Latency: 941
X-Permitted-Cross-Domain-Policies: none
X-Ratelimit-Limit-Minute: 5000
X-Ratelimit-Remaining-Minute: 4920
X-Xss-Protection: 0

{"id":"28a018d8-44d2-4f1d-97e7-077e4f2d8fbc","name":"test1","description":"","labels":{"app":"test1","key1":"test1","k8s-uid":"0c2b1653-2839-4a4e-9c42-15ec8bcacc01","session":"eng-demo","k8s-kind":"KonnectGatewayControlPlane","k8s-name":"test1","k8s-group":"konnect.konghq.com","k8s-version":"v1alpha1","k8s-namespace":"default","k8s-generation":"1"},"config":{"control_plane_endpoint":"https://25d04d0909.eu.cp0.konghq.tech","telemetry_endpoint":"https://25d04d0909.eu.tp0.konghq.tech","cluster_type":"CLUSTER_TYPE_CONTROL_PLANE","auth_type":"pinned_client_certs","cloud_gateway":false,"proxy_urls":[]},"created_at":"2024-10-10T12:22:13.692Z","updated_at":"2024-10-10T12:22:13.692Z"}
{"level":"info","ts":"2024-10-10T14:22:14.528+0200","logger":"KonnectGatewayControlPlane","msg":"operation in Konnect API complete","controller":"KonnectGatewayControlPlane","controllerGroup":"konnect.konghq.com","controllerKind":"KonnectGatewayControlPlane","KonnectGatewayControlPlane":{"name":"test1","namespace":"default"},"namespace":"default","name":"test1","reconcileID":"8fbd8dcf-53a9-47b8-ac2f-790815e089ed","op":"create","duration":"990.068708ms","konnect_id":"28a018d8-44d2-4f1d-97e7-077e4f2d8fbc"}
{"level":"debug","ts":"2024-10-10T14:22:14.558+0200","logger":"KonnectGatewayControlPlane","msg":"reconciling","controller":"KonnectGatewayControlPlane","controllerGroup":"konnect.konghq.com","controllerKind":"KonnectGatewayControlPlane","KonnectGatewayControlPlane":{"name":"test1","namespace":"default"},"namespace":"default","name":"test1","reconcileID":"4761a3b1-5543-4635-82a7-e4405645e265"}
request:
POST /v2/control-planes HTTP/1.1
Host: eu.api.konghq.tech
Content-Length: 290
Accept: application/json
Authorization: Bearer kpat_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Content-Type: application/json
Accept-Encoding: gzip

{"name":"test1","labels":{"app":"test1","k8s-generation":"1","k8s-group":"konnect.konghq.com","k8s-kind":"KonnectGatewayControlPlane","k8s-name":"test1","k8s-namespace":"default","k8s-uid":"0c2b1653-2839-4a4e-9c42-15ec8bcacc01","k8s-version":"v1alpha1","key1":"test1","session":"eng-demo"}}
Error: <nil>
response:
HTTP/2.0 409 Conflict
Content-Length: 169
Access-Control-Allow-Credentials: true
Access-Control-Expose-Headers: Date,x-datadog-trace-id,Konnect-Acting-As,Konnect-Feature-Set
Content-Security-Policy: default-src 'self';base-uri 'self';block-all-mixed-content;font-src 'self' https: data:;form-action 'self';frame-ancestors 'self';img-src 'self' data:;object-src 'none';script-src 'self';script-src-attr 'none';style-src 'self' https: 'unsafe-inline';upgrade-insecure-requests
Content-Type: application/problem+json; charset=utf-8
Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Resource-Policy: same-origin
Date: Thu, 10 Oct 2024 12:22:15 GMT
Etag: W/"a9-+hkYkS5+BfEhKzaetYcn4sNF9XU"
Expect-Ct: max-age=0
Origin-Agent-Cluster: ?1
Ratelimit-Limit: 5000
Ratelimit-Remaining: 4919
Ratelimit-Reset: 46
Referrer-Policy: no-referrer
Strict-Transport-Security: max-age=15552000; includeSubDomains
Vary: Origin
Via: kong-enterprise-edition
X-B3-Sampled: 1
X-B3-Spanid: 5fe637d7c66f5c00
X-B3-Traceid: 00000000000000004793b70701333800
X-Content-Type-Options: nosniff
X-Datadog-Parent-Id: 6910272078133288161
X-Datadog-Trace-Id: 5157667238982137886
X-Dns-Prefetch-Control: off
X-Download-Options: noopen
X-Envoy-Upstream-Service-Time: 360
X-Frame-Options: SAMEORIGIN
X-Kong-Proxy-Latency: 1
X-Kong-Upstream-Latency: 361
X-Permitted-Cross-Domain-Policies: none
X-Ratelimit-Limit-Minute: 5000
X-Ratelimit-Remaining-Minute: 4919
X-Xss-Protection: 0

{"status":409,"title":"Conflict","instance":"kong:trace:5157667238982137886","detail":"Key (org_id, name)=(5ca26716-02f7-4430-9117-1d1a7a2695e7, test1) already exists."}
{"level":"error","ts":"2024-10-10T14:22:14.962+0200","logger":"KonnectGatewayControlPlane","msg":"operation in Konnect API failed","controller":"KonnectGatewayControlPlane","controllerGroup":"konnect.konghq.com","controllerKind":"KonnectGatewayControlPlane","KonnectGatewayControlPlane":{"name":"test1","namespace":"default"},"namespace":"default","name":"test1","reconcileID":"4761a3b1-5543-4635-82a7-e4405645e265","op":"create","duration":"403.763875ms","error":"failed to create KonnectGatewayControlPlane default/test1: {\"status\":409,\"title\":\"Conflict\",\"instance\":\"kong:trace:5157667238982137886\",\"detail\":\"Key (org_id, name)=(5ca26716-02f7-4430-9117-1d1a7a2695e7, test1) already exists.\"}"}

Proposed solution

Make use of expectations https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/controller_utils.go

Related material

https://www.youtube.com/watch?v=wMqzAOp15wo&t=748s

Acceptance criteria

pmalek commented 1 day ago

As a follow up to the conversation on Kubernetes slack: https://kubernetes.slack.com/archives/C02MRBMN00Z/p1728554873783249.

A possible way forward (at least for the resources that are created against an external API like Konnect) we can implement a fallback lookup when 409 Conflict is returned. We could add filtering based on Kubernetes UID so that we're not "adopting" (separate issue #460) resources mapped from elsewhere. This could look like this (example for KonnectGatewayControlPlane):

    var sdkConflictError *sdkkonnecterrs.ConflictError
        if errors.As(err, &sdkConflictError) {
            reqList := operations.ListControlPlanesRequest{
                FilterNameEq: lo.ToPtr(cp.Spec.Name),
                Labels:       lo.ToPtr(KubernetesUIDLabelKey + ":" + string(cp.GetUID())),
            }
            if cp.Spec.ClusterType != nil {
                reqList.FilterClusterTypeEq = lo.ToPtr(string(*cp.Spec.ClusterType))
            }

            respList, err := sdk.ListControlPlanes(ctx, reqList)
            if err != nil {
                return err
            }
            for _, listCP := range respList.ListControlPlanesResponse.Data {
                if listCP.Name != nil && *listCP.Name == req.Name {
                    cp.Status.SetKonnectID(*listCP.ID)
                    break
                }
            }