Closed: ksgnextuple closed this issue 7 months ago
@ksgnextuple What image and version of Gateway are you running? I believe the problem you're facing has been solved in https://github.com/Kong/kubernetes-ingress-controller/issues/5638 in KIC 3.1.1 (if you're using an open source Kong)
Hi @pmalek
Kong Gateway -> kong:3.6
KIC -> kong/kubernetes-ingress-controller:3.1
Have used the helm installation -> helm install kong/kong --generate-name --set ingressController.installCRDs=false -n kong --create-namespace
Upgrading to KIC 3.1.1 should fix your issue.
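Since you installed via the Helm chart, something along these lines should bump the controller image (the release name is a placeholder; ingressController.image.tag is the relevant chart value as far as I recall):
helm upgrade <release-name> kong/kong -n kong \
  --set ingressController.installCRDs=false \
  --set ingressController.image.tag=3.1.1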
Even after updating the image to kong/kubernetes-ingress-controller:3.1.1, I'm seeing the same error.
Any updates on this?
@ksgnextuple Can you try following this guide https://docs.konghq.com/kubernetes-ingress-controller/latest/reference/troubleshooting/#dumping-generated-kong-configuration to get the configuration that failed to be applied? Specifically the one from /debug/config/failed.
This way we'll be able to make progress knowing what config we're working with.
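For example, roughly (the deployment name is a placeholder, and the diagnostics port may differ in your setup; 10256 is the default):
kubectl port-forward -n kong deploy/<kic-deployment> 10256:10256
curl -s localhost:10256/debug/config/failed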
I think the issue is resolved. I deleted the HTTPRoute and Gateway and recreated them after upgrading the KIC version to 3.1.1. Will get back here after a few tests.
Hmm, on further checking it's still the same; I'll try the troubleshooting documentation.
kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1beta1
metadata:
  name: argo-rollouts-http-route
  annotations:
    konghq.com/strip-path: "true"
spec:
  parentRefs:
When I add the 2nd backendRef I get the error.
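Roughly, the rule looks like this once the second backendRef is added (the Service names here are placeholders, not my exact manifest):
rules:
  - backendRefs:
      - name: argo-rollouts-stable
        port: 80
      - name: argo-rollouts-canary
        port: 80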
{ "_format_version": "3.0", "_info": { "select_tags": ["managed-by-ingress-controller"], "defaults": {} }, "services": [ { "connect_timeout": 60000, "host": "httproute.default.argo-rollouts-http-route.0", "id": "346eb9fa-c8fb-53b5-acc7-7488e6fe526e", "name": "httproute.default.argo-rollouts-http-route.0", "port": 80, "protocol": "http", "read_timeout": 60000, "retries": 5, "write_timeout": 60000, "tags": [ "k8s-name:argo-rollouts-http-route", "k8s-namespace:default", "k8s-kind:HTTPRoute", "k8s-uid:d0018e4d-53cd-44a0-9d55-3d6ce03f347f", "k8s-group:gateway.networking.k8s.io", "k8s-version:v1", ], "routes": [ { "hosts": ["demo.example.com"], "id": "7791108a-f8cb-5441-bdd6-ba897ccefdec", "name": "httproute.default.argo-rollouts-http-route.0.0", "paths": ["~/$", "/"], "path_handling": "v0", "preserve_host": true, "protocols": ["http", "https"], "strip_path": true, "tags": [ "k8s-name:argo-rollouts-http-route", "k8s-namespace:default", "k8s-kind:HTTPRoute", "k8s-uid:d0018e4d-53cd-44a0-9d55-3d6ce03f347f", "k8s-group:gateway.networking.k8s.io", "k8s-version:v1", ], "https_redirect_status_code": 426, }, ], }, ], "upstreams": [ { "name": "httproute.default.argo-rollouts-http-route.0", "algorithm": "round-robin", "tags": [ "k8s-name:argo-rollouts-http-route", "k8s-namespace:default", "k8s-kind:HTTPRoute", "k8s-uid:d0018e4d-53cd-44a0-9d55-3d6ce03f347f", "k8s-group:gateway.networking.k8s.io", "k8s-version:v1", ], "targets": [ { "target": "10.244.1.155:8080", "weight": 33 }, { "target": "10.244.1.155:8080", "weight": 0 }, { "target": "10.244.0.40:8080", "weight": 33 }, { "target": "10.244.0.40:8080", "weight": 0 }, { "target": "10.244.0.39:8080", "weight": 33 }, { "target": "10.244.0.39:8080", "weight": 0 }, ], }, ], }
Above is the output of the debug endpoint.
Are the Services in question both using the same selectors? Do they have the same endpoints if you check kubectl get endpoints?
I (arbitrarily) tried this with two test Services that were identical in all but name, so I have
$ kubectl get endpoints | grep bin
abcbin 10.244.0.9:80 16m
xyzbin 10.244.0.9:80 16m
That yields
"targets": [
{
"target": "10.244.0.9:80",
"weight": 1
},
{
"target": "10.244.0.9:80",
"weight": 1
}
]
which is rejected with:
{
"message": "declarative config is invalid: {targets={[2]=\"uniqueness violation: targets entity with primary key set to 0500bbf6-db53-52da-ad42-d13937d1e29c already declared\"}}",
"flattened_errors": {},
"code": 14,
"fields": {
"targets": [
null,
"uniqueness violation: targets entity with primary key set to 0500bbf6-db53-52da-ad42-d13937d1e29c already declared"
]
},
"name": "invalid declarative configuration"
}
From the target list in the dumped config here, it roughly looks like that's the same pattern on your side. We presumably need to de-duplicate targets.
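For illustration, a de-duplicated version of the list above might look like this (whether the weights get summed or simply collapsed is an implementation detail I'm assuming here):
"targets": [
  {
    "target": "10.244.0.9:80",
    "weight": 2
  }
]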
On the error front, FTI-5584 was followed by FTI-5813 internally, though it looks like the first should have fixed all uniqueness-constraint errors being unreadable; I'm not sure whether FTI-5813 covers some other type of un-parseable error. We should probably just log the actual body or generate an Event in our own namespace with a text dump of those.
Let me try it again, but when I actually run kubectl argo rollouts promote, things start to work, i.e. once the canary actually has an upstream.
I'm also seeing this same issue in what appears to be the same use case as @ksgnextuple, using KIC + Gateway API to enable canary deployments with Argo rollouts. In my environment I have KIC 3.1.1, Kong Gateway 3.6.1.1, and k8s 1.28.
Are the Services in question both using the same selectors?
@rainest Following this Kong + Argo Rollouts example (step 5), yes. I'm not an expert on Argo Rollouts, but as I understand it, the rollout controller will inject additional labels into the Service's selector during a rollout. So during a rollout the selectors will be different, but before a rollout has started or after it has completed the selectors will be the same.
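For illustration, my understanding is that during a rollout the canary Service's selector ends up looking something like this (the app label and hash value are made up):
selector:
  app: demo
  rollouts-pod-template-hash: 6f59bc8b4d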
@ksgnextuple @congiv I've uploaded traines/kic:3.1-targetdedup as a preliminary image that includes https://github.com/Kong/kubernetes-ingress-controller/pull/5817. I think this should address the Argo issue based on my understanding of the problem.
Can you try the affected rollout with it to confirm if they behave correctly? Is configuration accepted? Do you observe any unexpected distribution of traffic during the rollout?
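For example, something like this should swap it in place (the deployment and container names are guesses; adjust to whatever your Helm release created, where the controller container is usually named ingress-controller):
kubectl set image -n kong deployment/<kong-deployment> \
  ingress-controller=traines/kic:3.1-targetdedup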
Thanks @rainest. I tried your image in my environment and it seems to be working. I don't see any error logs about being unable to update the routing config as I did before.
I just tested quickly with a Rollout that increases traffic to the canary by 25% every 2 mins, while curling my service every 5s. Not much data to go on for the weights, but it appears to be within reason for what I'd expect. I can set up a more involved test where I'm sending more traffic to a canary over a longer period of time to get better info about the weights if that would help.
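For reference, the canary steps were along these lines (a sketch, not my exact manifest):
strategy:
  canary:
    steps:
      - setWeight: 25
      - pause: {duration: 2m}
      - setWeight: 50
      - pause: {duration: 2m}
      - setWeight: 75
      - pause: {duration: 2m}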
The only thing that jumped out at me is that I was unable to visualize the progress of traffic shifting using Prometheus metrics from Kong. I thought that I'd be able to look at the rate of kong_http_requests_total summed by route, but it looks like there's only a single timeseries returned. I'm guessing that's more of a Kong Gateway thing, but wanted to mention it as it felt relevant to the test. Unsure if that is expected or not.
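The query I had in mind was roughly this (assuming the route label on that metric):
sum by (route) (rate(kong_http_requests_total[5m]))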
That's expected. Backends are aggregated under a single route; distribution to the different endpoint sets is handled in the Kong upstream resource attached to that route via the Kong service. Unless you're tracking requests per upstream IP (which AFAIK isn't something that our stock metrics track) you won't see the split.
https://github.com/Kong/kubernetes-ingress-controller/pull/5817 is now merged, so going to go ahead and close this. It's not yet actually released, but will be included in 3.2 (or possibly another 3.1.x patch release; we don't have any immediate plans to release another, but might).
Is there an existing issue for this?
Current Behavior
I have created an HTTPRoute object and am integrating it with Argo Rollouts. Once I create the HTTPRoute and the Argo Rollout resource, I get the below error in the Kong ingress controller container logs:
I have not created any plugins or consumers, and there is only one HTTPRoute resource in the entire cluster. Below is the YAML of the resource:
Expected Behavior
No response
Steps To Reproduce
No response
Kong Ingress Controller version
Kubernetes version
Anything else?
No response