Kong / kubernetes-ingress-controller

:gorilla: Kong for Kubernetes: The official Ingress Controller for Kubernetes.
https://docs.konghq.com/kubernetes-ingress-controller/
Apache License 2.0

Could not unmarshal config error #5676

Closed ksgnextuple closed 7 months ago

ksgnextuple commented 8 months ago

Is there an existing issue for this?

Current Behavior

I have created an HTTPRoute object and am integrating it with Argo Rollouts. Once I create the HTTPRoute and the Argo Rollout resource, I get the below error in the Kong ingress controller container logs:

2024-03-04T16:15:28Z    error   Failed parsing resource errors  {"url": "https://localhost:8444", "update_strategy": "InMemory", "error": "could not unmarshal config error: json: cannot unmarshal object into Go struct field ConfigError.flattened_errors of type []sendconfig.FlatEntityError"}
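For context on what this error means: Go's JSON decoder fails because the admin API returned `flattened_errors` as a JSON object where the controller expects an array. A minimal sketch reproducing it (struct definitions simplified; the real `sendconfig` types have more fields):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FlatEntityError stands in for sendconfig.FlatEntityError; any element
// type reproduces the failure, since the mismatch is object-vs-array.
type FlatEntityError struct {
	Name string `json:"name"`
}

// ConfigError mirrors the controller's expectation: flattened_errors
// must decode into a slice, i.e. a JSON array.
type ConfigError struct {
	FlattenedErrors []FlatEntityError `json:"flattened_errors"`
}

// parse decodes an admin API error body the way the controller does.
func parse(body []byte) error {
	var ce ConfigError
	return json.Unmarshal(body, &ce)
}

func main() {
	// Kong returned `"flattened_errors": {}` (an object), so decoding
	// into the slice field fails with the same message as in the log,
	// modulo the package name:
	// json: cannot unmarshal object into Go struct field
	//   ConfigError.flattened_errors of type []main.FlatEntityError
	fmt.Println(parse([]byte(`{"flattened_errors": {}}`)))
}
```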

I have not created any plugins or consumers, and there is only one HTTPRoute resource in the entire cluster. Below is the YAML of the resource:

kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1beta1
metadata:
  name: argo-rollouts-http-route
  annotations:
    konghq.com/strip-path: "true"
spec:
  parentRefs:
  - kind: Gateway
    name: kong
    namespace: default
  hostnames:
  - "demo.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: argo-rollouts-stable-service
      kind: Service
      port: 80
    - name: argo-rollouts-canary-service
      kind: Service
      port: 80

Expected Behavior

No response

Steps To Reproduce

No response

Kong Ingress Controller version

3.1

Kubernetes version

1.27

Anything else?

No response

pmalek commented 8 months ago

@ksgnextuple What image and version of Gateway are you running? I believe the problem you're facing has been solved in https://github.com/Kong/kubernetes-ingress-controller/issues/5638 in KIC 3.1.1 (if you're using an open source Kong)

ksgnextuple commented 8 months ago

Hi @pmalek

Kong Gateway -> kong:3.6
KIC -> kong/kubernetes-ingress-controller:3.1

Installed via Helm:

helm install kong/kong --generate-name --set ingressController.installCRDs=false -n kong --create-namespace

pmalek commented 8 months ago

Upgrading to KIC 3.1.1 should fix your issue.

ksgnextuple commented 8 months ago

Even after updating the image to kong/kubernetes-ingress-controller:3.1.1, I'm seeing the same error.

ksgnextuple commented 8 months ago

Any updates on this?

pmalek commented 8 months ago

@ksgnextuple Can you try following this guide https://docs.konghq.com/kubernetes-ingress-controller/latest/reference/troubleshooting/#dumping-generated-kong-configuration to get the configuration that failed to be applied? Specifically the one from /debug/config/failed.

That way we'll be able to make progress knowing what configuration we're working with.
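The guide's steps can be sketched as follows. The deployment and container names below are assumptions based on the `helm install --generate-name` invocation earlier in the thread, not taken from the guide; check your cluster for the actual names:

```shell
# Enable config dumps on the controller container (deployment name is an
# assumption; check `kubectl get deploy -n kong` for the generated name)
kubectl set env -n kong deployment/kong-1234567890-kong \
  -c ingress-controller CONTROLLER_DUMP_CONFIG=true

# Forward the controller's diagnostics port (10256 by default)
kubectl port-forward -n kong deployment/kong-1234567890-kong 10256:10256 &

# Fetch the last configuration that Kong rejected
curl -s localhost:10256/debug/config/failed
```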

ksgnextuple commented 8 months ago

I think the issue is resolved. I deleted the HTTPRoute and Gateway and recreated them after upgrading the KIC version to 3.1.1. Will get back here after a few tests.

ksgnextuple commented 8 months ago

Hmm, on further checking it's still the same; I'll try the troubleshooting documentation.

ksgnextuple commented 8 months ago

kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1beta1
metadata:
  name: argo-rollouts-http-route
  annotations:
    konghq.com/strip-path: "true"
spec:
  parentRefs:

When I add the 2nd backendRef, I get the error.

ksgnextuple commented 8 months ago

Here is the output of the debug endpoint:

{
  "_format_version": "3.0",
  "_info": {
    "select_tags": ["managed-by-ingress-controller"],
    "defaults": {}
  },
  "services": [
    {
      "connect_timeout": 60000,
      "host": "httproute.default.argo-rollouts-http-route.0",
      "id": "346eb9fa-c8fb-53b5-acc7-7488e6fe526e",
      "name": "httproute.default.argo-rollouts-http-route.0",
      "port": 80,
      "protocol": "http",
      "read_timeout": 60000,
      "retries": 5,
      "write_timeout": 60000,
      "tags": [
        "k8s-name:argo-rollouts-http-route",
        "k8s-namespace:default",
        "k8s-kind:HTTPRoute",
        "k8s-uid:d0018e4d-53cd-44a0-9d55-3d6ce03f347f",
        "k8s-group:gateway.networking.k8s.io",
        "k8s-version:v1"
      ],
      "routes": [
        {
          "hosts": ["demo.example.com"],
          "id": "7791108a-f8cb-5441-bdd6-ba897ccefdec",
          "name": "httproute.default.argo-rollouts-http-route.0.0",
          "paths": ["~/$", "/"],
          "path_handling": "v0",
          "preserve_host": true,
          "protocols": ["http", "https"],
          "strip_path": true,
          "tags": [
            "k8s-name:argo-rollouts-http-route",
            "k8s-namespace:default",
            "k8s-kind:HTTPRoute",
            "k8s-uid:d0018e4d-53cd-44a0-9d55-3d6ce03f347f",
            "k8s-group:gateway.networking.k8s.io",
            "k8s-version:v1"
          ],
          "https_redirect_status_code": 426
        }
      ]
    }
  ],
  "upstreams": [
    {
      "name": "httproute.default.argo-rollouts-http-route.0",
      "algorithm": "round-robin",
      "tags": [
        "k8s-name:argo-rollouts-http-route",
        "k8s-namespace:default",
        "k8s-kind:HTTPRoute",
        "k8s-uid:d0018e4d-53cd-44a0-9d55-3d6ce03f347f",
        "k8s-group:gateway.networking.k8s.io",
        "k8s-version:v1"
      ],
      "targets": [
        { "target": "10.244.1.155:8080", "weight": 33 },
        { "target": "10.244.1.155:8080", "weight": 0 },
        { "target": "10.244.0.40:8080", "weight": 33 },
        { "target": "10.244.0.40:8080", "weight": 0 },
        { "target": "10.244.0.39:8080", "weight": 33 },
        { "target": "10.244.0.39:8080", "weight": 0 }
      ]
    }
  ]
}

rainest commented 7 months ago

Are the Services in question both using the same selectors? Do they have the same endpoints if you check kubectl get endpoints?

I (arbitrarily) tried this with two test Services that were identical in all but name, so I have

$ kubectl get endpoints | grep bin
abcbin                        10.244.0.9:80                       16m
xyzbin                        10.244.0.9:80                       16m

That yields

      "targets": [
        {
          "target": "10.244.0.9:80",
          "weight": 1
        },
        {
          "target": "10.244.0.9:80",
          "weight": 1
        }
      ]

which is rejected with:

{                      
  "message": "declarative config is invalid: {targets={[2]=\"uniqueness violation: targets entity with primary key set to 0500bbf6-db53-52da-ad42-d13937d1e29c already declared\"}}",
  "flattened_errors": {},
  "code": 14,
  "fields": {
    "targets": [
      null,
      "uniqueness violation: targets entity with primary key set to 0500bbf6-db53-52da-ad42-d13937d1e29c already declared"
    ]
  },
  "name": "invalid declarative configuration"
}

From the target list in the dumped config here, it roughly looks like that's the same pattern on your side. We presumably need to de-duplicate targets.
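A minimal sketch of that de-duplication, merging targets with the same address and summing their weights (illustrative only, with simplified field names; this is not the code from the eventual fix):

```go
package main

import "fmt"

// Target is a simplified stand-in for a Kong upstream target.
type Target struct {
	Target string // address:port
	Weight int
}

// dedupe merges targets that share an address, summing their weights,
// so the declarative config no longer violates the uniqueness constraint.
func dedupe(in []Target) []Target {
	idx := map[string]int{} // address -> position in out
	out := []Target{}
	for _, t := range in {
		if i, ok := idx[t.Target]; ok {
			out[i].Weight += t.Weight
			continue
		}
		idx[t.Target] = len(out)
		out = append(out, t)
	}
	return out
}

func main() {
	// The duplicated pairs from the dumped config above collapse to one
	// entry per address, keeping the combined weight.
	targets := []Target{
		{"10.244.1.155:8080", 33},
		{"10.244.1.155:8080", 0},
		{"10.244.0.40:8080", 33},
		{"10.244.0.40:8080", 0},
	}
	fmt.Println(dedupe(targets))
	// prints [{10.244.1.155:8080 33} {10.244.0.40:8080 33}]
}
```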

On the error front, FTI-5584 was followed by FTI-5813 internally, though it looks like the first should have fixed all uniqueness-constraint errors being unreadable; I'm not sure if FTI-5813 is just some other type of unparseable error. We should probably log the actual body, or generate an Event in our own namespace with a text dump of those.

ksgnextuple commented 7 months ago

Let me try it again, but when I actually run kubectl argo rollouts promote, things start to work. That is when the canary actually has an upstream.

congiv commented 7 months ago

I'm also seeing this same issue in what appears to be the same use case as @ksgnextuple, using KIC + Gateway API to enable canary deployments with Argo rollouts. In my environment I have KIC 3.1.1, Kong Gateway 3.6.1.1, and k8s 1.28.

Are the Services in question both using the same selectors?

@rainest Following this Kong + Argo Rollouts example (step 5), yes. I'm not an expert on Argo Rollouts, but as I understand it, the rollout controller will inject additional labels into the Service's selector during a rollout. So during a rollout the selectors will be different, but before a rollout has started or after it has completed, the selectors will be the same.
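As a sketch of what that injection looks like (the `app: demo` selector, ports, and hash value are made up for illustration; `rollouts-pod-template-hash` is the label key Argo Rollouts uses):

```yaml
# Canary Service during a rollout: Argo Rollouts patches the selector so
# it only matches canary pods.
apiVersion: v1
kind: Service
metadata:
  name: argo-rollouts-canary-service
spec:
  selector:
    app: demo                               # original selector (assumed)
    rollouts-pod-template-hash: 7b4f9d6c58  # injected by the rollout controller
  ports:
  - port: 80
    targetPort: 8080
```

Before a rollout starts or after it completes, both Services' selectors resolve to the same pods, which is how the duplicate upstream targets arise.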

rainest commented 7 months ago

@ksgnextuple @congiv I've uploaded traines/kic:3.1-targetdedup as a preliminary image that includes https://github.com/Kong/kubernetes-ingress-controller/pull/5817. I think this should address the Argo issue based on my understanding of the problem.

Can you try the affected rollout with it to confirm if they behave correctly? Is configuration accepted? Do you observe any unexpected distribution of traffic during the rollout?

congiv commented 7 months ago

Thanks @rainest. I tried your image in my environment and it seems to be working. I don't see any error logs about being unable to update the routing config as I did before.

I just tested quickly by having a Rollout that increases traffic to the canary by 25% every 2 mins, and curling my service every 5s. Not much data to go off of for the weights, but it appears to be within reason for what I'd expect. I can set up a more involved test where I send more traffic to a canary over a longer period of time to get better info about the weights if that would help.

The only thing that jumped out at me is that I was unable to visualize the progress of traffic shifting using prom metrics from Kong. I thought that I'd be able to look at the rate of kong_http_requests_total summed by route, but it looks like there's only a single timeseries returned. I'm guessing that's more of a Kong Gateway thing, but wanted to mention it as it felt relevant to the test. Unsure if that is expected or not.

rainest commented 7 months ago

That's expected. Backends are aggregated under a single route; distribution to the different endpoint sets is handled in the Kong upstream resource attached to that route via the Kong service. Unless you're tracking requests per upstream IP (which AFAIK isn't something that our stock metrics track) you won't see the split.

rainest commented 7 months ago

https://github.com/Kong/kubernetes-ingress-controller/pull/5817 is now merged, so I'm going to go ahead and close this. It's not yet actually released, but it will be included in 3.2 (or possibly another 3.1.x patch release; we don't have any immediate plans to release another, but we might).