Kong / kubernetes-ingress-controller

:gorilla: Kong for Kubernetes: The official Ingress Controller for Kubernetes.
https://docs.konghq.com/kubernetes-ingress-controller/
Apache License 2.0

KongPlugin's status.PROGRAMMED flapping(true, false) in multiple ingress class #4578

Open parkjeongryul opened 10 months ago

parkjeongryul commented 10 months ago

Is there an existing issue for this?

Current Behavior

Our cluster has two ingress controllers (two ingress classes), and we upgraded to the latest ingress controller version.

We noticed that the PROGRAMMED status of our KongPlugins keeps changing. This looks like a conflict between the two ingress controllers.

$ k get kongplugin -o yaml | grep status: -A7
  status:
    conditions:
    - lastTransitionTime: "2023-08-28T11:16:38Z"
      message: Object is pending configuration in Kong.
      observedGeneration: 1
      reason: Pending
      status: "False"
      type: Programmed
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
$ k get kongplugin -o yaml | grep status: -A6
  status:
    conditions:
    - lastTransitionTime: "2023-08-28T11:16:42Z"
      message: Object was successfully configured in Kong.
      observedGeneration: 1
      reason: Programmed
      status: "True"
      type: Programmed

Logs

time="2023-08-28T15:00:00+09:00" level=info msg="status update not needed" KongV1KongPlugin="{\"Namespace\":\"MY_NAMESPACE\",\"Name\":\"MY_RESOURCE\"}" logger=controllers.KongPlugin name=MY_RESOURCE namespace=MY_NAMESPACE
time="2023-08-28T15:00:00+09:00" level=info msg="reconciling resource" KongV1KongPlugin="{\"Namespace\":\"MY_NAMESPACE\",\"Name\":\"MY_RESOURCE\"}" logger=controllers.KongPlugin name=MY_RESOURCE namespace=MY_NAMESPACE
time="2023-08-28T15:00:00+09:00" level=info msg="updating programmed condition status" KongV1KongPlugin="{\"Namespace\":\"MY_NAMESPACE\",\"Name\":\"MY_RESOURCE\"}" logger=controllers.KongPlugin name=MY_RESOURCE namespace=MY_NAMESPACE
time="2023-08-28T15:00:00+09:00" level=error msg="Reconciler error" error="Operation cannot be fulfilled on kongplugins.configuration.konghq.com \"MY_RESOURCE\": the object has been modified; please apply your changes to the latest version and try again" logger=controllers.KongPlugin reconcileID="\"5fb60677-7c74-4be8-bad8-5ab4213ca78d\""
time="2023-08-28T15:00:00+09:00" level=info msg="reconciling resource" KongV1KongPlugin="{\"Namespace\":\"MY_NAMESPACE\",\"Name\":\"MY_RESOURCE\"}" logger=controllers.KongPlugin name=MY_RESOURCE namespace=MY_NAMESPACE
time="2023-08-28T15:00:00+09:00" level=info msg="updating programmed condition status" KongV1KongPlugin="{\"Namespace\":\"MY_NAMESPACE\",\"Name\":\"MY_RESOURCE\"}" logger=controllers.KongPlugin name=MY_RESOURCE namespace=MY_NAMESPACE
time="2023-08-28T15:00:00+09:00" level=error msg="Reconciler error" error="Operation cannot be fulfilled on kongplugins.configuration.konghq.com \"MY_RESOURCE\": the object has been modified; please apply your changes to the latest version and try again" logger=controllers.KongPlugin reconcileID="\"796f4a0b-9ff8-4d9b-81cc-1398a3cec8b3\""
time="2023-08-28T15:00:00+09:00" level=info msg="reconciling resource" KongV1KongPlugin="{\"Namespace\":\"MY_NAMESPACE\",\"Name\":\"MY_RESOURCE\"}" logger=controllers.KongPlugin name=MY_RESOURCE namespace=MY_NAMESPACE
time="2023-08-28T15:00:00+09:00" level=info msg="updating programmed condition status" KongV1KongPlugin="{\"Namespace\":\"MY_NAMESPACE\",\"Name\":\"MY_RESOURCE\"}" logger=controllers.KongPlugin name=MY_RESOURCE namespace=MY_NAMESPACE
time="2023-08-28T15:00:00+09:00" level=info msg="status update not needed" KongV1KongPlugin="{\"Namespace\":\"MY_NAMESPACE\",\"Name\":\"MY_RESOURCE\"}" logger=controllers.KongPlugin name=MY_RESOURCE namespace=MY_NAMESPACE
time="2023-08-28T15:00:00+09:00" level=info msg="reconciling resource" KongV1KongPlugin="{\"Namespace\":\"MY_NAMESPACE\",\"Name\":\"MY_RESOURCE\"}" logger=controllers.KongPlugin name=MY_RESOURCE namespace=MY_NAMESPACE
time="2023-08-28T15:00:00+09:00" level=info msg="updating programmed condition status" KongV1KongPlugin="{\"Namespace\":\"MY_NAMESPACE\",\"Name\":\"MY_RESOURCE\"}" logger=controllers.KongPlugin name=MY_RESOURCE namespace=MY_NAMESPACE
time="2023-08-28T15:00:00+09:00" level=info msg="status update not needed" KongV1KongPlugin="{\"Namespace\":\"MY_NAMESPACE\",\"Name\":\"MY_RESOURCE\"}" logger=controllers.KongPlugin name=MY_RESOURCE namespace=MY_NAMESPACE

This started to put a heavy load on the kube apiserver as well.


Expected Behavior

The KongPlugin's status should not keep changing even when there are multiple ingress classes.

Steps To Reproduce

1. Create two ingress (Gateway) controllers
2. Create a Gateway, GatewayClass, HTTPRoute, and KongPlugin (a rough sketch of these resources follows below)
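
A rough sketch of the resources involved (the names, namespace, and the rate-limiting plugin are placeholders for illustration, not our real configuration):

apiVersion: gateway.networking.k8s.io/v1beta1
kind: GatewayClass
metadata:
  name: kong-a
spec:
  controllerName: konghq.com/kic-gateway-controller
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: kong-a
spec:
  gatewayClassName: kong-a
  listeners:
  - name: http
    port: 80
    protocol: HTTP
---
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: rate-limit            # referenced by the HTTPRoute below
plugin: rate-limiting
config:
  minute: 5
  policy: local
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: echo
  annotations:
    konghq.com/plugins: rate-limit   # binds the KongPlugin to this route
spec:
  parentRefs:
  - name: kong-a
  rules:
  - backendRefs:
    - name: echo
      port: 80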

Kong Ingress Controller version

- Helm chart: https://github.com/Kong/charts/tree/kong-2.26.4
- kubernetes-ingress-controller version: 2.11.0
- kong version: 3.3.1

Kubernetes version

v1.23.15

Anything else?

No response

parkjeongryul commented 10 months ago

I believe #4579 will resolve it.

gallolp commented 10 months ago

Could this new status field be the cause of these log messages I've been receiving since the update to KIC 2.11.0?

time="2023-08-28T14:23:05Z" level=error msg="Reconciler error" error="Operation cannot be fulfilled on kongconsumers.configuration.konghq.com \"admin-key-auth\": the object has been modified; please apply your changes to the latest version and try again" logger=controllers.KongConsumer reconcileID="\"11953b3a-632c-46a0-8bde-f3a73acfce43\""

I do have two ingress classes in my test cluster but I do not see any messages regarding "updating programmed condition status". The other one is Traefik and should not be touching any of the Kong CRDs.

Downgrading to KIC 2.10 removes these messages completely.

Edit: Resources are managed by ArgoCD so resourceVersion and other fields may change with each Argo Sync.

parkjeongryul commented 10 months ago

@gallolp My issue is related to kongplugin. I think it's a different issue if it's coming from kongconsumer.

Can you show me the full log and helm chart version?

gallolp commented 10 months ago

Thanks @parkjeongryul, I'm using the same chart version as in the issue description: 2.26.4. When installed with the default versions (GW 3.3 / KIC 2.11), the ingress controller produces messages like the ones shown above for each Kong CR. When installed with KIC 2.10 (changing the tag in the values file), the messages are no longer produced. I will have to upgrade my test environment again to get the full log.

rainest commented 10 months ago

@gallolp there's a similar change to KongConsumer, but those are already filtered on class. It shouldn't be due to other controllers fighting over it. If you run watch kubectl get kongconsumer <name> -oyaml, can you spot what's changing? I can replicate the plugin issue, but not that.

That sort of conflict is somewhat expected--resources can change while a controller is attempting to update them, and Kubernetes requires that you take those changes into account. The controller will just retry the update. The log arguably shouldn't be an error, but it's logged by framework code that treats any failure as an error, even if the failure is relatively benign.

rainest commented 10 months ago

@parkjeongryul We don't want to require class annotations on KongPlugins, since they've historically not required them and it'd be a significant burden for upgrades.

We instead need to have the controller ignore KongPlugins unless another resource (one that does have class information) references them, similar to how we handle Secrets. There is, however, a snag here, since we also support global plugins without a resource association. Handling both of those with the current reconciler generation template would be tricky. I'll see if we want to just disable the new feature for now.
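
To illustrate: a KongPlugin itself carries no class information; the class lives on the resource that references it. A hypothetical example (names and the plugin choice are made up):

apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: add-header               # no ingress class here
plugin: response-transformer
config:
  add:
    headers:
    - "x-demo:true"
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo
  annotations:
    kubernetes.io/ingress.class: kong-a   # the class lives on the Ingress...
    konghq.com/plugins: add-header        # ...which references the plugin
spec:
  rules:
  - http:
      paths:
      - path: /demo
        pathType: Prefix
        backend:
          service:
            name: echo
            port:
              number: 80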

gallolp commented 10 months ago

Hi @rainest, sorry for the delay.

I've upgraded my test install to 2.11 and watched the affected resources. All I see are status updates from the ingress-controller.

It is worth mentioning that I have the old all-in-one setup: each Kong Pod has both the proxy and the ingress controller, and I have 3 replicas.

Below is the output of kubectl -n kong-istio get --watch kongconsumer admin-key-auth -o yaml. The updates after the spaces happen after doing a rolling restart of the Kong Deployment.

apiVersion: configuration.konghq.com/v1
credentials:
- admin-api-key
kind: KongConsumer
metadata:
  annotations:
    kubernetes.io/ingress.class: kong
    meta.helm.sh/release-name: kong-istio
    meta.helm.sh/release-namespace: kong-istio
  creationTimestamp: "2022-09-27T15:28:40Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
  name: admin-key-auth
  namespace: kong-istio
  resourceVersion: "138505304"
  uid: a689e998-217b-47be-afe6-1d67ada091bb
status:
  conditions:
  - lastTransitionTime: "2023-08-29T15:41:11Z"
    message: Object was successfully configured in Kong.
    observedGeneration: 1
    reason: Programmed
    status: "True"
    type: Programmed
username: admin-key-auth

---
apiVersion: configuration.konghq.com/v1
credentials:
- admin-api-key
kind: KongConsumer
metadata:
  annotations:
    kubernetes.io/ingress.class: kong
    meta.helm.sh/release-name: kong-istio
    meta.helm.sh/release-namespace: kong-istio
  creationTimestamp: "2022-09-27T15:28:40Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
  name: admin-key-auth
  namespace: kong-istio
  resourceVersion: "138506440"
  uid: a689e998-217b-47be-afe6-1d67ada091bb
status:
  conditions:
  - lastTransitionTime: "2023-08-29T15:44:20Z"
    message: Object is pending configuration in Kong.
    observedGeneration: 1
    reason: Pending
    status: "False"
    type: Programmed
username: admin-key-auth
---
apiVersion: configuration.konghq.com/v1
credentials:
- admin-api-key
kind: KongConsumer
metadata:
  annotations:
    kubernetes.io/ingress.class: kong
    meta.helm.sh/release-name: kong-istio
    meta.helm.sh/release-namespace: kong-istio
  creationTimestamp: "2022-09-27T15:28:40Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
  name: admin-key-auth
  namespace: kong-istio
  resourceVersion: "138506443"
  uid: a689e998-217b-47be-afe6-1d67ada091bb
status:
  conditions:
  - lastTransitionTime: "2023-08-29T15:44:20Z"
    message: Object was successfully configured in Kong.
    observedGeneration: 1
    reason: Programmed
    status: "True"
    type: Programmed
username: admin-key-auth

I've only included a few; they alternate between "pending" and "successfully configured". It never stops, and each update increases the resourceVersion.

At the same time the controllers log:

{"error":"Operation cannot be fulfilled on kongconsumers.configuration.konghq.com \"admin-key-auth\": the object has been modified; please apply your changes to the latest version and try again","level":"error","logger":"controllers.KongConsumer","msg":"Reconciler error","reconcileID":"\"f51c11f6-df15-49b8-a76e-19be0f1cfa09\"","time":"2023-08-29T15:55:33Z"}

It's a never-ending loop; between updates on the watch command, look at the resourceVersion.

resourceVersion: "138525639"
resourceVersion: "138525648"
resourceVersion: "138525649"
resourceVersion: "138525657"

It happens several times per second.

aresabalo commented 10 months ago

Version 2.11.1 fixed the "Reconciler error ... Operation cannot be fulfilled on kongplugins.configuration.konghq.com" problem.

But I have the same problem when 2 proxy replicas are active.

time="2023-08-29T17:04:55Z" level=error msg="Reconciler error" error="Operation cannot be fulfilled on kongconsumers.configuration.konghq.com \"consul-backoffice\": the object has been modified; please apply your changes to the latest version and try again" logger=controllers.KongConsumer reconcileID="\"efa6489f-e3ee-4a72-b7e6-86c6a151967e\""

KIC 2.11.1 and GW 3.4.0

Workaround: roll back to a single instance and the "Reconciler error" disappears from the ingress controller log.

rainest commented 10 months ago

@gallolp and @aresabalo you don't happen to have multiple KIC instances that use the same ingress class, do you? I would expect flapping on consumers if that's the case, but that's not supported: if you want to use multiple controllers, they need to be configured to use different ingress classes.

2.11.1 only makes changes for KongPlugins, since those don't (and shouldn't) have a class filter and instead need a more complex relationship filter.
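
If you do intentionally run multiple controller installs, each one needs its own class. With the Helm chart that's the ingressController.ingressClass value; for example (illustrative class names):

# values for the first controller install
ingressController:
  ingressClass: kong-internal

# values for the second controller install
ingressController:
  ingressClass: kong-external

Classed resources (Ingresses, KongConsumers, and so on) then select one controller via a matching kubernetes.io/ingress.class annotation.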

gallolp commented 10 months ago

@rainest That might be the case, but it is unintentional. What I mean to say is that this is how the minimal "all-in-one" deployment works: each Pod contains both a proxy and an ingress-controller container, and each ingress controller processes the same k8s resources (same ingress class) and configures only its local proxy via localhost:port.

I'm talking about the minimal example minimal-kong-controller.yaml, that is, the same release with Kong DB-less and KIC. Up to 2.10 this worked great. I have several deployments like this (with more configuration in the values files, but essentially the same). So whenever you scale the Kong Deployment, either via replicas or autoscaling, you end up with several controllers (one per Pod) that process the same resources. In this scenario there is no communication between the controllers; they are independent.
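
Condensed, that style of values file boils down to something like this (a rough sketch, not my exact configuration):

# single release: each Pod runs the proxy plus its own controller
replicaCount: 3
ingressController:
  enabled: true
env:
  database: "off"   # DB-less; each controller configures only its local proxy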

I haven't found anything in the Docs that says that this type of deployment is not supported anymore.

Please let me know if I am missing something.

aresabalo commented 10 months ago

@gallolp and @aresabalo you don't happen to have multiple KIC instances that use the same ingress class, do you? I would expect flapping on consumers if that's the case, but that's not supported: if you want to use multiple controllers, they need to be configured to use different ingress classes.

2.11.1 only makes changes for KongPlugins, since those don't (and shouldn't) have a class filter and instead need a more complex relationship filter.

My deployment is DB-less with multiple replicas (Pods). Every Pod has two containers (proxy and ingress-controller).

$ k get po

NAME                    READY   STATUS    RESTARTS   AGE
kong-794656c99f-xi9yx   2/2     Running   0          15h
kong-794656c99f-vh8rv   2/2     Running   0          15h

pmalek commented 10 months ago

Just to be on the same page: the all-in-one manifests are provided for end users' ease of use. They're not meant to be used in a production environment. For that, you have our Helm charts at your disposal, which you can customize to your needs.

What I mean to say is that this is how the minimal "all-in-one" deployment works: each Pod contains both a proxy and an ingress-controller container, and each ingress controller processes the same k8s resources (same ingress class) and configures only its local proxy via localhost:port.

This has not been the case since 2.9. We've changed the deployment topology in those manifests to have separate Pods for the ingress controller (KIC) and the Gateway. There's a CHANGELOG entry for this in the 2.9 release notes. These utilize Gateway Discovery, introduced in 2.9. You can read about it here or here.

So when you deploy e.g. https://github.com/Kong/kubernetes-ingress-controller/blob/v2.11.1/deploy/single/all-in-one-dbless.yaml you will get as mentioned above separate Pods.

If you want to read up more on deploying with helm using Gateway Discovery you can find more information here: https://github.com/Kong/charts/tree/kong-2.26.4/charts/kong#the-gatewaydiscovery-section

There are also exemplar values files for controller https://github.com/Kong/charts/blob/kong-2.26.4/charts/kong/example-values/minimal-kong-gd-controller.yaml and gateway https://github.com/Kong/charts/blob/kong-2.26.4/charts/kong/example-values/minimal-kong-gd-gateway.yaml to help you get started.
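
Condensed, those two values files look roughly like the following (see the linked exemplar files for the authoritative settings; the admin Service name here is a placeholder):

# controller release: KIC only, discovering the Gateway's admin API
deployment:
  kong:
    enabled: false
ingressController:
  enabled: true
  gatewayDiscovery:
    enabled: true
    adminApiService:
      name: kong-gateway-admin

# gateway release: proxy only, admin API exposed for discovery
ingressController:
  enabled: false
admin:
  enabled: true
  type: ClusterIP
  clusterIP: None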


Having said that, this does resolve the flapping issue and should help you figure out how to get a working environment with a single controller (which you could also scale out) but with scaled-out Gateways.

gallolp commented 10 months ago

Hi @pmalek , thanks for your reply. I see now that I have no choice but to move to a split release deployment.

However, I am struggling to understand when it was communicated that the old method is deprecated. The link I provided above is for an example in the Kong Helm chart https://github.com/Kong/charts/blob/kong-2.26.4/charts/kong/example-values/minimal-kong-controller.yaml, not the "all-in-one" manifest in the controller's repo. I just call the Helm single-release deployment "all-in-one"; sorry if that caused confusion.

The doc here says that "Kong Ingress Controller can also be configured to discover deployed Gateways", not that it "must be". And the Kong Helm chart 2.26.4 still allows enabling both the proxy and the ingress-controller in the same release, and up to GW 3.3 / KIC 2.10 it works very well. I've been using this method since GW 3.0 / KIC 2.7 with very good results. I would have to test the new split-release deployment before moving to it.

I would like to get official confirmation that the only supported deployment strategy is a split release (proxy and ingress-controller separated, with Gateway Discovery) and that the single release is deprecated. Maybe the charts should be split as well to avoid mistakes, if this is the case.

Thanks for the help and the patience!

pmalek commented 10 months ago

I just call the Helm single-release deployment "all-in-one"; sorry if that caused confusion.

No problem 👍

The 1-Pod method is still supported, but it has its drawbacks and challenges (like the scenario mentioned above). Hence we introduced the split-Deployment method and Gateway Discovery.

I would like to get official confirmation that the only supported deployment strategy is a split release (proxy and ingress-controller separated, with Gateway Discovery) and that the single release is deprecated. Maybe the charts should be split as well to avoid mistakes, if this is the case.

It's not unsupported. It's one of the ways we currently support, but bear in mind the 1-Pod challenges discussed above.

We'll be suggesting that users use Gateway Discovery and the split deployment, but there's no ETA on removing support for the old way.


As an alternative, and something to consider, you might want to take a look at the https://github.com/Kong/charts/tree/main/charts/ingress chart, which is an opinionated way of deploying KIC and the Gateway using one Helm release.

This PR https://github.com/Kong/charts/pull/878 (yet to be merged) contains an exemplar values.yaml file which will deploy both ingress controller and Gateway in 1 helm release.

parkjeongryul commented 10 months ago

I've confirmed that the kongplugin flapping issue has been resolved in 2.11.1.


So when does the kongconsumer flapping issue happen? Does this only happen when there are multiple ingress controller deployments with the same ingressclass?

Even if I don't use Gateway Discovery (separate proxy and KIC), is it okay if I create multiple ingress controllers, each with a differently named IngressClass?

gallolp commented 10 months ago

@pmalek Thank you.

I guess it's pretty safe, then, to use the single-release/one-Pod deployment with 2.10.x while we start working on separating the proxy from the controller.

gallolp commented 10 months ago

@parkjeongryul in my case the KongConsumer status flapping/loop happens when using a single-release/one-Pod deployment (proxy + controller in the same Pod). The issue occurs when you scale the deployment up to n > 1 replicas.

parkjeongryul commented 10 months ago

in my case the KongConsumer status flapping/loop happens when using a single-release/one-Pod deployment (proxy + controller in the same Pod). The issue occurs when you scale the deployment up to n > 1 replicas.

@gallolp

I deployed two releases of the chart https://github.com/Kong/charts/blob/main/charts/kong/values.yaml. Each release is also deployed as a single Pod (KIC + proxy).

Each release has its own ingress class and 30 replicas.

But I can't reproduce the KongConsumer flapping. I ran kubectl rollout restart deploy, which temporarily changed the KongConsumer status, but it did not keep changing.

I'm not sure how to reproduce the never-ending flapping of the KongConsumer.

gallolp commented 10 months ago

@parkjeongryul I'm not sure what I might be doing differently.

I will have to test on a clean release as soon as I have the time.

Thank you

rainest commented 10 months ago

@gallolp the KongConsumer errors are a related problem that occurs when there are multiple replicas. We'll track that variant in https://github.com/Kong/kubernetes-ingress-controller/issues/4598.