Kong / gateway-operator

Kubernetes Operator for Kong Gateways
Apache License 2.0

DataPlane and ControlPlane do not become ready #522

Open. zsedem opened this issue 3 weeks ago.

zsedem commented 3 weeks ago

Current Behavior

We tried to deploy the Kong Gateway Operator to our test Kubernetes cluster, but I am unable to get HTTPRoutes working. Based on these logs, I assume the problem is that the Gateway never becomes Programmed (its http listener is reported as "listener not programmed yet"):

2024-08-24T18:31:53Z debug controllers.HTTPRoute Processing httproute {"GatewayV1HTTPRoute": {"name":"httpbin-post","namespace":"test-gateway"}, "v": 1, "namespace": "test-gateway", "name": "httpbin-post"}
2024-08-24T18:31:53Z debug controllers.HTTPRoute Checking deletion timestamp {"GatewayV1HTTPRoute": {"name":"httpbin-post","namespace":"test-gateway"}, "v": 1, "namespace": "test-gateway", "name": "httpbin-post"}
2024-08-24T18:31:53Z debug controllers.HTTPRoute Retrieving GatewayClass and Gateway for route {"GatewayV1HTTPRoute": {"name":"httpbin-post","namespace":"test-gateway"}, "v": 1, "namespace": "test-gateway", "name": "httpbin-post"}
2024-08-24T18:31:53Z debug controllers.HTTPRoute Listener is not ready {"GatewayV1HTTPRoute": {"name":"httpbin-post","namespace":"test-gateway"}, "parentRef.gateway": "test-gateway/test-gateway", "listener": "http", "v": 1, "reason": "listener not programmed yet"}
2024-08-24T18:31:53Z debug controllers.HTTPRoute Checking if the httproute's gateways are ready {"GatewayV1HTTPRoute": {"name":"httpbin-post","namespace":"test-gateway"}, "v": 1, "namespace": "test-gateway", "name": "httpbin-post"}
2024-08-24T18:31:53Z debug controllers.HTTPRoute Gateway for route was not ready, waiting {"GatewayV1HTTPRoute": {"name":"httpbin-post","namespace":"test-gateway"}, "v": 1, "namespace": "test-gateway", "name": "httpbin-post"}
2024-08-24T18:31:53Z debug Parsing kubernetes objects into data-plane configuration {"v": 1}

After reading some of the code, I found that a possible reason is that the DataPlane and ControlPlane are not becoming ready either:

# Control Plane status
---
status:
  conditions:
    - lastTransitionTime: '2024-08-24T18:30:21Z'
      message: There are other conditions that are not yet ready
      observedGeneration: 1
      reason: DependenciesNotReady
      status: 'False'
      type: Ready
    - lastTransitionTime: '2024-08-24T18:30:21Z'
      message: ControlPlane resource is scheduled for provisioning
      reason: PodsNotReady
      status: 'False'
      type: Provisioned
# Data plane status
---
status:
  addresses:
    - sourceType: PrivateIP
      type: IPAddress
      value: x.x.x.x  # IP address omitted
  conditions:
    - lastTransitionTime: '2024-08-24T18:30:21Z'
      message: There are other conditions that are not yet ready
      observedGeneration: 1
      reason: DependenciesNotReady
      status: 'False'
      type: Ready
  readyReplicas: 0
  replicas: 0
  selector: ce900a68-e4b4-4d51-8ecc-88f80f2b6161
  service: dataplane-ingress-test-gateway-qlzd8-km75b
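To see which conditions are holding a resource back, it can help to list every condition whose status is not "True". A minimal sketch in Python, assuming the status has been exported as JSON (for example with kubectl get <kind> <name> -o json) and loaded into a dict; the helper name not_ready_conditions is hypothetical and not part of the operator.

```python
from typing import Any


def not_ready_conditions(status: dict[str, Any]) -> list[str]:
    """Return "type: reason (message)" for every condition whose status is not "True"."""
    problems = []
    for cond in status.get("conditions", []):
        if cond.get("status") != "True":
            problems.append(
                f"{cond.get('type')}: {cond.get('reason')} ({cond.get('message', '')})"
            )
    return problems


# ControlPlane status conditions copied from the issue above.
controlplane_status = {
    "conditions": [
        {
            "type": "Ready",
            "status": "False",
            "reason": "DependenciesNotReady",
            "message": "There are other conditions that are not yet ready",
        },
        {
            "type": "Provisioned",
            "status": "False",
            "reason": "PodsNotReady",
            "message": "ControlPlane resource is scheduled for provisioning",
        },
    ]
}

if __name__ == "__main__":
    for line in not_ready_conditions(controlplane_status):
        print(line)
```

For the ControlPlane above this reports Provisioned=False with reason PodsNotReady, which is the condition keeping Ready at DependenciesNotReady.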

Both the ControlPlane and the DataPlane pods are ready, and I can send requests to the DataPlane (they return 404, since no HTTPRoute is configured yet), even though the DataPlane status above reports readyReplicas: 0 and replicas: 0.

Expected Behavior

The Gateway becomes Programmed and the HTTPRoute is added to the Kong Gateway.

Steps To Reproduce

I'm not sure how to reproduce this: the same configuration works locally in KinD, so I assume the issue is specific to our environment. I tried disabling the admission webhook, without success. These are some of the logs the operator keeps emitting in this stuck state:

{"level":"debug","ts":"2024-08-24T18:37:27.688Z","logger":"controlplane.deployment_builder","msg":"Resource modified","controller":"dataplane","controllerGroup":"gateway-operator.konghq.com","controllerKind":"DataPlane","DataPlane":{"name":"test-gateway-qlzd8","namespace":"test-gateway"},"namespace":"test-gateway","name":"test-gateway-qlzd8","reconcileID":"42ec2c8c-d750-4e5f-a9af-8112cc627b9a","namespace":"test-gateway","name":"test-gateway-qlzd8","Deployment":"dataplane-test-gateway-qlzd8-52tpl"}
{"level":"debug","ts":"2024-08-24T18:37:27.717Z","logger":"controlplane","msg":"Resource modified","controller":"controlplane","controllerGroup":"gateway-operator.konghq.com","controllerKind":"ControlPlane","ControlPlane":{"name":"test-gateway-68gjm","namespace":"test-gateway"},"namespace":"test-gateway","name":"test-gateway-68gjm","reconcileID":"a8d84ee4-d74b-4469-bc06-777588fcb783","namespace":"test-gateway","name":"test-gateway-68gjm","Deployment":"controlplane-test-gateway-68gjm-2pt4t"}
{"level":"debug","ts":"2024-08-24T18:37:54.518Z","logger":"controlplane","msg":"admission webhook disabled, ensuring admission webhook resources are not present","controller":"controlplane","controllerGroup":"gateway-operator.konghq.com","controllerKind":"ControlPlane","ControlPlane":{"name":"test-gateway-68gjm","namespace":"test-gateway"},"namespace":"test-gateway","name":"test-gateway-68gjm","reconcileID":"ffeadf6a-e212-4be2-880e-3c3b46ded232","namespace":"test-gateway","name":"test-gateway-68gjm"}
{"level":"debug","ts":"2024-08-24T18:37:54.551Z","logger":"controlplane","msg":"Resource modified","controller":"controlplane","controllerGroup":"gateway-operator.konghq.com","controllerKind":"ControlPlane","ControlPlane":{"name":"test-gateway-68gjm","namespace":"test-gateway"},"namespace":"test-gateway","name":"test-gateway-68gjm","reconcileID":"ffeadf6a-e212-4be2-880e-3c3b46ded232","namespace":"test-gateway","name":"test-gateway-68gjm","Deployment":"controlplane-test-gateway-68gjm-2pt4t"}
{"level":"debug","ts":"2024-08-24T18:37:57.677Z","logger":"controlplane","msg":"admission webhook disabled, ensuring admission webhook resources are not present","controller":"controlplane","controllerGroup":"gateway-operator.konghq.com","controllerKind":"ControlPlane","ControlPlane":{"name":"test-gateway-68gjm","namespace":"test-gateway"},"namespace":"test-gateway","name":"test-gateway-68gjm","reconcileID":"f663086b-e15b-4ecd-92b3-0cc2193ae59d","namespace":"test-gateway","name":"test-gateway-68gjm"}
{"level":"debug","ts":"2024-08-24T18:37:57.706Z","logger":"controlplane","msg":"Resource modified","controller":"controlplane","controllerGroup":"gateway-operator.konghq.com","controllerKind":"ControlPlane","ControlPlane":{"name":"test-gateway-68gjm","namespace":"test-gateway"},"namespace":"test-gateway","name":"test-gateway-68gjm","reconcileID":"f663086b-e15b-4ecd-92b3-0cc2193ae59d","namespace":"test-gateway","name":"test-gateway-68gjm","Deployment":"controlplane-test-gateway-68gjm-2pt4t"}
{"level":"debug","ts":"2024-08-24T18:37:57.726Z","logger":"controlplane.deployment_builder","msg":"Resource modified","controller":"dataplane","controllerGroup":"gateway-operator.konghq.com","controllerKind":"DataPlane","DataPlane":{"name":"test-gateway-qlzd8","namespace":"test-gateway"},"namespace":"test-gateway","name":"test-gateway-qlzd8","reconcileID":"1a359e5d-1f22-44f7-8cc5-fe6c55de7d67","namespace":"test-gateway","name":"test-gateway-qlzd8","Deployment":"dataplane-test-gateway-qlzd8-52tpl"}
{"level":"debug","ts":"2024-08-24T18:38:02.438Z","logger":"controlplane","msg":"admission webhook disabled, ensuring admission webhook resources are not present","controller":"controlplane","controllerGroup":"gateway-operator.konghq.com","controllerKind":"ControlPlane","ControlPlane":{"name":"test-gateway-68gjm","namespace":"test-gateway"},"namespace":"test-gateway","name":"test-gateway-68gjm","reconcileID":"5c91b86c-6467-4932-9969-2d084a68a51e","namespace":"test-gateway","name":"test-gateway-68gjm"}
{"level":"debug","ts":"2024-08-24T18:38:02.470Z","logger":"controlplane","msg":"Resource modified","controller":"controlplane","controllerGroup":"gateway-operator.konghq.com","controllerKind":"ControlPlane","ControlPlane":{"name":"test-gateway-68gjm","namespace":"test-gateway"},"namespace":"test-gateway","name":"test-gateway-68gjm","reconcileID":"5c91b86c-6467-4932-9969-2d084a68a51e","namespace":"test-gateway","name":"test-gateway-68gjm","Deployment":"controlplane-test-gateway-68gjm-2pt4t"}
{"level":"debug","ts":"2024-08-24T18:42:10.252Z","logger":"controlplane","msg":"processing gatewayclass","controller":"gatewayclass","controllerGroup":"gateway.networking.k8s.io","controllerKind":"GatewayClass","GatewayClass":{"name":"test-gateway"},"namespace":"","name":"test-gateway","reconcileID":"e448592c-8994-4a0e-8089-132de7a1fc16","namespace":"","name":"test-gateway"}
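One way to check whether the operator is repeatedly rewriting the same Deployments, rather than just waiting, is to count the "Resource modified" entries per Deployment in the JSON logs. A minimal sketch, assuming one JSON object per line as in the excerpt above (optionally preceded by a timestamp prefix); the function name count_modifications is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def count_modifications(lines: Iterable[str]) -> Counter:
    """Count "Resource modified" entries per (controller, Deployment) pair."""
    counts: Counter = Counter()
    for line in lines:
        start = line.find("{")  # tolerate prefixes such as "2024-08-26 11:16:05 "
        if start == -1:
            continue
        try:
            entry = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # not a JSON log line
        if entry.get("msg") == "Resource modified" and "Deployment" in entry:
            counts[(entry.get("controller"), entry["Deployment"])] += 1
    return counts


# Excerpt of the operator logs above, with most fields trimmed.
sample_logs = [
    '{"ts":"2024-08-24T18:37:27.688Z","msg":"Resource modified","controller":"dataplane","Deployment":"dataplane-test-gateway-qlzd8-52tpl"}',
    '{"ts":"2024-08-24T18:37:27.717Z","msg":"Resource modified","controller":"controlplane","Deployment":"controlplane-test-gateway-68gjm-2pt4t"}',
    '{"ts":"2024-08-24T18:37:54.518Z","msg":"admission webhook disabled, ensuring admission webhook resources are not present","controller":"controlplane"}',
    '{"ts":"2024-08-24T18:37:54.551Z","msg":"Resource modified","controller":"controlplane","Deployment":"controlplane-test-gateway-68gjm-2pt4t"}',
    '{"ts":"2024-08-24T18:37:57.706Z","msg":"Resource modified","controller":"controlplane","Deployment":"controlplane-test-gateway-68gjm-2pt4t"}',
    '{"ts":"2024-08-24T18:37:57.726Z","msg":"Resource modified","controller":"dataplane","Deployment":"dataplane-test-gateway-qlzd8-52tpl"}',
    '{"ts":"2024-08-24T18:38:02.470Z","msg":"Resource modified","controller":"controlplane","Deployment":"controlplane-test-gateway-68gjm-2pt4t"}',
]

if __name__ == "__main__":
    for (controller, deployment), n in count_modifications(sample_logs).most_common():
        print(f"{controller}: {deployment} modified {n} times")
```

Fed the full operator log (for example by passing sys.stdin instead of sample_logs), a count that keeps growing for the same Deployment suggests the operator keeps updating it instead of seeing it settle.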

Any tips on how to debug this further? Which resource should become ready first, the ControlPlane or the DataPlane?

Operator Version

1.3.0

kubectl version

Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.3

zsedem commented 2 weeks ago
2024-08-26 11:16:05 {"level":"Level(-2)","ts":"2024-08-26T11:16:05.555Z","logger":"controlplane","msg":"no Rollout with BlueGreen strategy specified, delegating to DataPlaneReconciler","controller":"dataplane","controllerGroup":"gateway-operator.konghq.com","controllerKind":"DataPlane","DataPlane":{"name":"test-gateway-jsmvn","namespace":"test-gateway"},"namespace":"test-gateway","name":"test-gateway-jsmvn","reconcileID":"59389878-cbf3-410e-9806-59d919e28f48","namespace":"test-gateway","name":"test-gateway-jsmvn"}
2024-08-26 11:16:05 {"level":"Level(-2)","ts":"2024-08-26T11:16:05.555Z","logger":"controlplane","msg":"reconciling DataPlane resource","controller":"dataplane","controllerGroup":"gateway-operator.konghq.com","controllerKind":"DataPlane","DataPlane":{"name":"test-gateway-jsmvn","namespace":"test-gateway"},"namespace":"test-gateway","name":"test-gateway-jsmvn","reconcileID":"59389878-cbf3-410e-9806-59d919e28f48","namespace":"test-gateway","name":"test-gateway-jsmvn"}
2024-08-26 11:16:05 {"level":"Level(-2)","ts":"2024-08-26T11:16:05.555Z","logger":"controlplane","msg":"validating DataPlane configuration","controller":"dataplane","controllerGroup":"gateway-operator.konghq.com","controllerKind":"DataPlane","DataPlane":{"name":"test-gateway-jsmvn","namespace":"test-gateway"},"namespace":"test-gateway","name":"test-gateway-jsmvn","reconcileID":"59389878-cbf3-410e-9806-59d919e28f48","namespace":"test-gateway","name":"test-gateway-jsmvn"}
2024-08-26 11:16:05 {"level":"Level(-2)","ts":"2024-08-26T11:16:05.555Z","logger":"controlplane","msg":"exposing DataPlane deployment admin API via headless service","controller":"dataplane","controllerGroup":"gateway-operator.konghq.com","controllerKind":"DataPlane","DataPlane":{"name":"test-gateway-jsmvn","namespace":"test-gateway"},"namespace":"test-gateway","name":"test-gateway-jsmvn","reconcileID":"59389878-cbf3-410e-9806-59d919e28f48","namespace":"test-gateway","name":"test-gateway-jsmvn"}
2024-08-26 11:16:05 {"level":"Level(-2)","ts":"2024-08-26T11:16:05.555Z","logger":"controlplane","msg":"exposing DataPlane deployment via service","controller":"dataplane","controllerGroup":"gateway-operator.konghq.com","controllerKind":"DataPlane","DataPlane":{"name":"test-gateway-jsmvn","namespace":"test-gateway"},"namespace":"test-gateway","name":"test-gateway-jsmvn","reconcileID":"59389878-cbf3-410e-9806-59d919e28f48","namespace":"test-gateway","name":"test-gateway-jsmvn"}
2024-08-26 11:16:05 {"level":"Level(-2)","ts":"2024-08-26T11:16:05.555Z","logger":"controlplane","msg":"ensuring mTLS certificate","controller":"dataplane","controllerGroup":"gateway-operator.konghq.com","controllerKind":"DataPlane","DataPlane":{"name":"test-gateway-jsmvn","namespace":"test-gateway"},"namespace":"test-gateway","name":"test-gateway-jsmvn","reconcileID":"59389878-cbf3-410e-9806-59d919e28f48","namespace":"test-gateway","name":"test-gateway-jsmvn"}
2024-08-26 11:16:05 {"level":"Level(-2)","ts":"2024-08-26T11:16:05.555Z","logger":"controlplane","msg":"checking readiness of DataPlane service","controller":"dataplane","controllerGroup":"gateway-operator.konghq.com","controllerKind":"DataPlane","DataPlane":{"name":"test-gateway-jsmvn","namespace":"test-gateway"},"namespace":"test-gateway","name":"test-gateway-jsmvn","reconcileID":"59389878-cbf3-410e-9806-59d919e28f48","namespace":"test-gateway","name":"dataplane-ingress-test-gateway-jsmvn-dj2k5"}
2024-08-26 11:16:05 {"level":"Level(-2)","ts":"2024-08-26T11:16:05.555Z","logger":"controlplane","msg":"ensuring DataPlane has service addesses in status","controller":"dataplane","controllerGroup":"gateway-operator.konghq.com","controllerKind":"DataPlane","DataPlane":{"name":"test-gateway-jsmvn","namespace":"test-gateway"},"namespace":"test-gateway","name":"test-gateway-jsmvn","reconcileID":"59389878-cbf3-410e-9806-59d919e28f48","namespace":"test-gateway","name":"dataplane-ingress-test-gateway-jsmvn-dj2k5"}
2024-08-26 11:16:05 {"level":"Level(-2)","ts":"2024-08-26T11:16:05.559Z","logger":"controlplane.deployment_builder","msg":"unexpected type processed for trace logging: string, this is a bug!","controller":"dataplane","controllerGroup":"gateway-operator.konghq.com","controllerKind":"DataPlane","DataPlane":{"name":"test-gateway-jsmvn","namespace":"test-gateway"},"namespace":"test-gateway","name":"test-gateway-jsmvn","reconcileID":"59389878-cbf3-410e-9806-59d919e28f48"}
2024-08-26 11:16:05 {"level":"debug","ts":"2024-08-26T11:16:05.600Z","logger":"controlplane","msg":"Resource modified","controller":"controlplane","controllerGroup":"gateway-operator.konghq.com","controllerKind":"ControlPlane","ControlPlane":{"name":"test-gateway-kg5f9","namespace":"test-gateway"},"namespace":"test-gateway","name":"test-gateway-kg5f9","reconcileID":"85c2580e-96fa-469c-8a60-9f7b8040c33c","namespace":"test-gateway","name":"test-gateway-kg5f9","Deployment":"controlplane-test-gateway-kg5f9-5cht2"}
2024-08-26 11:16:05 {"level":"debug","ts":"2024-08-26T11:16:05.634Z","logger":"controlplane.deployment_builder","msg":"Resource modified","controller":"dataplane","controllerGroup":"gateway-operator.konghq.com","controllerKind":"DataPlane","DataPlane":{"name":"test-gateway-jsmvn","namespace":"test-gateway"},"namespace":"test-gateway","name":"test-gateway-jsmvn","reconcileID":"59389878-cbf3-410e-9806-59d919e28f48","namespace":"test-gateway","name":"test-gateway-jsmvn","Deployment":"dataplane-test-gateway-jsmvn-vwz4n"}

With trace-level logging turned on, I noticed this message in the logs above:

unexpected type processed for trace logging: string, this is a bug!

zsedem commented 2 weeks ago

So it turns out the root cause is probably the same as in #500.

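Our company uses mutating webhooks to enforce a number of best practices in our cluster. Unfortunately, if one of these webhooks patches a Deployment created by the Kong Gateway Operator, the operator gets stuck in a loop in which the owner resource (ControlPlane or DataPlane) can never become ready: the operator sees the mutated Deployment as differing from what it wants, updates it again (the repeated "Resource modified" log lines above), and the webhook mutates it again.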

The workaround is to create a podTemplateSpec that ends up not being mutated by any of the mutating webhooks (for example, one that already contains whatever they would add).
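The loop described above, and why the workaround breaks it, can be illustrated with a toy model: a reconciler that rewrites the pod template whenever the stored object differs from the desired one, and a mutating webhook that patches every write. A minimal sketch; the webhook, its "team" label and the function names are hypothetical, and a real reconciler compares far more than a single dict.

```python
import copy
from typing import Any

Template = dict[str, Any]


def mutating_webhook(template: Template) -> Template:
    """Hypothetical webhook: enforces a 'team' label on every pod template it admits."""
    mutated = copy.deepcopy(template)
    mutated.setdefault("labels", {})["team"] = "platform"
    return mutated


def reconcile(desired: Template, max_rounds: int = 5) -> int:
    """Return how many updates are issued before the stored template matches 'desired'.

    Every write passes through the webhook, so the stored object can still differ
    from 'desired' afterwards. Returns max_rounds if it never converges.
    """
    stored: Template = {}
    for updates_so_far in range(max_rounds):
        if stored == desired:
            return updates_so_far  # no diff left: the owner could now report Ready
        stored = mutating_webhook(desired)  # operator updates, webhook patches
    return max_rounds


if __name__ == "__main__":
    plain = {"labels": {"app": "kong"}}
    # Workaround: the template already contains what the webhook would inject.
    pre_mutated = {"labels": {"app": "kong", "team": "platform"}}
    print("plain template:", reconcile(plain))  # → 5 (hits the cap, never converges)
    print("pre-mutated template:", reconcile(pre_mutated))  # → 1
```

With the plain template every round finds a diff, so the loop only stops at the cap; with the pre-mutated template the webhook's patch is a no-op and the second check finds nothing to change.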

AFAIK, an operator like this should generally only compare the fields it manages when computing the diff, rather than the whole object. The managed fields can be inspected with:

kubectl get deployment .... -o yaml --show-managed-fields
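The suggestion above amounts to comparing only the fields the operator itself sets, rather than the whole object. A minimal sketch of that idea, assuming plain dicts for the desired and live pod templates: every field present in the desired template must match, while extra fields added by someone else (for example a mutating webhook) are ignored. This illustrates the approach only; it is not how the Kong Gateway Operator or Kubernetes server-side apply actually compute diffs, and the function name needs_update is hypothetical.

```python
from typing import Any


def needs_update(desired: Any, live: Any) -> bool:
    """Return True if 'live' differs from 'desired' on any field that 'desired' sets.

    Keys present only in 'live' (e.g. injected by a mutating webhook) are ignored.
    Lists are compared element by element and must have the same length, so an
    injected sidecar container still counts as a difference in this sketch.
    """
    if isinstance(desired, dict):
        if not isinstance(live, dict):
            return True
        return any(needs_update(value, live.get(key)) for key, value in desired.items())
    if isinstance(desired, list):
        if not isinstance(live, list) or len(desired) != len(live):
            return True
        return any(needs_update(d, l) for d, l in zip(desired, live))
    return desired != live


desired = {
    "metadata": {"labels": {"app": "kong"}},
    "spec": {"containers": [{"name": "proxy", "image": "kong"}]},
}
# What a hypothetical webhook leaves behind: an extra label and an extra container field.
live = {
    "metadata": {"labels": {"app": "kong", "team": "platform"}},
    "spec": {
        "containers": [
            {"name": "proxy", "image": "kong", "securityContext": {"runAsNonRoot": True}}
        ]
    },
}

if __name__ == "__main__":
    print("full-object diff:", desired != live)  # True: would trigger another update
    print("owned-fields diff:", needs_update(desired, live))  # False: nothing to do
```

Ignoring extra keys handles webhooks that only add fields. A webhook that overwrites a value the operator sets, or appends a sidecar container, would still show up as a difference and needs another resolution, such as the podTemplateSpec workaround.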