Kong / charts

Helm chart for Kong
Apache License 2.0

not able to run Kong Gateway Operator (KGO) #1068

Closed joran-fonjallaz closed 1 month ago

joran-fonjallaz commented 1 month ago

following the official doc, KGO remains in a broken state: the controller-manager fails with an error, and the controlplane and dataplane deployments never become ready.

Steps to reproduce. Create a new GKE cluster. No special config.

  1. copy-paste the commands from Install KIC with Kong Gateway Operator
  2. copy-paste the commands from Create a GatewayClass
  3. copy-paste the commands from Create a Route

the gateway-operator (container manager) throws a few errors such as

"Internal error occurred: failed calling webhook "gateway-operator-validation.konghq.com": failed to call webhook: Post "https://gateway-operator-validating-webhook.kong-system.svc:443/validate?timeout=5s": no endpoints available for service "gateway-operator-validating-webhook""

with the stack trace

"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:227"

or such messages

level: "info"
logger: "controlplane"
msg: "no ingress services found for dataplane"
name: "kong"
namespace: "default"

the dataplane never becomes ready with Readiness probe failed: HTTP probe failed with statuscode: 503

and similarly for the controlplane with Readiness probe failed: HTTP probe failed with statuscode: 404.

I've also tried playing with the values.yaml, but after many hours I'm resorting to asking for help. I can't get it to work even when following your doc on a fresh GKE cluster.

What am I missing? Any pointer would be greatly appreciated! Many thanks

joran-fonjallaz commented 1 month ago

providing more info.

Controlplane

the response on the controlplane's readiness probe at :10254/readyz is a 404 page not found. The container logs show that the controlplane fails to reach the dataplane's admin API on port 8444

2024-05-19T07:28:04Z    info    setup   Retrying kong admin api client call after error {"v": 0, "retries": "50/60", "error": "making HTTP request: Get \"https://10-52-2-37.dataplane-admin-kong-wgz7n-mpvzh.default.svc:8444/\": dial tcp: lookup 10-52-2-37.dataplane-admin-kong-wgz7n-mpvzh.default.svc on 10.17.176.10:53: no such host"}

the format of the URI looks wrong https://10-52-2-37.dataplane-admin-kong-wgz7n-mpvzh.default.svc:8444, but maybe it's only the log format. I didn't dive into the operator's source code.
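To check whether the hostname itself is the problem, a minimal Python sketch along these lines can rebuild the name and test it against the cluster DNS from inside a pod. The name shape (pod IP with dashes, then service, then namespace, then `svc`) is only inferred from the log line above, and both function names are hypothetical helpers, not part of the operator.

```python
import socket


def service_scoped_pod_dns(pod_ip: str, service: str, namespace: str) -> str:
    """Build a name of the shape seen in the log line.

    Assumption: the pattern <ip-with-dashes>.<service>.<namespace>.svc is
    inferred from the logged URL, not taken from the operator's source.
    """
    return f"{pod_ip.replace('.', '-')}.{service}.{namespace}.svc"


def can_resolve(host: str, resolver=socket.getaddrinfo) -> bool:
    """Return True if the resolver finds at least one address for host.

    The resolver is injectable so the check can be exercised without DNS.
    """
    try:
        return len(resolver(host, None)) > 0
    except socket.gaierror:
        # "no such host" and similar lookup failures end up here.
        return False
```

Run from a pod in the same cluster, `can_resolve(service_scoped_pod_dns("10.52.2.37", "dataplane-admin-kong-wgz7n-mpvzh", "default"))` returning `False` would point at DNS rather than at the log format.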

Dataplane

the response on the dataplane's readiness probe at :8100/status/ready is a 503 Service Temporarily Unavailable with body {"message":"no configuration available (empty configuration present)"}

which makes sense, since the controlplane cannot reach the dataplane on its admin port and thus cannot configure it.
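For anyone wanting to reproduce the two probe responses by hand, here is a rough Python sketch. It assumes the pod ports have already been forwarded to localhost (for example with `kubectl port-forward`), and `probe` is a hypothetical helper name. It only returns the status code and body, which is what the kubelet's HTTP probe reacts to.

```python
import urllib.error
import urllib.request
from typing import Tuple


def probe(url: str, timeout: float = 5.0) -> Tuple[int, str]:
    """Return (status_code, body) for a GET, including non-2xx responses."""
    # An empty ProxyHandler bypasses any proxy from the environment,
    # since the target is assumed to be a local port-forward.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the error still carries status and body.
        return err.code, err.read().decode("utf-8", "replace")
```

With the dataplane's port 8100 forwarded, `probe("http://127.0.0.1:8100/status/ready")` should show the same 503 and JSON message as above. The controlplane equivalent would be port 10254 with the path `/readyz`.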

pmalek commented 1 month ago

Hi @joran-fonjallaz,

Due to limitations in kube-dns (which is used by default on GKE), the Admin API endpoints are unreachable via the service-scoped DNS names, which is what ControlPlane (KIC) uses by default (as defined by --gateway-discovery-dns-strategy).

We have 2 issues tracking this https://github.com/Kong/gateway-operator/issues/179 and https://github.com/Kong/gateway-operator/issues/140 and a workaround which uses coredns instead: https://github.com/Kong/gateway-operator/issues/179#issuecomment-2071905771.

joran-fonjallaz commented 1 month ago

thank you @pmalek for getting back to me regarding this issue. Switching to CoreDNS is not an option for me. Do you have any idea whether this issue will get solved for the native kube-dns? If yes, is there a ticket I can track?

pmalek commented 1 month ago

I don't believe this is going to change for kube-dns anytime soon. I've created https://github.com/kubernetes/dns/issues/633 to track this feature request.

In the meantime I'm going to close this issue as it's already tracked under https://github.com/Kong/gateway-operator/issues/179 and https://github.com/Kong/gateway-operator/issues/140.

joran-fonjallaz commented 1 month ago

thanks again @pmalek! So do I understand correctly: you mean that Kong has no plan to support GKE for the gateway-operator any time soon?

pmalek commented 1 month ago

We do want to support GKE but as of now the only option is to use coredns instead of kube-dns.

When https://github.com/Kong/gateway-operator/issues/179 gets resolved we'll have a solution for GKE without the mentioned workaround.