Kong / charts

Helm chart for Kong
Apache License 2.0

Kong Proxy Service not reachable from outside with Kubernetes AKS 1.24.6 #694

Closed — Carmine88 closed this 1 year ago

Carmine88 commented 1 year ago

Hello,

I'm working on an AKS cluster running version 1.24.6. I installed Kong for Kubernetes via the Helm chart provided by the docs. The chart installs Kong correctly and creates the proxy service; on Azure I can see that the public IP resource has been created and correctly attached to the load balancer.

NAME                                 TYPE           CLUSTER-IP   EXTERNAL-IP     PORT(S)                                AGE
service/kong-kong-proxy   LoadBalancer   10.0.59.59   20.23.X.X   80:32127/TCP,443:30452/TCP   10m

NAME                                       ENDPOINTS                             AGE
endpoints/kong-kong-proxy   10.244.38.37:8000,10.244.38.37:8443   10m

But when I try to curl the External-IP I get a connection reset.

curl: (56) Recv failure: Connection reset by peer
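For reference, a quick way to tell whether the problem is the Azure load balancer or Kong itself is to curl the proxy Service from inside the cluster — a sketch using the Service name and namespace shown above (Kong answering with a 404 "no Route matched" still proves the proxy is reachable):

kubectl run curl-test -n kong --rm -it --restart=Never --image=curlimages/curl -- \
  curl -sv http://kong-kong-proxy.kong.svc.cluster.local/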

I'm not sure what I'm missing, since I don't see any particular value in the chart that is needed to make the proxy work.

Also, I tried installing Kong via the Bitnami charts repo, where I specify the Service type and the IP. In that case I use an Azure public IP resource that is in the same resource group (same cluster, same environment) as the one created by Kong.

Any tips on that behavior?

Thanks.

P.S.: if you need any further info, let me know; this is the first GitHub issue I've opened.

Carmine88 commented 1 year ago

Other information that I forgot to mention:

chart version: 2.13.1, app version: 3.0

describe output for the service:

Name:                     kong-kong-proxy
Namespace:                kong
Labels:                   app.kubernetes.io/instance=kong
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=kong
                          app.kubernetes.io/version=3.0
                          enable-metrics=true
                          helm.sh/chart=kong-2.13.1
                          helm.toolkit.fluxcd.io/name=kong
                          helm.toolkit.fluxcd.io/namespace=flux-system
Annotations:              meta.helm.sh/release-name: kong
                          meta.helm.sh/release-namespace: kong
Selector:                 app.kubernetes.io/component=app,app.kubernetes.io/instance=kong,app.kubernetes.io/name=kong
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.0.59.59
IPs:                      10.0.59.59
LoadBalancer Ingress:     20.23.X.X
Port:                     kong-proxy  80/TCP
TargetPort:               8000/TCP
NodePort:                 kong-proxy  32127/TCP
Endpoints:                10.244.38.37:8000
Port:                     kong-proxy-tls  443/TCP
TargetPort:               8443/TCP
NodePort:                 kong-proxy-tls  30452/TCP
Endpoints:                10.244.38.37:8443
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason                Age   From                Message
  ----    ------                ----  ----                -------
  Normal  EnsuringLoadBalancer  45m   service-controller  Ensuring load balancer
  Normal  EnsuredLoadBalancer   45m   service-controller  Ensured load balancer
pmalek commented 1 year ago

Hi @Carmine88 👋

Is the external IP of the load balancer reachable from your network? Are you using any sort of iptables rules that might be blocking the traffic? Can you traceroute this IP? Can you try reaching it from a different machine (e.g. a hosted VPS)?

t3mi commented 1 year ago

@Carmine88 according to the AKS release notes from 2022-09-11, starting with 1.24 there is a breaking behavior change for published Services of type LoadBalancer: the protocol used by health probes on the Azure load balancer changed from TCP to HTTP(S), with the request path set to / by default. So it can be addressed in either of two ways using annotations:
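For reference, a hedged sketch of what such annotations could look like on the proxy Service via Helm values (the annotation names match the cloud-provider-azure health-probe annotations referenced later in this thread; verify them against the docs for your AKS version):

proxy:
  annotations:
    # Option 1: force the probe for each exposed port back to TCP
    service.beta.kubernetes.io/port_80_health-probe_protocol: "tcp"
    service.beta.kubernetes.io/port_443_health-probe_protocol: "tcp"
    # Option 2: keep HTTP probes but point them at a path that actually answers 200
    # (for Kong that would be the status listener, not the proxy's '/')
    # service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: "/status"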

Carmine88 commented 1 year ago

Hello @pmalek @t3mi, thanks for your feedback, but I did some tests and found that the Helm installation is somehow broken.

I tried these two ways of installing Kong on my cluster.

The first time I used the Helm repo as per the documentation, and as I mentioned in the issue it creates the proxy, but it is not reachable. The external IP is only reachable from inside the cluster, no matter what. I don't have a firewall or anything like that.

But the strange thing is that if I apply the manifest directly with kubectl, the proxy works. Doc: https://github.com/Kong/kubernetes-ingress-controller — the kubectl apply:

kubectl apply -f https://bit.ly/k4k8s

So, following the documentation: if I use the Helm repo, the proxy is created and the public IP resource is also created on Azure, but it is not working (not reachable from the outside); if I just apply the manifest directly as per the doc, it works like a charm.

And in order to have a working Kong, in the end I used the manifest directly.

pmalek commented 1 year ago

Alright, so since you're using AKS it might have something to do with service annotations as @t3mi mentioned.

The default all-in-one manifest has the following annotations on the LoadBalancer Service:

  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-type: nlb

whereas the chart does not specify any annotations by default: https://github.com/Kong/charts/blob/main/charts/kong/values.yaml#L257-L269

You can try adding what's defined in the all-in-one manifest to the Helm chart values (via proxy.annotations), or adjust the annotations to your needs.
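For example, a minimal values.yaml sketch (these particular annotations are the AWS ones from the all-in-one manifest; on AKS you would substitute the equivalent Azure annotations):

proxy:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-type: nlb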

Let me know if it helps.

Also: if you're still experiencing some problems please attach the output of kubectl describe of said Service.

objt-ev commented 1 year ago

> I found that the Helm installation is somehow broken. [...] If I use the Helm repo the proxy is created and the public IP resource is also created on Azure, but it is not reachable from the outside; if I just apply the manifest directly as per the doc, it works like a charm.

I have tried adding the annotations manually to the LoadBalancer Service but that did not help for me... When comparing the Helm chart template output with the manifest file, there are more changes than only the Service annotations... So the only real solution for now was using the manifest directly.
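For anyone who wants to reproduce the comparison, a rough sketch (release name, namespace, and chart version are assumptions matching this thread, and the Kong chart repo is assumed to be added as kong):

helm template kong kong/kong --namespace kong --version 2.13.1 > kong-helm-rendered.yaml
curl -sL https://bit.ly/k4k8s > kong-all-in-one.yaml
diff kong-helm-rendered.yaml kong-all-in-one.yaml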

Carmine88 commented 1 year ago

> I have tried adding the annotations manually in the LoadBalancer Service but that did not help for me... So the only real solution for now was using the manifest directly.

Exactly, I also tried adding the annotation manually but it didn't work. So in the end I used the manifest directly.

pmalek commented 1 year ago

Hi @Carmine88 @objt-ev ,

Have you tried comparing the Service annotations and other metadata as applied to the cluster from the manifest and from the Helm chart? Perhaps there are some small differences that I might have missed (apart from what was linked above).

objt-ev commented 1 year ago

Hi @pmalek

!! FOUND THE SOLUTION !!

Using the following Service annotations on AKS did not work, simply because they are for AWS and not AKS!! (I did not notice that earlier.)

 annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-type: nlb

So I tried to find out why the 'helm chart' Service behaved differently from the working 'kubectl apply' one. I looked at the load balancer in the Azure portal and noticed the following differences in the load balancer health probes:

  • for the kubectl version, the health probes were all TCP-based probes
  • for the Helm chart version, the health probes were HTTP/HTTPS probes targeted at '/'. As Kong itself does not serve a standard health endpoint at '/', the load balancer health check always fails; this causes the public IP not to be accessible!!

So the Helm chart creates a LoadBalancer Service with HTTP probes... and the kubectl version creates one with TCP probes... I then took a closer look at the YAML differences and finally found it!!

spec:
  type: LoadBalancer
  ports:
  - name: kong-proxy
    port: 80
    targetPort: 8000
    appProtocol: http    # <-- this causes the creation of an HTTP probe !!
    protocol: TCP
  - name: kong-proxy-tls
    port: 443
    targetPort: 8443
    appProtocol: https   # <-- this causes the creation of an HTTPS probe !!
    protocol: TCP

I created a Service manually via a YAML file without the appProtocol values in it and... it worked!! So: to be compatible with AKS 1.24 you will need to remove the appProtocol values in the Service Helm chart template.
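If you want to keep the Helm-managed Service and just drop the fields in place, a sketch of a JSON patch against the deployed Service (name and namespace taken from the describe output above; note that Helm will re-add the fields on the next upgrade, so a chart-level fix is still needed):

kubectl patch service kong-kong-proxy -n kong --type=json -p='[{"op": "remove", "path": "/spec/ports/0/appProtocol"}, {"op": "remove", "path": "/spec/ports/1/appProtocol"}]'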

Carmine88 commented 1 year ago

> !! FOUND THE SOLUTION !! [...] So: to be compatible with AKS 1.24 you will need to remove the appProtocol values in the Service Helm chart template.

Great, I actually tried the Azure annotation with no luck, but as you spotted, the problem is in the Service template ;) Thanks for your work.

rainest commented 1 year ago

x-post from a PR: https://github.com/Kong/charts/pull/705#issuecomment-1370210829

Azure should now support new annotations that override its probe behavior; those should prevent it from failing checks against listens that do not return 200 on GET /, by forcing the checks onto one of the endpoints we actually intend for liveness checks.

skoczko commented 1 year ago

@rainest so what are the annotations and their values for AKS to redirect the checks to a proper endpoint? Is this something that the chart can configure by default?

vedsmand commented 1 year ago

I came across the exact same issue as described. I was able to create a workaround using the fluxcd Helm post-renderer feature: https://fluxcd.io/flux/components/helm/helmreleases/#post-renderers

Workaround snippet:

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: kong
spec:
  releaseName: kong
  chart:
    spec:
      chart: kong
      sourceRef:
        kind: HelmRepository
        name: kong
        namespace: kong
      version: 2.14.0
  interval: 1h0m0s
  install:
    remediation:
      retries: 3
  # Default values
  # https://github.com/Kong/charts/blob/main/charts/kong/values.yaml
  values:
    env:
      database: "off"

    proxy:
      enabled: true
      type: LoadBalancer
      annotations:
        service.beta.kubernetes.io/azure-load-balancer-resource-group: <my-rg>
      loadBalancerIP: <my-ip>

    ingressController:
      enabled: true

  postRenderers:
    # Instruct helm-controller to use built-in "kustomize" post renderer.
    - kustomize:
        patchesJson6902:
          - target:
              version: v1
              kind: Service
              name: kong-kong-proxy
              namespace: kong
            patch:
              - op: replace
                path: /spec/ports/0/appProtocol
                value: TCP
              - op: replace
                path: /spec/ports/1/appProtocol
                value: TCP

skoczko commented 1 year ago

@vedsmand Thanks for this, it does the job nicely. I tried playing with the annotations mentioned in the other thread but I couldn't get them to work (nor could I set up working probes through the Azure portal). Here's what I tried:

service.beta.kubernetes.io/port_80_health-probe_protocol: "http"
service.beta.kubernetes.io/port_80_health-probe_port: "8100"
service.beta.kubernetes.io/port_80_health-probe_request-path: "/status"
service.beta.kubernetes.io/port_443_health-probe_protocol: "http"
service.beta.kubernetes.io/port_443_health-probe_port: "8543"
service.beta.kubernetes.io/port_443_health-probe_request-path: "/status"

But this breaks LB provisioning on AKS and the Service gets stuck in the <pending> state.

jkuma145 commented 1 year ago

Hi, can you check the load balancer health probe configuration corresponding to the nginx-ingress-controller service IP? If the health probe is configured with protocol "HTTP" and path "/", the nginx-ingress-controller service will most likely fail the load balancer probes. Check the nginx ingress controller service describe output; if the service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz annotation is missing, the LB health probes will probably fail.
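For illustration, a sketch of that annotation on a Service (the /healthz path fits an nginx ingress controller; Kong's proxy does not serve /healthz, so for Kong you would point the probe at a path that actually returns 200, e.g. the status listener's /status):

metadata:
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz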

mloskot commented 1 year ago

I'm on Kubernetes AKS 1.25.5 with kong/kong 3.0, kong/kubernetes-ingress-controller 2.7, and Kong Helm chart 2.13.1, later upgraded to 2.14.0, and I've hit an issue with an unreachable Kong. After a lengthy investigation I suspect it is related to the problem reported (and solved) here, so I'd like to document it for others who may be searching the web (e.g. for the "Load Balancer Agent" phrase, see below).

I'm not sure when exactly, but my Kong proxy pod started logging these strange probes:

$ kubectl logs -n kong kong-kong-5cdbdc6c89-bd55m -c proxy
...
127.0.0.1 - - [27/Mar/2023:21:10:25 +0000] "GET / HTTP/2.0" 200 12299 "-" "Go-http-client/2.0"
127.0.0.1 - - [27/Mar/2023:21:10:25 +0000] "GET /tags HTTP/2.0" 200 23 "-" "Go-http-client/2.0"
127.0.0.1 - - [27/Mar/2023:21:10:25 +0000] "GET / HTTP/2.0" 200 12299 "-" "Go-http-client/2.0"
127.0.0.1 - - [27/Mar/2023:21:10:25 +0000] "GET / HTTP/2.0" 200 12299 "-" "Go-http-client/2.0"
10.0.0.4 - - [27/Mar/2023:21:10:31 +0000] "GET / HTTP/1.1" 404 48 "-" "Load Balancer Agent"
10.0.16.4 - - [27/Mar/2023:21:10:31 +0000] "GET / HTTP/1.1" 404 48 "-" "Load Balancer Agent"
10.0.16.64 - - [27/Mar/2023:21:10:31 +0000] "GET / HTTP/1.1" 404 48 "-" "Load Balancer Agent"
10.0.16.124 - - [27/Mar/2023:21:10:31 +0000] "GET / HTTP/1.1" 404 48 "-" "Load Balancer Agent"

which apparently come from Kubernetes AKS internal service probing for some metrics, i.e. npm-metrics-cluster-service:

$ kubectl get -A endpoints | grep -E "10.0.16.[1-6]+4"
kube-system               npm-metrics-cluster-service         10.0.0.4:10091,10.0.16.124:10091,10.0.16.64:10091     203d

Upgrading the Kong Helm chart to 2.15.3 specifically fixes the problem like a charm, and I see no more 404 "Load Balancer Agent" entries in the proxy log.
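For completeness, the chart upgrade itself is a standard Helm operation — a sketch, assuming the release is named kong and was installed from the kong repo into the kong namespace (with Flux you would instead bump the version in the HelmRelease):

helm repo update
helm upgrade kong kong/kong --namespace kong --version 2.15.3 --reuse-values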

Within a few minutes after reconciliation of the upgraded Kong completed, I see only these 404s in the Kong proxy pod log:

$ kubectl logs -n kong kong-kong-56df8dbdbb-8kccv -c proxy | grep " 404"
10.0.16.64 - - [27/Mar/2023:21:41:47 +0000] "GET / HTTP/1.1" 404 48 "-" "curl/7.87.0"
10.0.0.4 - - [27/Mar/2023:21:42:12 +0000] "GET / HTTP/1.1" 404 48 "-" "curl/7.87.0"
10.0.0.4 - - [27/Mar/2023:21:43:24 +0000] "POST /boaform/admin/formLogin HTTP/1.1" 404 48 "http://51.142.214.164:80/admin/login.asp" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:71.0) Gecko/20100101 Firefox/71.0"
10.0.0.4 - - [27/Mar/2023:21:46:50 +0000] "GET / HTTP/1.1" 404 48 "-" "curl/7.87.0"
10.0.16.64 - - [27/Mar/2023:21:47:09 +0000] "GET /abc HTTP/1.1" 404 48 "-" "curl/7.87.0"
10.0.0.4 - - [27/Mar/2023:21:47:29 +0000] "GET / HTTP/1.1" 404 48 "-" "HTTPie/3.2.1"
10.0.16.124 - - [27/Mar/2023:21:47:41 +0000] "GET / HTTP/1.1" 404 48 "-" "curl/7.87.0"
10.0.16.64 - - [27/Mar/2023:21:47:50 +0000] "GET /owa/auth/logon.aspx?url=https%3a%2f%2f1%2fecp%2f HTTP/1.1" 404 48 "-" "Mozilla/5.0 zgrab/0.x"
10.0.16.64 - - [27/Mar/2023:21:49:18 +0000] "GET /?XDEBUG_SESSION_START=phpstorm HTTP/1.1" 404 48 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
10.0.16.64 - - [27/Mar/2023:21:54:59 +0000] "GET / HTTP/1.1" 404 48 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36 Edg/90.0.818.46"

As a double check, I downgraded the chart back to 2.14.0 and the problem reappeared; then I upgraded back to 2.15.3 and the problem is gone.

Related or not, I'd like to thank everyone for the discussion here, and @R3DRUN3 for mentioning this issue, which is how I was able to find https://github.com/krateoplatformops/krateo-module-core/issues/2#issuecomment-1413861364

mattiadevivo commented 1 year ago

@mloskot thank you very much. I had the same problem with AKS v1.26.5 and Kong Helm chart v2.8.0, and after upgrading the chart the load balancer now works correctly.