Kong / kubernetes-ingress-controller

:gorilla: Kong for Kubernetes: The official Ingress Controller for Kubernetes.
https://docs.konghq.com/kubernetes-ingress-controller/
Apache License 2.0

Sending requests to services using kong proxy internal service address results in invalid protocol error #1815

Closed: LeadingMoominExpert closed this issue 2 years ago

LeadingMoominExpert commented 3 years ago

Is there an existing issue for this?

Current Behavior

I'm running the Kong Ingress Controller in a Kubernetes cluster on OpenStack, where KIC is installed in its own kong namespace and applications are deployed to a services namespace. I have configured Ingresses for these apps in the services namespace, and external connections work fine. However, when trying to communicate between applications through the kong proxy Service address and the exposed port 443, the requests fail with an invalid protocol error.

Expected Behavior

I'm expecting the applications to respond as intended.

Steps To Reproduce

Install the Kong Ingress Controller using Helm 3 with

helm upgrade king kong/kong --install --namespace kong --values values.yaml

Where values.yaml contains

image:
  repository: revomatico/docker-kong-oidc
  tag: 2.4.1-1

proxy:
  annotations:
    loadbalancer.openstack.org/floating-subnet: redacted
    loadbalancer.openstack.org/proxy-protocol: true
  externalTrafficPolicy: Local

replicaCount: 2

podDisruptionBudget:
  enabled: true
  maxUnavailable: "50%"

env:
  log_level: info
  nginx_proxy_large_client_header_buffers: "16 128k"
  proxy_buffer_size: "128k"
  anonymous_reports: off
  nginx_http_log_format: redacted
  nginx_http_lua_ssl_trusted_certificate: /etc/ssl/certs/ca-certificates.crt
  proxy_access_log: /dev/stdout laas
  proxy_listen: "0.0.0.0:8000 proxy_protocol, 0.0.0.0:8443 ssl proxy_protocol"
  real_ip_header: proxy_protocol
  trusted_ips: "0.0.0.0/0,::/0"
  x_session_compressor: zlib
  x_session_name: "oidc_session"
  nginx_proxy_proxy_busy_buffers_size: "256k"
  nginx_proxy_proxy_buffers: "16 128k"
  plugins: bundled,oidc
  ssl_cert: /etc/secrets/default-tls/tls.crt
  ssl_cert_key: /etc/secrets/default-tls/tls.key
  x_session_secret:
    valueFrom:
      secretKeyRef:
        name: kong-session-secret
        key: session-secret

ingressController:
  env:
    anonymous_reports: false
  installCRDs: false
  resources:
    requests:
      cpu: "200m"
      memory: "0.25Gi"
    limits:
      cpu: "500m"
      memory: "0.5Gi"

secretVolumes:
- default-tls

resources:
  requests:
    cpu: "200m"
    memory: "0.25Gi"
  limits:
    cpu: "500m"
    memory: "0.5Gi"

securityContext:
  runAsUser: 100
  fsGroup: 100

serviceMonitor:
  enabled: true

where the default-tls secret is a certificate provided by the Kubernetes platform for the DNS names in use. The referenced kong-session-secret is created with

kubectl create secret generic kong-session-secret --namespace kong --from-literal=session-secret=$(openssl rand -base64 30)

For my application I create the Ingresses as follows:

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend-ingress
  namespace: services
  annotations:
    konghq.com/strip-path: "true"
    konghq.com/protocols: https
    konghq.com/https-redirect-status-code: "301"
spec:
  ingressClassName: kong
  tls:
  - hosts:
    - example.com
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend-app
            port:
              number: 8080

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: service-ingress
  namespace: services
  annotations:
    konghq.com/strip-path: "true"
    konghq.com/protocols: https
    konghq.com/https-redirect-status-code: "301"
    konghq.com/plugins: key-auth, basic-auth, services-acl
spec:
  ingressClassName: kong
  tls:
  - hosts:
    - example.com
  rules:
  - host: example.com
    http:
      paths:
      - path: /app-service
        pathType: Prefix
        backend:
          service:
            name: app-service
            port:
              number: 8080

Then try to call app-service from the frontend application using the created king-kong-proxy LoadBalancer Service, so the Service address would be

king-kong-proxy.kong.svc.cluster.local:443/app-service

which results in an invalid protocol error. The same happened with the exposed HTTP port :80. The plugins defined in the Ingresses are custom Kong plugins implementing auth and ACLs.
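
For clarity, the in-cluster call that fails is equivalent to something like the following (a rough sketch; the path and ports are just the ones from the Ingresses above):

curl -vk https://king-kong-proxy.kong.svc.cluster.local:443/app-service
curl -v http://king-kong-proxy.kong.svc.cluster.local:80/app-service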

Kong Ingress Controller version

1.3

Kubernetes version

Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:38:26Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:53:14Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}

Anything else?

No response

shaneutt commented 3 years ago

@LeadingMoominExpert thanks for the report.

I built a local lab (using KIND) to try to reproduce this issue, but I haven't been able to so far in that environment. There, the calls from the first application to the other application were made manually with curl through the Kong Service (in my case ingress-controller-kong-proxy.kong-system.svc.cluster.local:80), and I didn't encounter any errors.

It would be helpful to have some more details about how the client is calling the backend (for instance, in my case I was using curl). Can you please:

  1. describe what kind of HTTP client is making the calls and anything specific to its configuration
  2. trigger a failure with verbose client logging enabled and capture the logs

More information on how the request is originally made, along with as much verbose output about the failure as possible, would be helpful here. Ideally, if you can reproduce the problem with curl (specifically curl -vvv), that would be particularly useful since I am familiar with curl.
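
If it helps, one quick way to do that from inside the cluster (just a sketch; the debug image and namespace are assumptions on my part) is a throwaway curl pod:

kubectl run curl-debug --rm -it --restart=Never --namespace services \
  --image=curlimages/curl -- \
  curl -vvv http://king-kong-proxy.kong.svc.cluster.local:80/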

LeadingMoominExpert commented 3 years ago

@shaneutt thanks for investigating the issue.

I'll try to find time soon to replicate the scenario I was having and try the approach with curl too, as you suggested. The original situation was a very basic node/express application without any special configurations sending requests to the kong proxy.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

LeadingMoominExpert commented 3 years ago

I investigated the issue further by setting up a debug pod with curl available, per your suggestion @shaneutt. Using the same configuration as in the initial post, with curl:

$ curl -vvv -GET king-kong-proxy.kong.svc.cluster.local:80
*   Trying 100.97.75.146:80...
* Connected to king-kong-proxy.kong.svc.cluster.local (100.97.75.146) port 80 (#0)
> GET / HTTP/1.1
> Host: king-kong-proxy.kong.svc.cluster.local
> User-Agent: curl/7.79.1
> Accept: */*
> 
* Empty reply from server
* Closing connection 0
curl: (52) Empty reply from server

and in the kong proxy logs I have

2021/10/20 13:27:36 [error] 25#0: *6355 broken header: "GET / HTTP/1.1
Host: king-kong-proxy.kong.svc.cluster.local
User-Agent: curl/7.79.1
Accept: */*

" while reading PROXY protocol, client: 100.117.214.141, server: 0.0.0.0:8000

Is the proxy possibly missing some configuration for handling these headers? I'm having a hard time wrapping my head around this problem.
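
One more data point I could gather (an assumption on my part that this is meaningful): the "broken header ... while reading PROXY protocol" error suggests the 8000 listener expects a PROXY protocol preamble before the HTTP request, while curl sends plain HTTP. Reasonably recent curl versions can prepend that preamble themselves, which would confirm whether the listener itself is fine:

curl -vvv --haproxy-protocol http://king-kong-proxy.kong.svc.cluster.local:80/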

shaneutt commented 3 years ago

Nothing terribly obvious seems to stand out in this case... :thinking:

One thing you could try is disabling proxy_protocol on the listens. If that fixes the problem, check whether your LB is somehow not configured to send the PROXY protocol header properly (and fix that).
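
Roughly, that would mean changing the env section of your values to something like this (a sketch based on your values above; you may also want to revisit the related settings such as real_ip_header while proxy_protocol is off):

env:
  proxy_listen: "0.0.0.0:8000, 0.0.0.0:8443 ssl"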

If that's not the problem, check out this troubleshooting guide on capturing the network traffic and inspecting it:

https://docs.konghq.com/kubernetes-ingress-controller/2.0.x/troubleshooting/#inspecting-network-traffic-with-a-tcpdump-sidecar
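
The gist of that guide is adding a tcpdump sidecar to the proxy Deployment, along these lines (a rough sketch rather than the exact manifest from the docs; the image is an assumption, any image shipping tcpdump will do):

containers:
- name: tcpdump
  image: nicolaka/netshoot
  securityContext:
    capabilities:
      add: ["NET_ADMIN", "NET_RAW"]
  command: ["tcpdump", "-i", "any", "-w", "/tmp/kong.pcap", "port", "8000", "or", "port", "8443"]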


Possibly unrelated: you'll want to remove the proxy_listen env config entirely and use this instead:

https://github.com/Kong/charts/blob/main/charts/kong/values.yaml#L217-L238

I wouldn't really expect doing so to break anything since it uses the standard ports, but it's possible this could cause something in the templates to not match and deliver to the wrong port :thinking:
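
With the chart, the same listeners are expressed roughly like this (a sketch based on the linked values.yaml; double-check the parameter names against the chart version you're running):

proxy:
  http:
    enabled: true
    servicePort: 80
    containerPort: 8000
    parameters:
    - proxy_protocol
  tls:
    enabled: true
    servicePort: 443
    containerPort: 8443
    parameters:
    - http2
    - proxy_protocol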

LeadingMoominExpert commented 3 years ago

Ran into a wall trying to scrape the traffic with the tcpdump sidecar, as running privileged pods in my kubernetes environment is not too simple. I can work around it but just need more time than I have right now.

As for removing proxy_protocol from the listens: the proxy logs are flooded with

100.68.0.83 - - [21/Oct/2021:17:50:48 +0000] "PROXY TCP4 100.68.0.83 100.76.2.138 1811 31606" 400 0 "-" "-"
2021/10/21 17:51:17 [info] 24#0: *323 client sent invalid request while reading client request line, client: 100.68.3.7, server: kong, request: "PROXY TCP4 100.68.3.7 100.76.2.138 6127 31326"
2021/10/21 17:51:17 [info] 24#0: *323 writev() failed (104: Connection reset by peer), client: 100.68.3.7, server: kong, request: "PROXY TCP4 100.68.3.7 100.76.2.138 6127 31326"

When running curl again I get

$ curl -vvv -GET king-kong-proxy.kong.svc.cluster.local:80
*   Trying 100.97.75.146:80...
* Connected to king-kong-proxy.kong.svc.cluster.local (100.97.75.146) port 80 (#0)
> GET / HTTP/1.1
> Host: king-kong-proxy.kong.svc.cluster.local
> User-Agent: curl/7.79.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 404 Not Found
< Date: Thu, 21 Oct 2021 17:53:12 GMT
< Content-Type: application/json; charset=utf-8
< Connection: keep-alive
< Content-Length: 48
< X-Kong-Response-Latency: 1
< Server: kong/2.4.1
< 
* Connection #0 to host king-kong-proxy.kong.svc.cluster.local left intact
{"message":"no Route matched with those values"}

which looks a little more promising, though no route matches, no matter which path I try, even ones that work when I use the DNS name instead of the internal proxy Service address. The corresponding proxy log entry for this curl is

100.117.214.152 - - [21/Oct/2021:17:54:56 +0000] "GET / HTTP/1.1" 404 48 "-" "curl/7.79.1"

The end result was similar if I removed the proxy_listen env config and used the helm chart definition instead.

After dropping proxy_protocol from the listens and ending up with no matching routes, I feel like I'm missing some configuration for how to proxy traffic to the services. Could it be that the Ingresses are lacking something?
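
One thing I can think of (just a hunch on my part): both Ingresses only match host: example.com, so a request addressed to the internal Service name presumably needs the Host header overridden to match a route, e.g.:

curl -vvv -H "Host: example.com" http://king-kong-proxy.kong.svc.cluster.local:80/app-service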

shaneutt commented 3 years ago

I'm not quite sure yet :thinking:

Do you think you might be able to put together a reproduction environment for this? Ideally we could reproduce it on something like kind or minikube, but I know you're on OpenStack, so even something I can spin up on an OpenStack trial cluster, or with devstack, would work. Was this OpenShift on top of OpenStack, a different distro, or custom?

LeadingMoominExpert commented 3 years ago

Reproducing the environment could be tricky. It's a private cloud solution where the kubernetes clusters are provisioned on OpenStack using kops, and I'm not fully familiar with the OpenStack side of things. But spinning up a cluster on devstack should be possible. The cluster itself is based on the community managed kubernetes version and not anything custom.

shaneutt commented 3 years ago

Reproducing the environment could be tricky. It's a private cloud solution where the kubernetes clusters are provisioned on OpenStack using kops, and I'm not fully familiar with the OpenStack side of things. But spinning up a cluster on devstack should be possible. The cluster itself is based on the community managed kubernetes version and not anything custom.

Are you expecting that if I were to spin up an OpenStack + kops based cluster with otherwise default configurations, the problem will trigger? Would you be able to provide any additional guidance in the form of configurations, scripts, etc. to ensure a more accurate reproduction environment?

LeadingMoominExpert commented 3 years ago

The clusters I'm using didn't come fully bare; there was some monitoring/logging-related stuff preinstalled, like the Prometheus operator. There was an NGINX ingress controller too, which was deleted before starting out with Kong. Nothing that comes to mind that would affect the Kong Ingress Controller exists there anymore, though. So in my mind, creating an OpenStack and kops based cluster should work for reproducing the problem. What additional guidance would you need? I'll do my best to provide anything required.

shaneutt commented 3 years ago

Alright so the reproduction steps are:

  1. deploy a kops cluster to OpenStack (default configurations)
  2. deploy Kong Kubernetes Ingress Controller v1.3.x (default configurations)
  3. deploy any app and expose it via Kong Ingress
  4. expect failures trying to communicate from pods inside the cluster to Kong via the Service address

If there is anything else you think of that might be relevant, let me know.

LeadingMoominExpert commented 3 years ago

Alright so the reproduction steps are:

1. deploy a `kops` cluster to OpenStack (default configurations)

2. deploy Kong Kubernetes Ingress Controller `v1.3.x` (default configurations)

3. deploy any app and expose it via Kong Ingress

4. expect failures trying to communicate from pods inside the cluster to Kong via the `Service` address

If there is anything else you think of that might be relevant, let me know.

On 2. the KIC configurations would be per the original issue above. Looks good otherwise.

LeadingMoominExpert commented 3 years ago

Possibly related and hopefully insightful: there was a bug on openstack-cloud-provider similar to my problem, https://github.com/kubernetes/cloud-provider-openstack/issues/1287, and a related KEP, https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/1860-kube-proxy-IP-node-binding. So it might be an issue on the OpenStack Octavia LB and not the ingress controller itself 🤔 However, maybe some kind of workaround is possible while waiting for enhancements to upstream Kubernetes.

shaneutt commented 3 years ago

Possibly related and hopefully insightful: There was a bug on openstack-cloud-provider similar to my problem kubernetes/cloud-provider-openstack#1287

Are you able to verify whether a different load balancer implementation than Octavia, deployed on the same cluster and infrastructure, avoids the problem?

And a related KEP https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/1860-kube-proxy-IP-node-binding So it might be an issue on openstack octavia LB and not the ingress controller itself

If this is the root cause then you may want to consider building and testing a custom Kubernetes build including any relevant (but not yet released) improvements to see if this alleviates your issue.

however maybe some kind of workaround is possible while waiting for enhancements to upstream Kubernetes

Seems like it would be perfectly reasonable to start asking around in https://github.com/kubernetes/cloud-provider-openstack/issues/1287 to see if there are some workaround suggestions.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.