Kong / kubernetes-ingress-controller

:gorilla: Kong for Kubernetes: The official Ingress Controller for Kubernetes.
https://docs.konghq.com/kubernetes-ingress-controller/
Apache License 2.0

KIC returns self-signed certificate during pod restart #4272

Closed gagarinfan closed 8 months ago

gagarinfan commented 1 year ago

Is there an existing issue for this?

Current Behavior

When I call a URL exposed via Ingress during a KIC pod rollout, I briefly receive self-signed certificate errors, after which the proper (Let's Encrypt) certificate is served again:

*   Trying redacted:443...
* Connected to redacted (redacted) port 443 (#0)
* ALPN: offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
} [326 bytes data]
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* (304) (IN), TLS handshake, Unknown (8):
{ [19 bytes data]
* (304) (IN), TLS handshake, Certificate (11):
{ [979 bytes data]
* SSL certificate problem: self signed certificate
* Closing connection 0

In the logs we observed:

2023/07/04 15:39:19 [info] 1109#0: *306 SSL_do_handshake() failed (SSL: error:14094416:SSL routines:ssl3_read_bytes:sslv3 alert certificate unknown:SSL alert number 46) while SSL handshaking, client: 192.168.9.79, server: 0.0.0.0:8443

We're running KIC on EKS behind an AWS NLB. It runs as a DaemonSet on all nodes (the number of running pods depends on how many nodes the autoscaler, Karpenter, adds).

The certificate itself does not change during KIC deployments/rollouts.

Expected Behavior

When I call any URL exposed via KIC in our cluster, I should receive "no route to host" or "service unavailable", or preferably traffic should not be directed to pods that haven't started properly yet.

Steps To Reproduce

1. Trigger a KIC restart (kill one or more pods)
2. Call any of the URLs exposed via KIC. For my testing I ran a loop calling an endpoint
3. While all pods are terminating, requests return `Connection refused`, which is understandable
4. But after new pods are created, some of the requests receive `SSL certificate problem: self signed certificate`
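The test loop from step 2 can be sketched as follows. The hostname, iteration count, and `results.log` filename are illustrative placeholders, not part of the original report:

```shell
# Hypothetical hostname; substitute one served by your KIC install.
HOST=${HOST:-name.example.com}

# Hammer the endpoint during the rollout and record each outcome;
# curl's stderr carries any TLS error alongside the HTTP status from -w.
for i in $(seq 1 5); do   # raise the count to span the whole rollout
  out=$(curl -sS --max-time 3 -o /dev/null -w '%{http_code}' \
        "https://$HOST/" 2>&1) || true
  printf '%s %s\n' "$i" "$out"
done | tee results.log

# Count the requests that hit the fallback (self-signed) certificate.
grep -c 'self.signed certificate' results.log || true
```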

Kong Ingress Controller version

3.0

Kubernetes version

Server Version: version.Info{Major:"1", Minor:"23+", GitVersion:"v1.23.17-eks-c12679a", GitCommit:"d5ce2cee85d99653d6f8c278043213db21b1cd72", GitTreeState:"clean", BuildDate:"2023-05-22T20:32:28Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/amd64"}

Anything else?

- [Moesif](https://www.moesif.com/) plugin has been installed
- example Ingress object (redacted)
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-name
  annotations:
    konghq.com/https-redirect-status-code: '301'
    konghq.com/protocols: https
spec:
  tls:
    - hosts:
        - name.example.com
      secretName: certificate-wildcard
  rules:
    - host: name.example.com
      http:
        paths:
          - path: /
            pathType: ImplementationSpecific
            backend:
              service:
                name: service-name
                port:
                  number: 8000
```
rainest commented 8 months ago

This predates the change to use the new config-aware readiness endpoint; the setup was presumably using the old /status endpoint, which would mark Kong containers ready regardless of whether they had configuration available. Newer versions of the ingress controller should have prevented Pods from becoming ready in sidecar mode, but it's not clear which version of the controller was used here.
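As a sketch of the newer behavior, a Kong proxy container can gate readiness on the config-aware endpoint rather than plain `/status`. This fragment assumes the default status listener on port 8100; `/status/ready` is available in recent Kong 3.x releases, and the official chart configures this for you:

```yaml
# Sketch only: readiness gated on loaded configuration, not just a live nginx.
# Assumes Kong's status listener on 0.0.0.0:8100 (the chart default).
readinessProbe:
  httpGet:
    path: /status/ready   # returns 200 only once a valid configuration is loaded
    port: 8100
  initialDelaySeconds: 5
  periodSeconds: 10
```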

AFAIK this will not happen on current versions (controller 3.0, Kong 3.5, chart 2.33). My test plan was:

  1. Deploy a 15-replica set of Kong instances with a sidecar controller (the Kong readiness endpoint behaves the same way whether the controller is deployed as a sidecar or not, so that shouldn't affect behavior--although separate Deployments for Kong and the controller are now more common, I opted for the topology used for the original report).
  2. Create a basic HTTPS Ingress+Service: tls_ingress.yaml.txt
  3. Confirm I see the expected certificate from the Secret for kong.example and the default self-signed certificate for wrong.example (the latter does not match any configured route/SNI and should serve the fallback certificate).
  4. Start repeatedly sending requests for kong.example to the proxy Service LB IP using curl while logging the presented certificate subject to a file.
  5. Run kubectl rollout restart on the Kong Deployment.
  6. Wait for the new replicas to replace the old.
  7. Stop the repeated curls and check the unique certificate subjects observed. All certificates observed across the restart period carried the expected kong.example subject.
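The certificate logging in steps 4–7 can be sketched like this. The LB address is a hypothetical placeholder, and the iteration count is kept small for illustration; the kong.example SNI comes from the test plan above:

```shell
# Poll the proxy LB and record the subject of the presented certificate.
# LB is a hypothetical placeholder; substitute your proxy Service address.
LB=${LB:-203.0.113.10}
for i in $(seq 1 5); do   # extend the count to cover the whole rollout
  timeout 2 openssl s_client -connect "$LB:443" -servername kong.example \
    </dev/null 2>/dev/null | openssl x509 -noout -subject 2>/dev/null
  sleep 1
done >> cert_subjects.log

# A single unique subject here means no request saw the fallback certificate.
sort -u cert_subjects.log
```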

Results did not indicate that any of the requests saw the fallback certificate:

pod_restart_status.txt curl_test.txt

Based on my knowledge of the old and new readiness behaviors, the original report was likely caused by the old behavior bringing Pods into Service before they were truly ready. Testing indicates that the new readiness behavior works as expected and that kube-proxy will not dispatch requests to instances before they've received configuration. I'm closing this as outdated; if you continue to see incorrect certificates served with current versions, please respond with updated replication steps.

RajaShanmugamJM commented 8 months ago

I faced a similar issue. After configuring the ClusterIssuer and obtaining the Certificate, adding the protocols annotation helped me serve the right certificate.

https://docs.konghq.com/kubernetes-ingress-controller/3.0.x/reference/annotations/#konghqcomprotocols

```yaml
apiVersion: v1
kind: Service
metadata:
  name: service-basic-nuxt-app
#  annotations:
#    konghq.com/protocol: https
#    konghq.com/plugins: rate-limit-5-min
spec:
  selector:
    app: basic-nuxt-app
  ports:
    - protocol: TCP
      port: 3000
      targetPort: 3000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ing-basic-nuxt-app
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    konghq.com/preserve-host: "true"
    konghq.com/strip-path: "true"
    konghq.com/protocols: "https"
    # redirects to HTTPS
    konghq.com/https-redirect-status-code: "301"
spec:
  ingressClassName: kong
  tls:
    - secretName: k8-prod-certs
      hosts:
        - example.com
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: ImplementationSpecific
            backend:
              service:
                name: service-basic-nuxt-app
                port:
                  number: 3000
```