Kong / kubernetes-ingress-controller

:gorilla: Kong for Kubernetes: The official Ingress Controller for Kubernetes.
https://docs.konghq.com/kubernetes-ingress-controller/
Apache License 2.0
2.2k stars 590 forks source link

GD breaks mesh because of double TLS #4698

Closed mflendrich closed 11 months ago

mflendrich commented 1 year ago

Is there an existing issue for this?

Current Behavior

When gateway discovery (#702) is enabled, some meshes (at least Kuma) are known to be broken because the Helm chart sets the protocol to TLS.

Suggestion: add protocol customizability - e.g. allow the user to offload mTLS between KIC and Admin API to the mesh running in the cluster.

Expected Behavior

Mesh not broken with GD

Steps To Reproduce

kumactl install control-plane | kubectl apply -f -
kubectl apply --server-side -f - <<EOF
apiVersion: kuma.io/v1alpha1
kind: Mesh
metadata:
  name: default
spec:
  mtls:
    enabledBackend: ca-1
    backends:
      - name: ca-1
        type: builtin
        dpCert:
          rotation:
            expiration: 1d
        conf:
          caCert:
            RSAbits: 2048
            expiration: 10y
  networking:
    outbound:
      passthrough: false
EOF
helm upgrade --install --create-namespace -n kong kong kong/ingress
kubectl label ns kong kuma.io/gateway=enabled
kubectl label ns kong kuma.io/sidecar-injection=enabled
kubectl delete pods -lapp=kong-gateway -n kong --wait=false
kubectl delete pods -lapp=kong-controller -n kong --wait=false

Kong Ingress Controller version

2.11 (speculative)

Kubernetes version

all

Anything else?

No response

rainest commented 11 months ago

For offloading the role currently handled by the admin API certificate requirements, the x-forwarded-client-cert header is roughly in the same area when mTLS is enabled. It is handled at the Envoy level (via forward_client_cert_details and set_current_client_cert_details configuration).

https://github.com/kumahq/kuma/issues/1256 has mixed information about official support for this, though there's some indication that it can be enabled in source. I don't know yet whether/when that flag is enabled.

This information isn't quite what we want, however. It is not signed with anything we can check, and even ignoring that there's no ready way to limit admin API access using it.

mTLS docs indicate it is using SPIFFE, which suggests it'd be possible to forward the SVID. This could potentially be possible to verify using configuration similar to what we have now, though it's unclear if this is indeed used as a client certificate. The JWT variant should be possible to validate by placing the admin API behind the proxy with the JWT plugin.

Alternately, documentation indicates we can just exclude the admin API from the mesh. The traffic.kuma.io/exclude-inbound-ports should let us use the current configuration. This does preclude usage of any other mesh features for the admin API, but this may be okay, since the admin API isn't intended for wide consumption--DB-less requires that only the controller be able to use write operations. Istio has a similar traffic.sidecar.istio.io/excludeInboundPorts configuration.

pmalek commented 11 months ago

Added the reproduction steps to the description.

pmalek commented 11 months ago

I've verified that adding traffic.kuma.io/exclude-outbound-ports: "8444" to default pod annotation set does make KIC reach Admin API.

rainest commented 11 months ago

It feels like this should require excluding the inbound ports on the gateway side also, but it doesn't. Unsure why--it feels like interception on either side should introduce an additional TLS hop and break the path.

Ideally we would set this annotation automatically, but the feature design and umbrella chart don't lend themselves to this.

The annotation requires a number port value because it isn't Service-aware and just sees the connection characteristics as they enter or exit containers. The controller code lacks an explicit port number configuration, as it looks up the Service and determines the appropriate port by name. We can't infer the port number from the controller configuration.

The gateway configuration knows its admin API port from admin.tls.containerPort, but that value is not available to the controller subchart. It would be available within a single chart. Global values can work around this, but would require configuring the admin API settings outside their normal location and support for global overrides in the Kong template.

That said, AFAIK there shouldn't normally be any reason to change the admin containerPort from the default, so I wouldn't expect many configurations to need to update the annotation value.

Adding podAnnotations to the ingress chart (so that those who do change the port, or need to exempt other controller ports, can change it) does mean that we need to keep values in sync with the kong chart podAnnotations. The values won't merge and are treated as a complete set--if we forget to copy an annotation in kong over to ingress, it won't show up in ingress installs.

rainest commented 11 months ago

Closing with the merger of https://github.com/Kong/kubernetes-ingress-controller/pull/4942 and https://github.com/Kong/charts/pull/913. Caveat that these do not offload validation to the mesh but instead exempt it. This seems prudent between (a) not knowing behavior across all meshes when enabled, but knowing that they do generally have exclude directives and (b) the broadcast behavior of DB-less discovery not following typical Kubernetes traffic patterns, with the expectation that other mesh policies may interfere with that also.

rainest commented 9 months ago

To revisit this a bit in light of #5251, the repro steps in the OP are a bit off and don't actually result in a connection failure. They don't enable mTLS (it's not on by default in kong/ingress) and server-side TLS HTTPS is compatible with the mesh shown. mTLS is not, but it can be disabled independent of the discovery protocol.


Install flow for basic HTTPS is the same as OP. Output is lost due to me forgetting to record it, but it's not particularly interesting.

$ helm upgrade --install --set "controlPlane.mode=standalone" --set "cni.enabled=true"  kuma kuma/kuma

$ kubectl apply --server-side -f - <<EOF
apiVersion: kuma.io/v1alpha1                                                                                                                                                                                        kind: Mesh
metadata:
  name: default
spec:
  mtls:
    enabledBackend: ca-1
    backends:
      - name: ca-1
        type: builtin
        dpCert:
          rotation:
            expiration: 1d
        conf:
          caCert:
            RSAbits: 2048
            expiration: 10y
  networking:
    outbound:
      passthrough: false
EOF

$ helm upgrade --install --create-namespace -n kong kong kong/ingress

$ kubectl label ns kong kuma.io/gateway=enabled

$ kubectl label ns kong kuma.io/sidecar-injection=enabled

Checking after a restart shows that the mesh is in place and that the Pods are running with sidecars, can restart, and are receiving configuration updates:

$ ./kumactl get mesh default
NAME      mTLS           METRICS   LOGGING   TRACING   LOCALITY   ZONEEGRESS   AGE
default   builtin/ca-1   off       off       off       off        off          31m

$ kubectl get po -n kong
NAME                              READY   STATUS    RESTARTS   AGE
kong-controller-cc4c698cd-jrm8w   2/2     Running   0          21m
kong-gateway-57968b7d58-8zdq6     2/2     Running   0          21m

$ kubectl get mesh -n kong
NAME      AGE
default   32m

$ kubectl get namespaces kong -oyaml
apiVersion: v1
kind: Namespace
metadata:
  creationTimestamp: "2023-12-07T17:53:48Z"
  labels:
    kubernetes.io/metadata.name: kong
    kuma.io/gateway: enabled
    kuma.io/sidecar-injection: enabled
    name: kong
  name: kong
  resourceVersion: "2146"
  uid: 591e108a-9597-4f54-bd00-25bc2f235344
spec:
  finalizers:
  - kubernetes
status:
  phase: Active

$ kubectl delete po -n kong kong-gateway-57968b7d58-8zdq6 kong-controller-cc4c698cd-jrm8w
pod "kong-gateway-57968b7d58-8zdq6" deleted
pod "kong-controller-cc4c698cd-jrm8w" deleted

$ kubectl get po -n kong
NAME                              READY   STATUS    RESTARTS   AGE
kong-controller-cc4c698cd-w269p   2/2     Running   0          45s
kong-gateway-57968b7d58-xzl5s     2/2     Running   0          45s

$ kubectl get po -n kong kong-controller-cc4c698cd-w269p -oyaml | grep -A1 ADMIN_SVC
    - name: CONTROLLER_KONG_ADMIN_SVC
      value: kong/kong-gateway-admin

$ kubectl logs -n kong kong-controller-cc4c698cd-w269p | grep -i success | tail -1
2023-12-07T18:19:00Z    info    Successfully synced configuration to Kong   {"url": "https://10.244.0.15:8444", "update_strategy": "InMemory", "v": 0}

$ kubectl get po -n kong kong-gateway-57968b7d58-xzl5s -oyaml | grep -A1 ADMIN_LIS
    - name: KONG_ADMIN_LISTEN
      value: 0.0.0.0:8444 http2 ssl
--
    - name: KONG_ADMIN_LISTEN
      value: 0.0.0.0:8444 http2 ssl

Though not shown, I'll assert that installing kong/ingress with controller.ingressController.adminApi.tls.client.enabled=true and gateway.admin.tls.client.secretName=<Secret> (roughly, some steps omitted) to enable controller-side mTLS will fail. It should, as there's no way for the mesh to actually present the original controller client certificate. HTTPS with server-side TLS only, however, is fine.