envoyproxy / gateway

Manages Envoy Proxy as a Standalone or Kubernetes-based Application Gateway
https://gateway.envoyproxy.io
Apache License 2.0

TLS passthrough Gateway and TLSRoute to SSL-enabled PostgreSQL instance not working #4594

Open ferdinandosimonetti opened 4 weeks ago

ferdinandosimonetti commented 4 weeks ago

Description:

What issue is being seen? I expected to be able to connect from outside the cluster to an exposed PostgreSQL instance through my newly defined TLS-passthrough Gateway and my TLSRoute.

Instead, I received a "server closed the connection unexpectedly" error immediately after trying to connect:

root@d31b66b2a97c:/# psql -h store.fsmn.xyz -p 6432 -U myuser mydatabase
psql: error: connection to server at "store.fsmn.xyz" (10.111.9.116), port 6432 failed: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

I'm connected through a VPN, so I can reach Gateways even when they have private IPs (my Kubernetes cluster is an Azure AKS one).

I can still reach the same PostgreSQL instance through a TCP-enabled Gateway and TCPRoute (their configurations are shown below).

root@d31b66b2a97c:/# psql -h store.fsmn.xyz -p 5432 -U us_contributor m9sweeper
Password for user us_contributor:
psql (15.8 (Debian 15.8-0+deb12u1), server 15.3 (OnGres 15.3-build-6.30))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

m9sweeper=>

Repro steps:

Below you can find my EnvoyProxy, GatewayClass, Gateway and TLSRoute

---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: eg
  namespace: envoy-gateway-system
spec:
  mergeGateways: true
  routingType: Service
  logging:
    level:
      default: debug
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        replicas: 1
        container:
          resources:
            requests:
              cpu: 150m
              memory: 500Mi
            limits:
              cpu: 150m
              memory: 500Mi
      # so that Azure knows whether to create a "private" LoadBalancer or one with a public IP
      envoyService:
        annotations:
          service.beta.kubernetes.io/azure-load-balancer-internal: "true"
          service.beta.kubernetes.io/azure-load-balancer-ipv4: 10.111.9.116  
---
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: eg
    namespace: envoy-gateway-system
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: test-fsmn-xyz-stackgres-tls
  namespace: default
spec:
  gatewayClassName: eg
  listeners:
    - name: stackgres
      protocol: TLS
      port: 6432
      allowedRoutes:
        kinds:
        - kind: TLSRoute
        namespaces:
          from: All
      tls:
        mode: Passthrough
---
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: tlsroute-store
  namespace: stackgres-dev
spec:
  parentRefs:
    - name: test-fsmn-xyz-stackgres-tls
      namespace: default
  hostnames:
    - "store.fsmn.xyz"
  rules:
    - backendRefs:
        - group: ""
          kind: Service
          name: store
          port: 5432
          weight: 1

Environment:

I'm using Helm to install Envoy Gateway; the Helm chart version is v1.1.2. My Kubernetes cluster is an Azure AKS cluster.

This other combination of TCP-enabled Gateway and TCPRoute allows me to reach the PostgreSQL instance correctly.

---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: test-fsmn-xyz-stackgres
  namespace: stackgres-dev
spec:
  gatewayClassName: eg
  listeners:
    - name: stackgres
      protocol: TCP
      port: 5432
      allowedRoutes:
        kinds:
        - kind: TCPRoute
---
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TCPRoute
metadata:
  name: stackgres-dev-store
  namespace: stackgres-dev
spec:
  parentRefs:
  - name: test-fsmn-xyz-stackgres
    sectionName: stackgres
  rules:
  - backendRefs:
    - group: ""
      kind: Service
      name: store
      port: 5432

I had exactly the same results (TLSRoute unreachable, "server closed the connection unexpectedly") when I tried to follow the tutorial here.

Logs:

Each time I try to connect through the TLS passthrough gateway and TLSRoute I see no log entries (even with debug set for log verbosity)

Here are the kubectl describe outputs for both my TLSRoute and TLS-passthrough enabled Gateway.

k describe tlsroute -n stackgres-dev
Name:         tlsroute-store
Namespace:    stackgres-dev
Labels:       <none>
Annotations:  <none>
API Version:  gateway.networking.k8s.io/v1alpha2
Kind:         TLSRoute
Metadata:
  Creation Timestamp:  2024-10-31T10:10:10Z
  Generation:          1
  Resource Version:    295799376
  UID:                 2364d608-b703-480c-bd9e-9b861cb5d3aa
Spec:
  Hostnames:
    store.fsmn.xyz
  Parent Refs:
    Group:      gateway.networking.k8s.io
    Kind:       Gateway
    Name:       test-fsmn-xyz-stackgres-tls
    Namespace:  default
  Rules:
    Backend Refs:
      Group:
      Kind:    Service
      Name:    store
      Port:    5432
      Weight:  1
Status:
  Parents:
    Conditions:
      Last Transition Time:  2024-10-31T10:10:10Z
      Message:               Route is accepted
      Observed Generation:   1
      Reason:                Accepted
      Status:                True
      Type:                  Accepted
      Last Transition Time:  2024-10-31T10:10:10Z
      Message:               Resolved all the Object references for the Route
      Observed Generation:   1
      Reason:                ResolvedRefs
      Status:                True
      Type:                  ResolvedRefs
    Controller Name:         gateway.envoyproxy.io/gatewayclass-controller
    Parent Ref:
      Group:      gateway.networking.k8s.io
      Kind:       Gateway
      Name:       test-fsmn-xyz-stackgres-tls
      Namespace:  default
Events:           <none>

k describe gateway/test-fsmn-xyz-stackgres-tls
Name:         test-fsmn-xyz-stackgres-tls
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  gateway.networking.k8s.io/v1
Kind:         Gateway
Metadata:
  Creation Timestamp:  2024-10-31T10:06:26Z
  Generation:          2
  Resource Version:    295799375
  UID:                 a2b58ba0-ef3a-46ab-ad23-223850934a63
Spec:
  Gateway Class Name:  eg
  Listeners:
    Allowed Routes:
      Kinds:
        Group:  gateway.networking.k8s.io
        Kind:   TLSRoute
      Namespaces:
        From:  All
    Name:      stackgres
    Port:      6432
    Protocol:  TLS
    Tls:
      Mode:  Passthrough
Status:
  Addresses:
    Type:   IPAddress
    Value:  10.111.9.116
  Conditions:
    Last Transition Time:  2024-10-31T10:10:10Z
    Message:               The Gateway has been scheduled by Envoy Gateway
    Observed Generation:   2
    Reason:                Accepted
    Status:                True
    Type:                  Accepted
    Last Transition Time:  2024-10-31T10:10:10Z
    Message:               Address assigned to the Gateway, 1/1 envoy Deployment replicas available
    Observed Generation:   2
    Reason:                Programmed
    Status:                True
    Type:                  Programmed
  Listeners:
    Attached Routes:  1
    Conditions:
      Last Transition Time:  2024-10-31T10:10:10Z
      Message:               Sending translated listener configuration to the data plane
      Observed Generation:   2
      Reason:                Programmed
      Status:                True
      Type:                  Programmed
      Last Transition Time:  2024-10-31T10:10:10Z
      Message:               Listener has been successfully translated
      Observed Generation:   2
      Reason:                Accepted
      Status:                True
      Type:                  Accepted
      Last Transition Time:  2024-10-31T10:10:10Z
      Message:               Listener references have been resolved
      Observed Generation:   2
      Reason:                ResolvedRefs
      Status:                True
      Type:                  ResolvedRefs
    Name:                    stackgres
    Supported Kinds:
      Group:  gateway.networking.k8s.io
      Kind:   TLSRoute
Events:       <none>

arkodg commented 4 weeks ago

The only difference between TCPRoute and TLSRoute is an additional TLS Inspector filter that performs an SNI check: https://github.com/envoyproxy/gateway/blob/db6802736680a08a210b16085af5a7bf2f124127/internal/xds/translator/testdata/out/xds-ir/tls-route-passthrough.listeners.yaml#L17 cc @cpakulski
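
To make the SNI check concrete, here is a minimal Python sketch of how a server name can be read out of a TLS ClientHello. It is an illustration only, not Envoy's implementation; the function names `client_hello` and `extract_sni` are hypothetical, and the parser assumes the whole ClientHello arrives in a single TLS record.

```python
import ssl


def client_hello(hostname: str) -> bytes:
    """Produce the raw ClientHello bytes a TLS client would send first."""
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: the handshake stalls waiting for the server's reply
    return outgoing.read()


def extract_sni(data: bytes):
    """Return the SNI host name from a ClientHello, or None if there is none."""
    try:
        # TLS record header: content type (1), version (2), length (2).
        if len(data) < 5 or data[0] != 0x16:
            return None  # not a TLS handshake record
        record_len = int.from_bytes(data[3:5], "big")
        body = data[5:5 + record_len]
        # Handshake header: message type (1), length (3); type 1 is ClientHello.
        if len(body) < 4 or body[0] != 0x01:
            return None
        pos = 4 + 2 + 32                      # skip header, version, random
        pos += 1 + body[pos]                  # session id
        pos += 2 + int.from_bytes(body[pos:pos + 2], "big")  # cipher suites
        pos += 1 + body[pos]                  # compression methods
        ext_total = int.from_bytes(body[pos:pos + 2], "big")
        pos += 2
        end = min(pos + ext_total, len(body))
        while pos + 4 <= end:
            ext_type = int.from_bytes(body[pos:pos + 2], "big")
            ext_len = int.from_bytes(body[pos + 2:pos + 4], "big")
            pos += 4
            if ext_type == 0:  # server_name extension
                # list length (2), name type (1), name length (2), name
                name_len = int.from_bytes(body[pos + 3:pos + 5], "big")
                return body[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except IndexError:
        return None  # truncated input
    return None
```

If the first bytes on the connection are not a TLS handshake record, a parser like this has nothing to match against, so no hostname-based route can be selected.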

ferdinandosimonetti commented 3 weeks ago

I've come to understand the problem: PostgreSQL's TLS setup begins with a cleartext phase in which the client issues a STARTTLS-style request, and only then is the TLS tunnel established.

This conflicts with the Envoy-side setup, which expects the connection to be TLS from the very first byte so that it can perform SNI detection.
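
The mismatch can be seen in the first bytes on the wire. The sketch below is a minimal illustration: the function name `classify_first_bytes` is hypothetical, and the only protocol facts it relies on are the 8-byte PostgreSQL SSLRequest message (length 8, request code 80877103) and the TLS handshake record prefix (0x16 0x03).

```python
import struct

# PostgreSQL SSLRequest: Int32 length (8) followed by Int32 request code.
SSL_REQUEST_CODE = 80877103


def classify_first_bytes(data: bytes) -> str:
    """Label the opening bytes of a client connection."""
    if len(data) >= 8 and struct.unpack("!II", data[:8]) == (8, SSL_REQUEST_CODE):
        return "postgres-sslrequest"   # cleartext, carries no SNI
    if len(data) >= 2 and data[0] == 0x16 and data[1] == 0x03:
        return "tls-handshake"         # a ClientHello, which may carry SNI
    return "unknown"


# What a standard PostgreSQL client sends first when SSL is requested.
postgres_opening = struct.pack("!II", 8, SSL_REQUEST_CODE)
print(classify_first_bytes(postgres_opening))
```

A listener that waits for a ClientHello before choosing a backend never sees one here, because the client is itself waiting for the server's one-byte answer to the SSLRequest.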

cpakulski commented 3 weeks ago

@ferdinandosimonetti yes, you are correct. I did not initially understand the problem you reported, but after your investigation it is clear why it does not work. Some time ago I investigated the possibility of "waiting" with upstream cluster selection (routing) until Envoy completes the STARTTLS negotiation with the client, but it was not trivial and the work has been put on hold.

arkodg commented 3 weeks ago

hey @cpakulski is there an open issue in envoy proxy that we can link this issue to ?

cpakulski commented 3 weeks ago

Not an exact match, but somewhat related: https://github.com/envoyproxy/envoy/issues/32954

ferdinandosimonetti commented 3 weeks ago

> @ferdinandosimonetti yes, you are correct. I did not initially understand the problem you reported, but after your investigation it is clear why it does not work. Some time ago I investigated the possibility of "waiting" with upstream cluster selection (routing) until Envoy completes the STARTTLS negotiation with the client, but it was not trivial and the work has been put on hold.

If there were a way to ask PostgreSQL to listen in TLS mode directly, even on a different port, then I could use several TLSRoutes with a single listener to reach the different PostgreSQL environments (dev, stage, prod) while exposing a single IP.

So far, I haven't been able to work out whether that is possible, or how.

cpakulski commented 3 weeks ago

I could configure the downstream transport socket to be TLS (not STARTTLS), so you could select the route based on SNI. But you would need a non-standard postgres client, as the standard one uses STARTTLS. Or maybe you can construct your client to send postgres traffic in the clear and have that traffic forwarded to a socket which implements TLS. In that scenario, you would not even need the postgres filter, and forwarding in Envoy would only need tcp_proxy.
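
The "forward cleartext traffic to a socket which implements TLS" idea could be sketched on the client side as a small local relay, roughly as below. This is a sketch under assumptions, not a tested setup: `start_forwarder` and `pipe` are hypothetical names, and it presumes the remote endpoint accepts TLS from the first byte and hands cleartext PostgreSQL traffic to the database. With `tls_context=None` the relay stays in cleartext, which is only useful for trying it out locally.

```python
import asyncio
import ssl


async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while chunk := await reader.read(65536):
            writer.write(chunk)
            await writer.drain()
    finally:
        writer.close()


async def start_forwarder(listen_port, upstream_host, upstream_port,
                          sni, tls_context):
    """Accept cleartext on 127.0.0.1 and relay it to the upstream endpoint.

    When tls_context is given, the upstream connection is TLS from the first
    byte and carries `sni`, so a TLS passthrough listener can route on it.
    """
    async def handle(client_reader, client_writer):
        kwargs = {}
        if tls_context is not None:
            kwargs = {"ssl": tls_context, "server_hostname": sni}
        up_reader, up_writer = await asyncio.open_connection(
            upstream_host, upstream_port, **kwargs)
        # Relay both directions until either side closes.
        await asyncio.gather(
            pipe(client_reader, up_writer),
            pipe(up_reader, client_writer),
            return_exceptions=True,
        )

    return await asyncio.start_server(handle, "127.0.0.1", listen_port)
```

A client would then connect to the local port with SSL disabled on its side, while the relay carries the traffic over TLS, for example with `ssl.create_default_context()` as the `tls_context` and `store.fsmn.xyz` as the `sni`.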