emissary-ingress / emissary

open source Kubernetes-native API gateway for microservices built on the Envoy Proxy
https://www.getambassador.io
Apache License 2.0

gRPC routing doesn't work in GKE when deployed behind L7 GKE ingress LB #4992

Open tkorrapati1 opened 1 year ago

tkorrapati1 commented 1 year ago

Describe the bug: Emissary (3.0.0) is deployed behind an L7 GKE ingress LB as per the instructions in the Emissary on GKE docs. For regular HTTP workloads this seems to work fine, but it doesn't work for gRPC.

Below is my flow: Client (grpcurl) --> GKE Ingress L7 HTTPS LB (TLS is terminated here) --> Emissary Ingress --> k8s (external service)

I added the annotation cloud.google.com/app-protocols: '{"http":"HTTP2"}' so that the external load balancer uses HTTP/2 when it forwards requests to the backend.

Nothing works after adding the above annotation. I'm seeing 502s in my ingress logs and no requests logged in the Emissary pod logs. It looks like Emissary is not listening for HTTP/2 cleartext traffic at all?

In some corner of the internet I've seen that I need to set the protocol setting to http2c in the Emissary config to make Emissary listen for HTTP/2 cleartext. Is this still the case? If so, where can I set this parameter? I don't see how to do this in v3.0.0, or any relevant documentation on how to achieve it.


Expected behavior: Emissary is expected to route incoming gRPC requests over HTTP/2 cleartext.

Versions (please complete the following information):

updates:

cindymullins-dw commented 1 year ago

Hi @tkorrapati1, I think you do need to set grpc: true in your Mapping for either gRPC or HTTP/2 traffic. Please check the docs note here.

tkorrapati1 commented 1 year ago

@cindymullins-dw I already added that to the Mapping. Here is my full Emissary config:


apiVersion: getambassador.io/v3alpha1
kind: Listener
metadata:
  name: custom-listener-behind-l7-gke-alb
  namespace: aodapn-emissary-ingress-verify
spec:
  port: 8080
  protocol: HTTP
  securityModel: XFP
  l7Depth: 1
  hostBinding:    
    namespace:
      from: ALL
---
apiVersion: getambassador.io/v3alpha1
kind: Host
metadata:
  name: comms-host-no-tls
  namespace: my-ns 
  labels:
    helm.sh/chart: apn-emissary-ingress-0.1.9
    app.kubernetes.io/instance: verify
spec:
  hostname: comms-gcp2.mydomain.com
  requestPolicy:
    insecure:
      action: Route
  selector:
    matchLabels:
      host-mapping-tether: comms
---
apiVersion: getambassador.io/v3alpha1
kind: Mapping
metadata:
  name: my-instance-svc-backend
  labels:
    helm.sh/chart: apn-emissary-ingress-0.1.9
    app.kubernetes.io/instance: verify
    host-mapping-tether: comms
spec:
  grpc: True
  prefix: /
  rewrite: /
  service: https://my-instance-svc:443
  tls: tls-re-encrypt
  headers:
    instance: my-instance.mydomain.com
  timeout_ms: 300000
---
apiVersion: getambassador.io/v3alpha1
kind: TLSContext
metadata:
  name: tls-re-encrypt
  namespace: my-ns
  labels:
    helm.sh/chart: apn-emissary-ingress-0.1.9
    app.kubernetes.io/instance: verify
spec:
  alpn_protocols: h2
  max_tls_version: v1.3
  min_tls_version: v1.2
  secret: my-tls-certs

This is the error I'm seeing in grpcurl

$ grpcurl -import-path "/Users/tk894618/Desktop/repositories/grpc-proto" -vv -proto appneta.proto -H "instance: my-instance.mydomain.com" comms-gcp2.mydomain.com:443 appneta.MetaService/Healthcheck

Resolved method descriptor:
// Returns true/false if the server is healthy and accepting requests
rpc Healthcheck ( .HealthcheckRequest ) returns ( .HealthcheckResponse );

Request metadata to send:
instance: my-instance.mydomain.com

Response headers received:
(empty)

Response trailers received:
(empty)
Sent 0 requests and received 0 responses
ERROR:
  Code: Unavailable
  Message: unexpected HTTP status code received from server: 502 (Bad Gateway); transport: received unexpected content-type "text/html; charset=UTF-8"

cindymullins-dw commented 1 year ago

Ok, thanks. Where did you add this annotation, cloud.google.com/app-protocols: '{"http":"HTTP2"}'? Is it on Emissary, on the Service, or somewhere else? If it was applied to the Emissary config, I'm not sure that's how it works. We rely on annotations only in certain exceptions. Perhaps this is one of them, but I'm not finding any references to it.

tkorrapati1 commented 1 year ago

Hey @cindymullins-dw, I added the annotation to the Emissary service. This is my Helm values file:

emissary-ingress:
  createDefaultListeners: false 
  module:
    diagnostics:
      enabled: true
    lua_scripts: |
      function envoy_on_request(request_handle)
       local authority = request_handle:headers():get(":authority")
       if(string.find(authority, ":") ~= nil)
       then
        local authority_index = string.find(authority, ":")
        local stripped_authority = string.sub(authority, 1, authority_index - 1)
        request_handle:headers():replace(":authority", stripped_authority)
       end
      end
  service:
    type: NodePort
    ports:
      - name: http
        port: 8080
        targetPort: 8080
    annotations: 
      cloud.google.com/backend-config: '{"default": "ambassador-hc-config"}'
      cloud.google.com/neg: '{"ingress": true}'
      cloud.google.com/app-protocols: '{"http":"HTTP2"}'

To be clear, the annotation is from GCP. Its purpose is to configure the L7 Ingress LB to use HTTP/2 when forwarding requests to the backend.

Ref: https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-http2


To add more information: without the above annotation, the requests make it into the Emissary pods fine, but they come in as HTTP/1.1 even though the request comes from a gRPC client.

ACCESS [2023-04-24T23:11:16.181Z] "POST /.MetaService/Healthcheck HTTP/1.1" 200 - 5 7 44 43 "192.19.161.250, 34.95.98.217,130.211.2.218" "grpcurl/1.8.7 grpc-go/1.48.0" "a26fe056-801f-4823-98b2-06a2950a8a4d" "comms-gcp2.mydomain.com:443" "10.1.12.122:443

And below is the grpcurl error I see in this case, without the app-protocols annotation.

$ grpcurl -import-path "/Users/Desktop/repositories/grpc-proto" -vv -proto .proto -H "instance: my-instance.mydomain.com" comms-gcp2.mydomain.com:443 .MetaService/Healthcheck

Resolved method descriptor:
// Returns true/false if the server is healthy and accepting requests
rpc Healthcheck ( .HealthcheckRequest ) returns ( .HealthcheckResponse );

Request metadata to send:
instance: instance.mydomain.com

Response headers received:
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
content-security-policy: base-uri 'self'; manifest-src 'self'; script-src * blob: data: 'unsafe-eval' 'unsafe-inline'; style-src * blob: data: 'unsafe-inline'
content-type: application/grpc
cross-origin-embedder-policy: unsafe-none
cross-origin-resource-policy: same-site
date: Mon, 24 Apr 2023 23:26:52 GMT
grpc-accept-encoding: gzip
permissions-policy: accelerometer=(),camera=(),gyroscope=(),magnetometer=(),payment=(),usb=(),fullscreen=(self)
referrer-policy: strict-origin
server: envoy
via: 1.1 google
x-content-type-options: nosniff
x-envoy-upstream-service-time: 45

Response trailers received:
(empty)
Sent 0 requests and received 0 responses
ERROR:
  Code: Internal
  Message: server closed the stream without sending trailers

cyrus-mc commented 1 year ago

@tkorrapati1 I will just add the details here that we spoke about in the Slack (so you can review with Emissary Engineers later today).

If I understand your setup correctly, you are terminating TLS at the GKE LB and then sending plaintext to Emissary. HTTP/2 for the most part only works over TLS (protocol negotiation happens through ALPN, which is a TLS extension).

There is the concept of HTTP/2 prior knowledge, which essentially says: as the client, I know the server supports HTTP/2, so don't try to negotiate, just communicate using HTTP/2. Without prior knowledge, the protocol is determined using ALPN. Given that you have plaintext to Emissary, you need to force (if possible) the GKE LB to use prior knowledge. Otherwise, by default, it is going to attempt to make a TLS connection.

I did some googling and did not see any annotation for the GKE LB to enable prior knowledge. It could be that it doesn't support this because HTTP/2 is meant to be used over TLS.
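
As a rough illustration of what "prior knowledge" means on the wire, here is a minimal Python sketch that sends the HTTP/2 client connection preface (RFC 7540, section 3.5) over a plain TCP socket and checks whether the peer answers with a SETTINGS frame. The function names are hypothetical, and the host and port are placeholders. It only probes whether an endpoint accepts cleartext HTTP/2; it is not a statement about how Emissary or the GKE LB behaves.

```python
import socket
import struct

# Client connection preface defined by RFC 7540, section 3.5 (24 bytes).
H2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"
# Empty SETTINGS frame: 24-bit length 0, type 0x4, flags 0, stream id 0.
EMPTY_SETTINGS = b"\x00\x00\x00\x04\x00\x00\x00\x00\x00"


def looks_like_settings(header: bytes) -> bool:
    """Check whether a 9-byte frame header is a SETTINGS frame on stream 0."""
    if len(header) < 9:
        return False
    frame_type = header[3]
    stream_id = struct.unpack(">I", header[5:9])[0] & 0x7FFFFFFF
    return frame_type == 0x4 and stream_id == 0


def speaks_h2c(host: str, port: int, timeout: float = 3.0) -> bool:
    """Send the preface without any negotiation and inspect the first reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(H2_PREFACE + EMPTY_SETTINGS)
        header = b""
        try:
            while len(header) < 9:
                chunk = sock.recv(9 - len(header))
                if not chunk:
                    return False  # peer closed the connection early
                header += chunk
        except socket.timeout:
            return False  # peer never answered the preface
    return looks_like_settings(header)


if __name__ == "__main__":
    # Placeholder target, e.g. a port-forward to the listener on port 8080.
    print(speaks_h2c("127.0.0.1", 8080))
```

An HTTP/1.1-only endpoint would typically answer the preface with an HTTP/1.1 error response, which does not parse as a SETTINGS frame header, so the probe returns False in that case.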

tkorrapati1 commented 1 year ago

Thanks for the help on this, @cyrus-mc!

I'm not too sure what protocol is used between the L7 LB and my Emissary. I was talking to Google support about this yesterday, and per the support engineer's statement it looks like it will be TLS too (not sure which certs it will use for TLS).

image

cyrus-mc commented 1 year ago

Given your setup, you are not exposing an HTTPS listener on the Emissary side.

Here are your Host and Listener objects:

apiVersion: getambassador.io/v3alpha1
kind: Listener
metadata:
 name: custom-listener-behind-l7-gke-alb
 namespace: aodapn-emissary-ingress-verify
spec:
 port: 8080
 protocol: HTTP
 securityModel: XFP
 l7Depth: 1
 hostBinding:    
   namespace:
     from: ALL
---
apiVersion: getambassador.io/v3alpha1
kind: Host
metadata:
 name: comms-host-no-tls
 namespace: my-ns 
 labels:
   helm.sh/chart: apn-emissary-ingress-0.1.9
   app.kubernetes.io/instance: verify
spec:
 hostname: comms-gcp2.mydomain.com
 requestPolicy:
   insecure:
     action: Route
 selector:
   matchLabels:
     host-mapping-tether: comms

You have set protocol to HTTP. You defined a TLSContext, but that doesn't come into play here, so your backend (Emissary) is only speaking HTTP/plaintext. You would need to configure HTTPS on the Emissary side. In my setup I just use cert-manager to provision the certificate and then set my Host object as such:

apiVersion: getambassador.io/v3alpha1
kind: Host
metadata:
  labels:
    app.kubernetes.io/component: ingress
    app.kubernetes.io/name: emissary-ingress
    app.kubernetes.io/instance: emissary-ingress
    app.kubernetes.io/managed-by: argocd
  name: all
  namespace: emissary
spec:
  hostname: "*"
  tlsSecret:
    name: emissary-certificate
  tls:
    min_tls_version: v1.2
    alpn_protocols: h2, http/1.1
  mappingSelector:
    matchLabels:
      # associate all mappings that contain this label
      app.kubernetes.io/component: api

tkorrapati1 commented 1 year ago

Thanks for helping out with this issue, @cyrus-mc, @AliceProxy and @cindymullins-dw.

I configured my Emissary ingress with an HTTPS listener and confirmed that it routes connections fine when bypassing the L7 LB completely and hitting the K8s service directly. But when I tried to route through the Ingress, I got 502s (in the ingress logs) and no traffic in my pods at all.

I have been working with Google support for the past two weeks to understand why Emissary is not seeing any traffic. A couple of GCP support cases later, I finally got the response below.

Upon further investigation, we also found that your application may be relying on a particular SNI to validate the TLS request. However, as per this documentation [1], “GFEs don't use the Server Name Indication (SNI) extension for connections to the backend”. This means that the TLS connection could fail if the application requires an SNI.

We kindly request that you review your application's configuration to ensure that it can handle TLS connections properly.
If possible, please check if your application is relying on a particular SNI and if so, please consider updating your configuration to address this.

[1] https://cloud.google.com/load-balancing/docs/ssl-certificates/encryption-to-the-backends

I'm fairly sure that Emissary needs and relies on SNI for routing requests (please correct me if I'm wrong).
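
To make the SNI point concrete, here is a small Python sketch (standard library only, hypothetical function names) that builds the first TLS handshake message a client would send, once with a server name and once without. It only shows that the hostname is visible to the server when the client sets SNI and absent when it does not, which is the situation Google support describes for GFE-to-backend connections. It does not verify how Emissary itself reacts to a missing SNI.

```python
import ssl


def client_hello(server_hostname):
    """Return the raw ClientHello bytes a TLS client would send first."""
    ctx = ssl.create_default_context()
    # Hostname checking must be disabled before server_hostname=None is allowed.
    ctx.check_hostname = False
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=server_hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the ClientHello is queued and the client waits for a reply.
        pass
    return outgoing.read()


def carries_sni(hello: bytes, hostname: str) -> bool:
    """SNI is sent unencrypted, so the name is visible in the ClientHello."""
    return hostname.encode("ascii") in hello


if __name__ == "__main__":
    host = "comms-gcp2.mydomain.com"  # hostname taken from the Host resource above
    print("with SNI:   ", carries_sni(client_hello(host), host))
    print("without SNI:", carries_sni(client_hello(None), host))
```

A server that selects its certificate or virtual host based on the requested name has nothing to match against in the second case.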

So at this point I'm considering falling back to just using an L4 LB to expose Emissary in GCP.