envoyproxy / gateway

Manages Envoy Proxy as a Standalone or Kubernetes-based Application Gateway
https://gateway.envoyproxy.io
Apache License 2.0

Is it possible to use envoy gateway 1.1.2 as a transparent proxy before redis cluster? #4409

Open zentavr opened 1 week ago

zentavr commented 1 week ago

Description: I have a Redis cluster with 3 shards (1 master per shard for the PoC). The cluster runs inside the same Kubernetes cluster where EG is installed. Is it possible to use the CRDs this Helm chart provides to build the configuration on top of that cluster?

I expected that it would be possible with kind: Backend and TCPRoute, but no... (ref)

The example here in the repo isn't helpful either.

There are no examples of how to set up filters and so on.

zentavr commented 1 week ago

It looks like the Gateway of gateway.networking.k8s.io, which contains spec.listeners, does not support specifying filter_chains. That means we cannot simply use the CRD for this task, correct?

zirain commented 1 week ago

Maybe we need a RedisRoute CRD to support this? Before that, we need to align with the Gateway API project's roadmap.

arkodg commented 1 week ago

For the time being, you can use EnvoyPatchPolicy as a workaround (https://gateway.envoyproxy.io/docs/tasks/extensibility/envoy-patch-policy/) to add this Redis Cluster-specific config (https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/clusters/redis/v3/redis_cluster.proto) to the xDS cluster config.
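EnvoyPatchPolicy works by applying JSON Patch (RFC 6902) operations to the xDS resources Envoy Gateway generates. To make that mechanism concrete, here is a minimal Python sketch of a subset of JSON Patch (only `add` and `replace`, with RFC 6901 pointer unescaping). It is illustrative only and not Envoy Gateway's implementation; the cluster dict and its name are hypothetical stand-ins, and the Redis protocol options reuse field names from later in this thread.

```python
import copy
from typing import Any


def _parse_pointer(path: str) -> list[str]:
    """Split an RFC 6901 JSON Pointer into unescaped reference tokens."""
    if path == "":
        return []
    if not path.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {path!r}")
    return [t.replace("~1", "/").replace("~0", "~") for t in path[1:].split("/")]


def apply_patch(doc: Any, ops: list[dict]) -> Any:
    """Apply 'add'/'replace' operations to a deep copy of doc."""
    result = copy.deepcopy(doc)
    for op in ops:
        tokens = _parse_pointer(op["path"])
        if not tokens:
            raise ValueError("patching the document root is not supported here")
        parent = result
        for token in tokens[:-1]:
            parent = parent[int(token)] if isinstance(parent, list) else parent[token]
        last = tokens[-1]
        value = copy.deepcopy(op["value"])
        if op["op"] not in ("add", "replace"):
            raise ValueError(f"unsupported op: {op['op']}")
        if isinstance(parent, list):
            if op["op"] == "add":
                # "-" appends; otherwise insert before the given index.
                parent.insert(len(parent) if last == "-" else int(last), value)
            else:
                parent[int(last)] = value
        else:
            if op["op"] == "replace" and last not in parent:
                raise KeyError(last)
            parent[last] = value
    return result


# Hypothetical cluster, standing in for one Envoy Gateway might generate.
cluster = {"name": "example-cluster", "connect_timeout": "1s", "type": "STRICT_DNS"}
patched = apply_patch(cluster, [{
    "op": "add",
    "path": "/typed_extension_protocol_options",
    "value": {"envoy.filters.network.redis_proxy": {
        "@type": "type.googleapis.com/envoy.extensions.filters.network.redis_proxy.v3.RedisProtocolOptions",
        "auth_username": {"inline_string": "default"},
    }},
}])
```

The input document is deep-copied so a failed patch never leaves a half-modified resource behind.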

zirain commented 1 week ago

Another option is using https://gateway.envoyproxy.io/docs/tasks/extensibility/extension-server/.

zentavr commented 1 week ago

Hello @arkodg. I'm trying to do it this way:

1. envoy-gateway is deployed by the Helm chart into the sys-envoy-proxy namespace. I use ArgoCD for that (I don't think that matters here). The Helm chart values are:

```yaml
---
deployment:
  replicas: 1
  pod:
    affinity: {}
    tolerations: []

config:
  envoyGateway:
    gateway:
      controllerName: gateway.envoyproxy.io/gatewayclass-controller
    provider:
      type: Kubernetes
    extensionApis:
      enableEnvoyPatchPolicy: true
      enableBackend: true
    logging:
      level:
        default: info
```


2. My Redis Cluster is located in the `website-staging` namespace (as is the Node.js application that should use it).
3. After envoy-gateway is deployed, I do the following:
a) Deploy the `EnvoyProxy`:
```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: redis-cluster-proxy-config
  namespace: website-staging
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        replicas: 2
        container:
          resources:
            limits:
              cpu: 500m
              memory: 1Gi
            requests:
              cpu: 100m
              memory: 256Mi
      envoyHpa:
        minReplicas: 2
        maxReplicas: 3
        metrics:
          - resource:
              name: cpu
              target:
                averageUtilization: 60
                type: Utilization
            type: Resource
  bootstrap:
    type: Merge
    value: |
      static_resources:
        listeners:
          - name: redis_listener
            address:
              socket_address:
                address: 0.0.0.0
                port_value: 6379
            filter_chains:
              - filters:
                  - name: envoy.filters.network.redis_proxy
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.network.redis_proxy.v3.RedisProxy
                      stat_prefix: redis_stats
                      prefix_routes:
                        catch_all_route:
                          cluster: redis_cluster
                      settings:
                        op_timeout: 5s
                        enable_redirection: true
                      # TODO: This gets deleted?? Why?
                      # Maybe this helps: https://github.com/envoyproxy/envoy/blob/release/v1.31/test/extensions/filters/network/redis_proxy/redis_proxy_integration_test.cc
                      downstream_auth_passwords:
                        - inline_string: qRpDh7a4Pt9jCSHM

        clusters:
          - name: redis_cluster
            connect_timeout: 1s
            dns_lookup_family: V4_ONLY
            load_assignment:
              cluster_name: redis_cluster
              endpoints:
                - lb_endpoints:
                    endpoint:
                      address:
                        socket_address:
                          address: redis-cluster-headless.website-staging.svc.cluster.local
                          port_value: 6379
            #hosts:
            #  - socket_address:
            #      address: redis-cluster-headless.website-staging.svc.cluster.local
            #      port_value: 6379
            cluster_type:
              name: envoy.clusters.redis
              typed_config:
                "@type": type.googleapis.com/google.protobuf.Struct
                value:
                  cluster_refresh_rate: 30s
                  cluster_refresh_timeout: 0.5s
                  redirect_refresh_interval: 10s
                  redirect_refresh_threshold: 10
```

b) Deploy the GatewayClass:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: redis-eg-cls
  namespace: website-staging
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: redis-cluster-proxy-config
    namespace: website-staging
```

c) Deploy the Gateway:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg-gw
  namespace: website-staging
spec:
  gatewayClassName: redis-eg-cls
  infrastructure:
    parametersRef:
      group: gateway.envoyproxy.io
      kind: EnvoyProxy
      name: redis-cluster-proxy-config
  listeners:
    - name: redis-listener
      protocol: TCP
      port: 6379
```

As a result, I can see that the envoy-website-staging-eg-gw-12e00436 Deployment has appeared in the sys-envoy-proxy namespace with 2 pods. The pod logs say:

[2024-10-07 19:52:52.548][1][warning][misc] [source/extensions/filters/network/http_connection_manager/config.cc:83] internal_address_config is not configured. The existing default behaviour will trust RFC1918 IP addresses, but this will be changed in next release. Please explictily config internal address config as the migration step.
[2024-10-07 19:52:52.549][1][warning][redis] [source/extensions/filters/network/redis_proxy/proxy_filter.cc:33] redirections without DNS lookups enabled might cause client errors, set the dns_cache_config field within the connection pool settings to avoid them
[2024-10-07 19:52:52.582][1][warning][config] [source/extensions/config_subscription/grpc/delta_subscription_state.cc:269] delta config for type.googleapis.com/envoy.config.listener.v3.Listener rejected: Error adding/updating listener(s) website-staging/eg-gw/redis-listener: error adding listener '0.0.0.0:6379': no filter chains specified
[2024-10-07 19:52:52.582][1][warning][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:138] gRPC config for type.googleapis.com/envoy.config.listener.v3.Listener rejected: Error adding/updating listener(s) website-staging/eg-gw/redis-listener: error adding listener '0.0.0.0:6379': no filter chains specified

Connecting with redis-cli using auth does not work. downstream_auth_passwords does not show up in the envoy-proxy pod's configuration either. I also don't know what to do about those warnings.

zentavr commented 1 week ago

I redid the EnvoyProxy configuration. It now looks like this:

---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: redis-cluster-proxy-config
  namespace: website-staging
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        replicas: 2
        container:
          resources:
            limits:
              cpu: 500m
              memory: 1Gi
            requests:
              cpu: 100m
              memory: 256Mi
        patch:
          type: StrategicMerge
          value:
            spec:
              template:
                spec:
                  containers:
                    - name: curl
                      image: curlimages/curl:8.10.1
                      command:
                        - "sleep"
                        - "infinity"
                      resources:
                        requests:
                          cpu: 100m
                          memory: 20Mi
                        limits:
                          cpu: 200m
                          memory: 126Mi
                      securityContext:
                        allowPrivilegeEscalation: false
                        capabilities:
                          drop:
                            - ALL
                        privileged: false
                        readOnlyRootFilesystem: true
                        runAsGroup: 1001
                        runAsNonRoot: true
                        runAsUser: 1001
                        seccompProfile:
                          type: RuntimeDefault
      envoyService:
        name: redis-envoy-service
        type: ClusterIP
        annotations:
          custom1: svc-annotation1
          custom2: svc-annotation2
      envoyHpa:
        minReplicas: 2
        maxReplicas: 3
        metrics:
          - resource:
              name: cpu
              target:
                averageUtilization: 60
                type: Utilization
            type: Resource
  logging:
    level:
      default: info
  bootstrap:
    type: Merge
    value: |
      static_resources:
        listeners:
          - name: redis_listener
            address:
              socket_address:
                address: 0.0.0.0
                port_value: 6379
            filter_chains:
              - filters:
                  - name: envoy.filters.network.redis_proxy
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.network.redis_proxy.v3.RedisProxy
                      stat_prefix: redis_stats
                      prefix_routes:
                        catch_all_route:
                          cluster: redis_cluster
                      settings:
                        op_timeout: 5s
                        enable_redirection: true
                      downstream_auth_username:
                        inline_string: default
                      downstream_auth_passwords:
                        - inline_string: qRpDh7a4Pt9jCSHM

        clusters:
          - name: redis_cluster
            connect_timeout: 1s
            dns_lookup_family: V4_ONLY
            #type: STRICT_DNS
            #lb_policy: ROUND_ROBIN
            #load_assignment:
            #  cluster_name: redis_cluster
            #  endpoints:
            #    - lb_endpoints:
            #        endpoint:
            #          address:
            #            socket_address:
            #              address: redis-cluster-headless.website-staging.svc.cluster.local
            #              port_value: 6379
            hosts:
              - socket_address:
                  address: redis-cluster-headless.website-staging.svc.cluster.local
                  port_value: 6379

            cluster_type:
              name: envoy.clusters.redis
              typed_config:
                "@type": type.googleapis.com/google.protobuf.Struct
                value:
                  cluster_refresh_rate: 30s
                  cluster_refresh_timeout: 0.5s
                  redirect_refresh_interval: 10s
                  redirect_refresh_threshold: 10

            typed_extension_protocol_options:
              envoy.filters.network.redis_proxy:
                "@type": type.googleapis.com/envoy.extensions.filters.network.redis_proxy.v3.RedisProtocolOptions
                auth_username:
                  inline_string: default
                auth_password:
                  inline_string: qRpDh7a4Pt9jCSHM

The problem now is with the healthcheck:

Startup probe failed: Get "http://172.31.165.59:19001/ready": dial tcp 172.31.165.59:19001: connect: connection refused

I added a curl sidecar. A request to that address fails. A request to http://127.0.0.1:19000/ready returns PRE_INITIALIZING.
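To watch Envoy move out of PRE_INITIALIZING without curling by hand, a small poller against the admin `/ready` endpoint could help. This is a Python sketch (an assumption; the thread only used curl) that relies on Envoy's documented behaviour of answering `/ready` with 200 and `LIVE` once initialized, and 503 with the current state name otherwise. The URL is the admin address from the thread; `wait_until_live` and `http_fetch` are hypothetical helpers, and the fetch function is injectable so the logic can be tested offline.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Tuple

Fetch = Callable[[str], Tuple[int, str]]


def http_fetch(url: str, timeout: float = 2.0) -> Tuple[int, str]:
    """GET url and return (status, body), including for non-2xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode().strip()
    except urllib.error.HTTPError as err:
        # /ready answers 503 with the state name (e.g. PRE_INITIALIZING).
        return err.code, err.read().decode().strip()


def wait_until_live(url: str, fetch: Fetch = http_fetch,
                    attempts: int = 30, delay: float = 1.0) -> list[str]:
    """Poll url until Envoy reports LIVE; return every state observed."""
    seen: list[str] = []
    for _ in range(attempts):
        try:
            status, body = fetch(url)
        except OSError as err:
            # URLError subclasses OSError, so "connection refused" lands here,
            # as happened with the startup probe on port 19001.
            seen.append(f"ERROR: {err}")
        else:
            seen.append(body)
            if status == 200 and body == "LIVE":
                return seen
        time.sleep(delay)
    last = seen[-1] if seen else "no attempts made"
    raise TimeoutError(f"{url} not LIVE after {attempts} attempts (last: {last})")
```

Usage from inside the pod would be `wait_until_live("http://127.0.0.1:19000/ready")`; the returned list shows whether Envoy was refusing connections, stuck in PRE_INITIALIZING, or eventually went LIVE.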

zentavr commented 1 week ago

...I enabled debug logging and got this:

[2024-10-08 17:58:46.144][1][debug][main] [source/server/server.cc:247] Envoy is not fully initialized, skipping histogram merge and flushing stats
[2024-10-08 17:58:46.225][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:385] dns resolution for envoy-gateway started
[2024-10-08 17:58:46.225][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:385] dns resolution for envoy-gateway started
[2024-10-08 17:58:46.227][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:167] dns resolution without records for envoy-gateway
[2024-10-08 17:58:46.227][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:167] dns resolution without records for envoy-gateway
[2024-10-08 17:58:46.228][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:302] dns resolution for envoy-gateway completed with status 0
[2024-10-08 17:58:46.228][1][debug][upstream] [source/common/upstream/upstream_impl.cc:469] transport socket match, socket default selected for host with address 10.100.137.88:18002
[2024-10-08 17:58:46.228][1][debug][upstream] [source/extensions/clusters/strict_dns/strict_dns_cluster.cc:193] DNS refresh rate reset for envoy-gateway, refresh rate 5000 ms
[2024-10-08 17:58:46.228][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:302] dns resolution for envoy-gateway completed with status 0
[2024-10-08 17:58:46.228][1][debug][upstream] [source/common/upstream/upstream_impl.cc:469] transport socket match, socket default selected for host with address 10.100.137.88:18000
[2024-10-08 17:58:46.228][1][debug][upstream] [source/extensions/clusters/strict_dns/strict_dns_cluster.cc:193] DNS refresh rate reset for envoy-gateway, refresh rate 5000 ms
[2024-10-08 17:58:51.147][1][debug][main] [source/server/server.cc:237] flushing stats

...I added a dnsutils sidecar; the DNS name is resolvable and returns the endpoints:

bash-5.0$ cat /etc/resolv.conf 
search sys-envoy-proxy.svc.cluster.local svc.cluster.local cluster.local ec2.internal
nameserver 10.100.0.10
options ndots:5

bash-5.0$ dig redis-cluster-headless.website-staging.svc.cluster.local

; <<>> DiG 9.16.27 <<>> redis-cluster-headless.website-staging.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62564
;; flags: qr aa rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: b7a714e91e52331f (echoed)
;; QUESTION SECTION:
;redis-cluster-headless.website-staging.svc.cluster.local. IN A

;; ANSWER SECTION:
redis-cluster-headless.website-staging.svc.cluster.local. 5 IN A 172.31.165.34
redis-cluster-headless.website-staging.svc.cluster.local. 5 IN A 172.31.168.211
redis-cluster-headless.website-staging.svc.cluster.local. 5 IN A 172.31.160.164

;; Query time: 0 msec
;; SERVER: 10.100.0.10#53(10.100.0.10)
;; WHEN: Tue Oct 08 18:18:58 UTC 2024
;; MSG SIZE  rcvd: 345

...I have no idea what is going on here. It could be similar to envoyproxy/envoy#8223.
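The dig output shows the headless Service returning one A record per Redis pod. A STRICT_DNS cluster with `dns_lookup_family: V4_ONLY` turns each such record into a separate upstream host. The Python sketch below approximates that lookup from inside a pod using the system resolver. It is only a diagnostic aid under that assumption, not how Envoy's c-ares resolver works, and `resolve_v4_endpoints` is a hypothetical helper name.

```python
import socket


def resolve_v4_endpoints(host: str, port: int) -> list[tuple[str, int]]:
    """Return the distinct IPv4 (address, port) pairs the resolver gives for host.

    Roughly mirrors STRICT_DNS + V4_ONLY: one upstream endpoint per A record.
    """
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); for AF_INET,
    # sockaddr is (address, port).
    return sorted({(sockaddr[0], sockaddr[1]) for *_, sockaddr in infos})
```

Inside the cluster, `resolve_v4_endpoints("redis-cluster-headless.website-staging.svc.cluster.local", 6379)` should list the same three pod IPs that dig reported.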

zentavr commented 1 week ago

...I don't know why, but the envoyproxy/envoy:distroless-v1.31.2 image cannot seem to use Redis. It gets stuck in the PRE_INITIALIZING state and that's it.

zentavr commented 1 week ago

The problem was in the load_assignment block: I had missed the `-` before the `endpoint` key. The final EnvoyProxy looks like this:

---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: redis-cluster-proxy-config
  namespace: website-staging
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        replicas: 2
        container:
          resources:
            limits:
              cpu: 500m
              memory: 1Gi
            requests:
              cpu: 100m
              memory: 256Mi
        patch:
          type: StrategicMerge
          value:
            spec:
              template:
                spec:
                  containers:
                    - name: envoy
                      ports:
                        - containerPort: 6379
                          name: tcp-6379
                          protocol: TCP
                        - containerPort: 19001
                          name: metrics
                          protocol: TCP
                        - containerPort: 19000
                          name: admin
                          protocol: TCP
                    - name: curl
                      image: curlimages/curl:8.10.1
                      imagePullPolicy: IfNotPresent
                      command:
                        - "sleep"
                        - "infinity"
                      resources:
                        requests:
                          cpu: 100m
                          memory: 20Mi
                        limits:
                          cpu: 200m
                          memory: 126Mi
                      securityContext:
                        allowPrivilegeEscalation: false
                        capabilities:
                          drop:
                            - ALL
                        privileged: false
                        readOnlyRootFilesystem: true
                        runAsGroup: 1001
                        runAsNonRoot: true
                        runAsUser: 1001
                        seccompProfile:
                          type: RuntimeDefault
                    - name: dnsutils
                      image: registry.k8s.io/e2e-test-images/agnhost:2.39
                      imagePullPolicy: IfNotPresent
                      resources:
                        requests:
                          cpu: 100m
                          memory: 20Mi
                        limits:
                          cpu: 200m
                          memory: 126Mi
                      securityContext:
                        allowPrivilegeEscalation: false
                        capabilities:
                          drop:
                            - ALL
                        privileged: false
                        readOnlyRootFilesystem: true
                        runAsGroup: 1001
                        runAsNonRoot: true
                        runAsUser: 1001
                        seccompProfile:
                          type: RuntimeDefault
      envoyService:
        name: redis-envoy-service
        type: ClusterIP
        annotations:
          custom1: svc-annotation1
          custom2: svc-annotation2
      envoyHpa:
        minReplicas: 2
        maxReplicas: 3
        metrics:
          - resource:
              name: cpu
              target:
                averageUtilization: 60
                type: Utilization
            type: Resource
  logging:
    level:
      default: debug
  bootstrap:
    type: Merge
    value: |
      admin:
        address:
          socketAddress:
            address: 0.0.0.0
            portValue: 19000

      static_resources:
        listeners:
          - name: redis_listener
            address:
              socket_address:
                address: 0.0.0.0
                port_value: 6379
            filter_chains:
              - filters:
                  - name: envoy.filters.network.redis_proxy
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.network.redis_proxy.v3.RedisProxy
                      stat_prefix: redis_stats
                      prefix_routes:
                        catch_all_route:
                          cluster: redis_cluster
                      settings:
                        op_timeout: 5s
                        enable_redirection: true
                      downstream_auth_username:
                        inline_string: default
                      downstream_auth_passwords:
                        - inline_string: qRpDh7a4Pt9jCSHM

        clusters:
          - name: redis_cluster
            connect_timeout: 1s
            load_assignment:
              cluster_name: redis_cluster
              endpoints:
                - lb_endpoints:
                    - endpoint:
                        address:
                          socket_address:
                            address: redis-cluster-headless.website-staging.svc.cluster.local
                            port_value: 6379
            dns_lookup_family: V4_ONLY
            lb_policy: ROUND_ROBIN
            upstream_connection_options:
              tcp_keepalive:
                keepalive_time: 60
                keepalive_probes: 1
                keepalive_interval: 5
            type: STRICT_DNS
            typed_extension_protocol_options:
              envoy.filters.network.redis_proxy:
                "@type": type.googleapis.com/envoy.extensions.filters.network.redis_proxy.v3.RedisProtocolOptions
                auth_username:
                  inline_string: default
                auth_password:
                  inline_string: qRpDh7a4Pt9jCSHM
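Since the root cause was a single missing `-`, a structural check on the parsed config can catch this class of mistake before it reaches Envoy. In Envoy's v3 API, `ClusterLoadAssignment.endpoints` and `LocalityLbEndpoints.lb_endpoints` are repeated fields, so both must be YAML lists; without the `-`, `lb_endpoints` parses as a mapping. This Python sketch assumes the YAML has already been loaded into plain dicts and lists (for example with a YAML library, not shown); `check_load_assignment` and its messages are hypothetical.

```python
from typing import Any


def check_load_assignment(cluster: dict[str, Any]) -> list[str]:
    """Return shape problems in a cluster's load_assignment (empty list if OK)."""
    name = cluster.get("name", "<unnamed>")
    assignment = cluster.get("load_assignment")
    if assignment is None:
        return []
    endpoints = assignment.get("endpoints", [])
    if not isinstance(endpoints, list):
        return [f"{name}: load_assignment.endpoints must be a list (missing '-'?)"]
    problems: list[str] = []
    for i, locality in enumerate(endpoints):
        if not isinstance(locality, dict):
            problems.append(f"{name}: endpoints[{i}] must be a mapping")
            continue
        lb_endpoints = locality.get("lb_endpoints", [])
        if not isinstance(lb_endpoints, list):
            problems.append(f"{name}: endpoints[{i}].lb_endpoints must be a list (missing '-'?)")
            continue
        for j, lb_endpoint in enumerate(lb_endpoints):
            if not isinstance(lb_endpoint, dict) or "endpoint" not in lb_endpoint:
                problems.append(f"{name}: endpoints[{i}].lb_endpoints[{j}] has no 'endpoint' key")
    return problems


# The thread's two load_assignment variants, written as parsed Python data.
ADDR = {"address": {"socket_address": {
    "address": "redis-cluster-headless.website-staging.svc.cluster.local",
    "port_value": 6379,
}}}
broken = {"name": "redis_cluster", "load_assignment": {
    "cluster_name": "redis_cluster",
    "endpoints": [{"lb_endpoints": {"endpoint": ADDR}}],  # no "-" before endpoint
}}
fixed = {"name": "redis_cluster", "load_assignment": {
    "cluster_name": "redis_cluster",
    "endpoints": [{"lb_endpoints": [{"endpoint": ADDR}]}],
}}
```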

Redis session via CLI to envoy-proxy:

$ redis-cli -u redis://default:qRpDh7a4Pt9jCSHM@redis-envoy-service.sys-envoy-proxy.svc.cluster.local:6379/0
redis-envoy-service.sys-envoy-proxy.svc.cluster.local:6379> set foo bar
OK
redis-envoy-service.sys-envoy-proxy.svc.cluster.local:6379> set hello world
OK
redis-envoy-service.sys-envoy-proxy.svc.cluster.local:6379> set ride bike
OK
redis-envoy-service.sys-envoy-proxy.svc.cluster.local:6379> get ride
"bike"
redis-envoy-service.sys-envoy-proxy.svc.cluster.local:6379> get hello
"world"
redis-envoy-service.sys-envoy-proxy.svc.cluster.local:6379> get foo
"bar"

The same against the redis-cluster LB (redis-cli in cluster mode, `-c`):

$ redis-cli -c -u redis://default:qRpDh7a4Pt9jCSHM@redis-cluster:6379/0
redis-cluster:6379> get ride
-> Redirected to slot [4362] located at 172.31.160.164:6379
"bike"
172.31.160.164:6379> get hello
"world"
172.31.160.164:6379> get foo
-> Redirected to slot [12182] located at 172.31.168.211:6379
"bar"
172.31.168.211:6379> set pl pl
-> Redirected to slot [9587] located at 172.31.165.34:6379
OK
172.31.165.34:6379> set ua ua
-> Redirected to slot [2859] located at 172.31.160.164:6379
OK
172.31.160.164:6379> set ru ru
OK
172.31.160.164:6379> set gb gb
-> Redirected to slot [7769] located at 172.31.165.34:6379
OK
172.31.165.34:6379> 

...again to envoy-proxy:

$ redis-cli -u redis://default:qRpDh7a4Pt9jCSHM@redis-envoy-service.sys-envoy-proxy.svc.cluster.local:6379/0
redis-envoy-service.sys-envoy-proxy.svc.cluster.local:6379> get ua
"ua"
redis-envoy-service.sys-envoy-proxy.svc.cluster.local:6379> get ru
"ru"
redis-envoy-service.sys-envoy-proxy.svc.cluster.local:6379> get pl
"pl"
redis-envoy-service.sys-envoy-proxy.svc.cluster.local:6379> get gb
"gb"

It seems Envoy handles the redirects internally.
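The redirects come from Redis Cluster's key-to-slot mapping: each key hashes to one of 16384 slots via CRC16 (XMODEM variant) modulo 16384, where only the first non-empty `{...}` hash tag is hashed if present, and each master owns a range of slots. With `enable_redirection: true`, Envoy follows the MOVED/ASK replies itself, so the plain redis-cli session through Envoy never sees them, while `redis-cli -c` prints them. The Python sketch below reproduces slot numbers from the session above. The per-node slot ranges are an assumption (the default even split for three masters); they are consistent with the transcript but not confirmed by it.

```python
SLOT_COUNT = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT, XMODEM variant: poly 0x1021, init 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Redis Cluster hash slot for key, honouring the first non-empty {tag}."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % SLOT_COUNT


# Assumed slot ownership (default even split for three masters).
# The node IPs come from the redis-cli session; the ranges do not.
ASSUMED_RANGES = [
    (0, 5460, "172.31.160.164"),
    (5461, 10922, "172.31.165.34"),
    (10923, 16383, "172.31.168.211"),
]


def owner(key: str) -> str:
    """Node that serves key under ASSUMED_RANGES."""
    slot = key_slot(key)
    for low, high, node in ASSUMED_RANGES:
        if low <= slot <= high:
            return node
    raise ValueError(f"slot {slot} is not covered")


for k in ["ride", "hello", "foo", "pl", "ua", "ru", "gb"]:
    print(k, key_slot(k), owner(k))
```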

arkodg commented 1 week ago

Glad you got it working, @zentavr. My suggestion would be to implement this via TCPRoute + Backend + EnvoyPatchPolicy (editing the cluster fields once https://github.com/envoyproxy/gateway/issues/4036 has been fixed).
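A rough sketch of the listener half of that suggestion: attach a TCPRoute to the Gateway listener, then use an EnvoyPatchPolicy to replace the generated network filter with `redis_proxy`. This Python snippet only builds the manifest as a dict, and `json.dumps` output can be piped to `kubectl apply -f -`. The listener name follows the `<namespace>/<gateway>/<listener>` pattern visible in the "no filter chains specified" log earlier in the thread. The patch path `/filter_chains/0/filters/0`, the policy name, and the upstream cluster name are assumptions about what EG generates, so verify them against an Envoy config dump. Cluster-level Redis settings are left out pending #4036, and EnvoyPatchPolicy must be enabled (`enableEnvoyPatchPolicy: true`, as in the Helm values above).

```python
import json
from typing import Any

# Values from the thread's Gateway.
NAMESPACE = "website-staging"
GATEWAY = "eg-gw"
LISTENER = "redis-listener"
# Hypothetical placeholder: use the cluster name EG generates for the TCPRoute.
UPSTREAM_CLUSTER = "REPLACE-WITH-GENERATED-CLUSTER-NAME"


def redis_listener_patch(namespace: str, gateway: str, listener: str,
                         cluster: str) -> dict[str, Any]:
    """Build an EnvoyPatchPolicy that swaps the first network filter for redis_proxy."""
    redis_proxy = {
        "name": "envoy.filters.network.redis_proxy",
        "typed_config": {
            "@type": "type.googleapis.com/envoy.extensions.filters.network.redis_proxy.v3.RedisProxy",
            "stat_prefix": "redis_stats",
            "prefix_routes": {"catch_all_route": {"cluster": cluster}},
            "settings": {"op_timeout": "5s", "enable_redirection": True},
        },
    }
    return {
        "apiVersion": "gateway.envoyproxy.io/v1alpha1",
        "kind": "EnvoyPatchPolicy",
        # Hypothetical policy name.
        "metadata": {"name": f"{gateway}-redis-proxy", "namespace": namespace},
        "spec": {
            "targetRef": {
                "group": "gateway.networking.k8s.io",
                "kind": "Gateway",
                "name": gateway,
            },
            "type": "JSONPatch",
            "jsonPatches": [{
                "type": "type.googleapis.com/envoy.config.listener.v3.Listener",
                # Listener naming as seen in the EG log: <namespace>/<gateway>/<listener>.
                "name": f"{namespace}/{gateway}/{listener}",
                "operation": {
                    "op": "replace",
                    # Assumption: the TCPRoute's filter is the first filter
                    # of the first filter chain.
                    "path": "/filter_chains/0/filters/0",
                    "value": redis_proxy,
                },
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(redis_listener_patch(NAMESPACE, GATEWAY, LISTENER, UPSTREAM_CLUSTER), indent=2))
```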

zentavr commented 1 week ago

@arkodg I think the trick is setting up those filters and the Redis-specific settings such as timeouts and credentials, and these need to be configured in both the listener and the cluster blocks of the configuration. Does TCPRoute/Backend support such filters?

davem-git commented 6 days ago
```yaml
extensionApis:
  enableEnvoyPatchPolicy: true
  enableBackend: true
```

Where are these documented?

zentavr commented 6 days ago

@davem-git extensionApis.enableEnvoyPatchPolicy is here

extensionApis.enableBackend is here