emissary-ingress / emissary

open source Kubernetes-native API gateway for microservices built on the Envoy Proxy
https://www.getambassador.io
Apache License 2.0

How to override the connect_timeout for the TCPMapping #5695

Open Karan-Khanchandani opened 1 month ago

Karan-Khanchandani commented 1 month ago

Describe the bug: Hi Team,

I created the TCPMapping shown below:

apiVersion: getambassador.io/v2
kind: TCPMapping
metadata:
  creationTimestamp: "2024-05-30T10:06:58Z"
  generation: 4
  labels:
    appName: nats
    k8slens-edit-resource-version: v2
  name: nats
  namespace: rhel8odp
  resourceVersion: "32428517"
  uid: 9db9d1ef-4f3e-4931-b246-1a6f3e62b22a
spec:
  ambassador_id:
  - --apiVersion-v3alpha1-only--default
  host: '*'
  idle_timeout_ms: "130000"
  port: 30001
  service: ad-events.rhel8odp.svc.cluster.local:4222

But when I check envoy/envoy.json inside the emissary-ingress pod, the cluster entry still shows the previous value:

{
  "name": "cluster_ad_events_rhel8odp_svc_cluster_local_4222_rhel8odp",
  "type": "STRICT_DNS",
  "lb_policy": "ROUND_ROBIN",
  "connect_timeout": "3.000s",
  "dns_lookup_family": "V4_ONLY",
  "alt_stat_name": "ad_events_rhel8odp_svc_cluster_local_4222",
  "load_assignment": {
    "cluster_name": "cluster_ad_events_rhel8odp_svc_cluster_local_4222_rhel8odp",
    "endpoints": [
      {
        "lb_endpoints": [
          {
            "endpoint": {
              "address": {
                "socket_address": {
                  "address": "ad-events.rhel8odp.svc.cluster.local",
                  "port_value": 4222,
                  "protocol": "TCP"
                }
              }
            }
          }
        ]
      }
    ]
  }
}

How can I override connect_timeout here? I looked through the docs but couldn't find a way to do it.

To Reproduce: Steps to reproduce the behavior:

  1. Create a TCPMapping
  2. Exec into the emissary-ingress pod and check envoy/envoy.json

Expected behavior: A way to override the connect_timeout field for a TCPMapping.


cindymullins-dw commented 4 weeks ago

Hi @Karan-Khanchandani, the idle_timeout_ms setting you're using is indeed mentioned in the docs, so we'd expect it to at least appear in the Envoy config, but it does not. This might be a bug.

There are some other timeout settings available on Mappings, though I'm not sure whether they're enabled for TCPMapping. Could you try these to see if the results are any different? https://www.getambassador.io/docs/edge-stack/latest/topics/using/timeouts#connect-timeout-connect_timeout_ms (This one also defaults to 3 seconds.)

Or you can try this: https://www.getambassador.io/docs/edge-stack/latest/topics/using/timeouts#request-timeout-timeout_ms

Karan-Khanchandani commented 3 weeks ago

Hi @cindymullins-dw, I tried the timeouts mentioned in the docs. connect_timeout_ms works for a Mapping, but it doesn't work for a TCPMapping.


Mapping

apiVersion: getambassador.io/v2
kind: Mapping
metadata:
  creationTimestamp: "2024-05-30T17:09:23Z"
  generation: 2
  labels:
    appName: ad-druid-analyzer
    k8slens-edit-resource-version: v2
  name: ad-druid-analyzer
  namespace: pulsedev
  resourceVersion: "37443867"
  uid: efacd7e7-b09c-45c3-8d0f-6772ca0894aa
spec:
  ambassador_id:
  - --apiVersion-v3alpha1-only--default
  connect_timeout_ms: 120000
  prefix: /ad-druid-analyzer/
  service: http://ad-druid-analyzer.pulsedev.svc.cluster.local:19090
  timeout_ms: 130000

Its envoy.json entry:

{
  "name": "cluster_http___ad_druid_analyzer_pulsede-69B118EDC0C78FF5-0",
  "type": "STRICT_DNS",
  "lb_policy": "ROUND_ROBIN",
  "connect_timeout": "120.000s",
  "dns_lookup_family": "V4_ONLY",
  "alt_stat_name": "ad_druid_analyzer_pulsedev_svc_cluster_local_19090",
  "load_assignment": {
    "cluster_name": "cluster_http___ad_druid_analyzer_pulsede-69B118EDC0C78FF5-0",
    "endpoints": [
      {
        "lb_endpoints": [
          {
            "endpoint": {
              "address": {
                "socket_address": {
                  "address": "ad-druid-analyzer.pulsedev.svc.cluster.local",
                  "port_value": 19090,
                  "protocol": "TCP"
                }
              }
            }
          }
        ]
      }
    ]
  }
}

TCPMapping

apiVersion: getambassador.io/v2
kind: TCPMapping
metadata:
  creationTimestamp: "2024-05-30T17:09:23Z"
  generation: 3
  labels:
    appName: logstash
    k8slens-edit-resource-version: v2
  name: logstash
  namespace: pulsedev
  resourceVersion: "37447748"
  uid: 60d38871-48ef-4d3c-b790-55bd0f3a0def
spec:
  ambassador_id:
  - --apiVersion-v3alpha1-only--default
  connect_timeout_ms: 120000
  host: '*'
  port: 30002
  service: ad-logstash.pulsedev.svc.cluster.local:5044
  timeout_ms: 120000

Its envoy.json entry:

{
  "name": "cluster_ad_logstash_pulsedev_svc_cluster_local_5044_pulsedev",
  "type": "STRICT_DNS",
  "lb_policy": "ROUND_ROBIN",
  "connect_timeout": "3.000s",
  "dns_lookup_family": "V4_ONLY",
  "alt_stat_name": "ad_logstash_pulsedev_svc_cluster_local_5044",
  "load_assignment": {
    "cluster_name": "cluster_ad_logstash_pulsedev_svc_cluster_local_5044_pulsedev",
    "endpoints": [
      {
        "lb_endpoints": [
          {
            "endpoint": {
              "address": {
                "socket_address": {
                  "address": "ad-logstash.pulsedev.svc.cluster.local",
                  "port_value": 5044,
                  "protocol": "TCP"
                }
              }
            }
          }
        ]
      }
    ]
  }
}

I tried timeout_ms as well, but it didn't seem to have any effect. Sorry if I have misconfigured something. Thanks!
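
For anyone who wants to repeat the comparison above without reading the whole file by eye, here is a minimal Python sketch that lists the connect_timeout of every cluster in a parsed Envoy config. It walks the JSON recursively, so it makes no assumption about where the clusters sit in envoy.json. The function name find_connect_timeouts and the "clusters" wrapper key in the sample are hypothetical, chosen only for illustration; the two entries are trimmed copies of the cluster snippets posted above.

```python
import json


def find_connect_timeouts(node):
    """Recursively collect {name: connect_timeout} from parsed JSON.

    Any object that has both a "name" and a "connect_timeout" key is
    treated as a cluster entry (an assumption for this sketch).
    """
    found = {}
    if isinstance(node, dict):
        if "name" in node and "connect_timeout" in node:
            found[node["name"]] = node["connect_timeout"]
        for value in node.values():
            found.update(find_connect_timeouts(value))
    elif isinstance(node, list):
        for item in node:
            found.update(find_connect_timeouts(item))
    return found


# Trimmed sample built from the two cluster entries in this comment.
# The "clusters" wrapper key is hypothetical, not the real file layout.
SAMPLE = json.loads("""
{
  "clusters": [
    {
      "name": "cluster_http___ad_druid_analyzer_pulsede-69B118EDC0C78FF5-0",
      "connect_timeout": "120.000s"
    },
    {
      "name": "cluster_ad_logstash_pulsedev_svc_cluster_local_5044_pulsedev",
      "connect_timeout": "3.000s"
    }
  ]
}
""")

for name, timeout in find_connect_timeouts(SAMPLE).items():
    print(f"{name}: {timeout}")
```

To run it against a real config, replace SAMPLE with the result of json.load on a local copy of envoy/envoy.json taken from the pod.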

cindymullins-dw commented 2 weeks ago

Just curious: have you tried the idle_timeout_ms: "130000" value without the quotes, so that it applies as an integer? We would have expected an error (perhaps in the logs) if the value type was invalid, but it may be something to check.
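
As a quick way to spot this kind of quoting, here is a small Python sketch that flags spec fields whose value is a string made up only of digits. It is only an illustration: the helper name quoted_numeric_fields is hypothetical, the spec is written as a plain dict rather than loaded from the cluster, and whether Emissary actually rejects or ignores a quoted value is not confirmed here.

```python
def quoted_numeric_fields(spec):
    """Return the names of fields whose value is a digits-only string."""
    return sorted(
        key
        for key, value in spec.items()
        if isinstance(value, str) and value.isdigit()
    )


# Spec fields copied from the first TCPMapping in this issue
# (ambassador_id omitted for brevity).
spec = {
    "host": "*",
    "idle_timeout_ms": "130000",  # quoted, so it is a string
    "port": 30001,                # unquoted, so it is an integer
    "service": "ad-events.rhel8odp.svc.cluster.local:4222",
}

print(quoted_numeric_fields(spec))  # prints ['idle_timeout_ms']
```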