envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

TCP traffic tapping with streaming uses max_buffered_rx_bytes and max_buffered_tx_bytes #13802

Open jphx opened 3 years ago

jphx commented 3 years ago

I've configured an upstream cluster to enable TCP traffic tapping, using the admin-style configuration that streams the data to the /tap admin endpoint. My upstream cluster configuration includes:

"per_connection_buffer_limit_bytes": 131072,
"transport_socket": {
  "name": "envoy.transport_sockets.tap",
  "typed_config": {
    "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tap.v3.Tap",
    "common_config": {
      "admin_config": {
        "config_id": "mtu.hserver-multi-tenant-upstream.cerberus"
      }
    },
    "transport_socket": {
      "name": "envoy.transport_sockets.tls"
    }
  }
},

Then I post this document to Envoy's /tap endpoint:

config_id: mtu.hserver-multi-tenant-upstream.cerberus
tap_config:
    match:
        any_match: true
    output_config:
        streaming: true
        sinks:
            - streaming_admin: {}

This works, except that when I drive an HTTP request with a large request or response body, the body data in each event document is always truncated at 1024 bytes, discarding anywhere from 32K to 80K bytes per event. To prevent the truncation, I found I had to add:

        max_buffered_rx_bytes: 204800
        max_buffered_tx_bytes: 204800
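
Merged into the tap document above, this would presumably look as follows. This is a sketch that assumes both fields sit directly under output_config, as the indentation of the two lines suggests; the 204800 values are simply the ones that happened to work for this traffic.

```yaml
config_id: mtu.hserver-multi-tenant-upstream.cerberus
tap_config:
    match:
        any_match: true
    output_config:
        streaming: true
        # Per-event body limits; without these, each event body was cut at 1024 bytes.
        max_buffered_rx_bytes: 204800
        max_buffered_tx_bytes: 204800
        sinks:
            - streaming_admin: {}
```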

The documentation seems to indicate that these settings are only used for non-streaming captures, however. Is this a bug or intended? If it's intended, what value can I specify that would guarantee no truncation? I've noticed that it doesn't seem to be related to my per_connection_buffer_limit_bytes, at least not in an obvious way.

Oh, I should mention that due to recent tapping fixes, I tested this using the envoy-dev Docker image, specifically this version:

40de14954b29b4c1c87793482b90b27da3370f0f/1.17.0-dev/Clean/RELEASE/BoringSSL

mattklein123 commented 3 years ago

This is a doc bug. The intention is to avoid the tap traces getting too large. I think you could basically supply "max int" for the values to tap everything.
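
As a rough sketch of that suggestion: assuming both fields are unsigned 32-bit values (an assumption, not something confirmed in this thread), "max int" would be 4294967295, and the relevant part of the tap document might look like this:

```yaml
    output_config:
        streaming: true
        # Assumed upper bound for a 32-bit unsigned field (2^32 - 1).
        max_buffered_rx_bytes: 4294967295
        max_buffered_tx_bytes: 4294967295
        sinks:
            - streaming_admin: {}
```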

jphx commented 3 years ago

Thanks for the info. I was afraid that specifying a large number for max_buffered_rx_bytes and max_buffered_tx_bytes would cause large chunks of memory to be allocated, but if you're suggesting "max int", that must not be the case.