envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0
24.74k stars 4.75k forks source link

Response compression for application/grpc results in RESOURCE_EXHAUSTED v1.18.3 #17556

Closed mbaglole closed 3 years ago

mbaglole commented 3 years ago

Title: Response compression for application/grpc results in RESOURCE_EXHAUSTED v1.18.3

Description:

What issue is being seen? Describe what should be happening instead of the bug, for example: Envoy should not crash, the expected value isn't returned, etc.

Calling grpc service (content-type) application/grpc with response compression (gzip or brotli) enabled, results in 100% failure from the client. The following message is thrown:

error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (276824064 vs. 4194304)"

I see the grpc server receive the request, a 200 OK is sent, includnig the data and relevant grpc trailers. However, I do not understand what results in the ResourceExhausted for the grpc client (grpcurl, golang, python)

Repro steps:

Include sample requests, environment, etc. All data and inputs required to reproduce the bug.

  1. Configure compressor with response_direction enabled on the listener, including content-type application/grpc
          - name: envoy.filters.http.compressor
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.compressor.v3.Compressor
              response_direction_config:
                common_config:
                  min_content_length: 30
                  content_type:
                    - text/html
                    - application/json
                    - text/plain
                    - application/grpc
                    - application/grpc+proto
                    - application/grpc+json
                  enabled:
                    default_value: true
                    runtime_key: "response_compressor_enabled"
              compressor_library:
                name: gzip-compressor-1
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.compression.gzip.compressor.v3.Gzip
  1. Call grpc service (client does not matter)
# Simple grpc server that echos back to the client to verify a simple request/response.
grpcurl -H "accept-encoding: br" -plaintext -proto ping-schema.proto -import-path . localhost:10000 ping.Echo/Echo
  1. Client will throw error
# GODEBUG=http2debug=2 go run main.go  
2021/07/30 15:54:18 http2: Framer 0xc000134000: wrote SETTINGS len=0
2021/07/30 15:54:18 http2: Framer 0xc000134000: read SETTINGS len=24, settings: HEADER_TABLE_SIZE=4096, UNKNOWN_SETTING_8=0, MAX_CONCURRENT_STREAMS=2147483647, INITIAL_WINDOW_SIZE=268435456
2021/07/30 15:54:18 http2: Framer 0xc000134000: read SETTINGS flags=ACK len=0
2021/07/30 15:54:18 http2: Framer 0xc000134000: read WINDOW_UPDATE len=4 (conn) incr=268369921
2021/07/30 15:54:18 http2: Framer 0xc000134000: wrote SETTINGS flags=ACK len=0
2021/07/30 15:54:18 http2: Framer 0xc000134000: wrote HEADERS flags=END_HEADERS stream=1 len=96
2021/07/30 15:54:18 http2: Framer 0xc000134000: wrote DATA flags=END_STREAM stream=1 len=5 data="\x00\x00\x00\x00\x00"
2021/07/30 15:54:18 http2: Framer 0xc000134000: read HEADERS flags=END_HEADERS stream=1 len=87
2021/07/30 15:54:18 http2: decoded hpack field header field ":status" = "200"
2021/07/30 15:54:18 http2: decoded hpack field header field "content-type" = "application/grpc"
2021/07/30 15:54:18 http2: decoded hpack field header field "x-envoy-upstream-service-time" = "1"
2021/07/30 15:54:18 http2: decoded hpack field header field "vary" = "Accept-Encoding"
2021/07/30 15:54:18 http2: decoded hpack field header field "content-encoding" = "gzip"
2021/07/30 15:54:18 http2: decoded hpack field header field "date" = "Fri, 30 Jul 2021 22:54:18 GMT"
2021/07/30 15:54:18 http2: decoded hpack field header field "server" = "envoy"
[envoy_resource_exhausted.clusters.txt](https://github.com/envoyproxy/envoy/files/6910272/envoy_resource_exhausted.clusters.txt)
[envoy_resource_exhausted.server.txt](https://github.com/envoyproxy/envoy/files/6910273/envoy_resource_exhausted.server.txt)
[envoy_resource_exhausted.stats.txt](https://github.com/envoyproxy/envoy/files/6910274/envoy_resource_exhausted.stats.txt)

2021/07/30 15:54:18 http2: Framer 0xc000134000: read DATA stream=1 len=49 data="\x1f\x8b\b\x00\x00\x00\x00\x00\x00\x13b```\x90\xe1\x92*\xc8\xcfKWH+\xca\xcfUPH,Qp60r6346\xf5u1\v\x02\x00\x00\x00\xff\xff"
2021/07/30 15:54:18 http2: Framer 0xc000134000: read DATA stream=1 len=10 data="\x03\x00\x16\x10\x16\u007f!\x00\x00\x00"
2021/07/30 15:54:18 http2: Framer 0xc000134000: read HEADERS flags=END_STREAM|END_HEADERS stream=1 len=24
2021/07/30 15:54:18 http2: decoded hpack field header field "grpc-status" = "0"
2021/07/30 15:54:18 http2: decoded hpack field header field "grpc-message" = ""
2021/07/30 15:54:18 http2: Framer 0xc000134000: wrote WINDOW_UPDATE len=4 (conn) incr=49
2021/07/30 15:54:18 http2: Framer 0xc000134000: wrote PING len=8 ping="\x02\x04\x10\x10\t\x0e\a\a"
2021/07/30 15:54:18 http2: Framer 0xc000134000: wrote RST_STREAM stream=1 len=4 ErrCode=CANCEL
ERRO[0000] Failed to ping!                               error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (2332557312 vs. 4194304)"

Note: The Envoy_collect tool gathers a tarball with debug logs, config and the following admin endpoints: /stats, /clusters and /server_info. Please note if there are privacy concerns, sanitize the data prior to sharing the tarball/pasting.

Admin and Stats Output:

Include the admin output for the following endpoints: /stats, /clusters, /routes, /server_info. For more information, refer to the admin endpoint documentation.

Note: If there are privacy concerns, sanitize the data prior to sharing.

Config:

Include the config used to configure Envoy.

### NOTE: this was occurring in our production environment, so I ran envoy locally to reproduce. The following is basic configuration that sets up an ingress listener with gzip response compression and response compression enabled.

admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address: { address: 0.0.0.0, port_value: 9900 }
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          codec_type: AUTO
          use_remote_address: true
          skip_xff_append: false
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match: { prefix: "/ping.Echo/" }
                route: 
                  cluster: grpc_services
                  timeout: 17s
          http_filters:
          - name: envoy.filters.http.compressor
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.compressor.v3.Compressor
              response_direction_config:
                common_config:
                  min_content_length: 30
                  content_type:
                    - text/html
                    - application/json
                    - text/plain
                    - application/grpc
                    - application/grpc+proto
                    - application/grpc+json
                  enabled:
                    default_value: true
                    runtime_key: "response_compressor_enabled"
              compressor_library:
                name: gzip-compressor-1
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.compression.gzip.compressor.v3.Gzip
          - name: envoy.router
    - name: grpc_service
      connect_timeout: 1s
      type: STATIC
      load_assignment:
        cluster_name: "grpc-test-local"
        endpoints:
        - lb_endpoints:
          - endpoint:
              address:
                socket_address:
                  address: 127.0.0.1
                  port_value: 9999

Logs:

Include the access logs and the Envoy logs.

"[2021-07-30 15:55:54.177][2211851][debug][conn_handler] [external/envoy/source/server/active_tcp_listener.cc:328] [C20] new connection\r\n"
@"[2021-07-30 15:55:54.177][2211851][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:1222] [C20] updating connection-level initial window size to 268435456\r\n"
@"[2021-07-30 15:55:54.178][2211851][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:271] [C20] new stream\r\n"
@"[2021-07-30 15:55:54.178][2211851][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:892] [C20][S17257979190867533773] request headers complete (end_stream=false):\r\n"
@"':method', 'POST'\r\n"
@"':scheme', 'http'\r\n"
@"':path', '/ping.Echo/Ping'\r\n"
@"':authority', 'localhost:10000'\r\n"
@"'content-type', 'application/grpc'\r\n"
@"'user-agent', 'grpc-go/1.27.1'\r\n"
@"'te', 'trailers'\r\n"
@"'accept-encoding', 'gzip'\r\n"
@"\r\n"
@"[2021-07-30 15:55:54.178][2211851][debug][router] [external/envoy/source/common/router/router.cc:445] [C20][S17257979190867533773] cluster 'grpc_services' match for URL '/ping.Echo/Ping'\r\n"
@"[2021-07-30 15:55:54.178][2211851][debug][router] [external/envoy/source/common/router/router.cc:631] [C20][S17257979190867533773] router decoding headers:\r\n"
@"':method', 'POST'\r\n"
@"':scheme', 'http'\r\n"
@"':path', '/ping.Echo/Ping'\r\n"
@"':authority', 'localhost:10000'\r\n"
@"'content-type', 'application/grpc'\r\n"
@"'user-agent', 'grpc-go/1.27.1'\r\n"
@"'te', 'trailers'\r\n"
@"'accept-encoding', 'gzip'\r\n"
@"'x-forwarded-for', '192.168.1.38'\r\n"
@"'x-forwarded-proto', 'http'\r\n"
@"'x-envoy-internal', 'true'\r\n"
@"'x-request-id', 'af59827f-3539-4983-bf9e-c248a712e7ee'\r\n"
@"'x-envoy-expected-rq-timeout-ms', '17000'\r\n"
@"\r\n"
@"[2021-07-30 15:55:54.178][2211851][debug][pool] [external/envoy/source/common/http/conn_pool_base.cc:74] queueing stream due to no available connections\r\n"
@"[2021-07-30 15:55:54.178][2211851][debug][pool] [external/envoy/source/common/conn_pool/conn_pool_base.cc:241] trying to create new connection\r\n"
@"[2021-07-30 15:55:54.178][2211851][debug][pool] [external/envoy/source/common/conn_pool/conn_pool_base.cc:143] creating a new connection\r\n"
@"[2021-07-30 15:55:54.179][2211851][debug][client] [external/envoy/source/common/http/codec_client.cc:43] [C21] connecting\r\n"
@"[2021-07-30 15:55:54.179][2211851][debug][connection] [external/envoy/source/common/network/connection_impl.cc:860] [C21] connecting to 127.0.0.1:9999\r\n"
@"[2021-07-30 15:55:54.179][2211851][debug][connection] [external/envoy/source/common/network/connection_impl.cc:876] [C21] connection in progress\r\n"
@"[2021-07-30 15:55:54.179][2211851][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:1222] [C21] updating connection-level initial window size to 268435456\r\n"
@"[2021-07-30 15:55:54.179][2211851][debug][http] [external/envoy/source/common/http/filter_manager.cc:779] [C20][S17257979190867533773] request end stream\r\n"
@"[2021-07-30 15:55:54.179][2211851][debug][connection] [external/envoy/source/common/network/connection_impl.cc:665] [C21] connected\r\n"
@"[2021-07-30 15:55:54.179][2211851][debug][client] [external/envoy/source/common/http/codec_client.cc:82] [C21] connected\r\n"
@"[2021-07-30 15:55:54.179][2211851][debug][pool] [external/envoy/source/common/conn_pool/conn_pool_base.cc:275] [C21] attaching to next stream\r\n"
@"[2021-07-30 15:55:54.179][2211851][debug][pool] [external/envoy/source/common/conn_pool/conn_pool_base.cc:171] [C21] creating stream\r\n"
@"[2021-07-30 15:55:54.179][2211851][debug][router] [external/envoy/source/common/router/upstream_request.cc:371] [C20][S17257979190867533773] pool ready\r\n"
@"[2021-07-30 15:55:54.180][2211851][debug][router] [external/envoy/source/common/router/router.cc:1244] [C20][S17257979190867533773] upstream headers complete: end_stream=false\r\n"
@"[2021-07-30 15:55:54.180][2211851][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1479] [C20][S17257979190867533773] encoding headers via codec (end_stream=false):\r\n"
@"':status', '200'\r\n"
@"'content-type', 'application/grpc'\r\n"
@"'x-envoy-upstream-service-time', '0'\r\n"
@"'vary', 'Accept-Encoding'\r\n"
@"'content-encoding', 'gzip'\r\n"
@"'date', 'Fri, 30 Jul 2021 22:55:54 GMT'\r\n"
@"'server', 'envoy'\r\n"
@"\r\n"
@"[2021-07-30 15:55:54.180][2211851][debug][client] [external/envoy/source/common/http/codec_client.cc:124] [C21] response complete\r\n"
@"[2021-07-30 15:55:54.180][2211851][debug][pool] [external/envoy/source/common/conn_pool/conn_pool_base.cc:200] [C21] destroying stream: 0 remaining\r\n"
@"[2021-07-30 15:55:54.180][2211851][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1495] [C20][S17257979190867533773] encoding trailers via codec:\r\n"
@"'grpc-status', '0'\r\n"
@"'grpc-message', ''\r\n"
@"\r\n"
@"[2021-07-30 15:55:54.180][2211851][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:997] [C20] stream closed: 0\r\n"
@"[2021-07-30 15:55:54.180][2211851][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:997] [C21] stream closed: 0\r\n"
@"[2021-07-30 15:55:54.182][2211851][debug][misc] [external/envoy/source/common/network/io_socket_error_impl.cc:34] Unknown error code 54 details Connection reset by peer\r\n"
@"[2021-07-30 15:55:54.182][2211851][debug][connection] [external/envoy/source/common/network/connection_impl.cc:633] [C20] remote close\r\n"
@"[2021-07-30 15:55:54.182][2211851][debug][connection] [external/envoy/source/common/network/connection_impl.cc:243] [C20] closing socket: 0\r\n"
@"[2021-07-30 15:55:54.182][2211851][debug][conn_handler] [external/envoy/source/server/active_tcp_listener.cc:76] [C20] adding to cleanup list\r\n"
@"[2021-07-30 15:55:57.893][2211743][debug][upstream] [external/envoy/source/common/upstream/logical_dns_cluster.cc:105] starting async DNS resolution for 127.0.0.1\r\n"
@"[2021-07-30 15:55:57.893][2211743][debug][upstream] [external/envoy/source/common/network/apple_dns_impl.cc:148] DNS resolver resolve=127.0.0.1\r\n"
@"[2021-07-30 15:55:57.893][2211743][debug][upstream] [external/envoy/source/common/network/apple_dns_impl.cc:158] DNS resolver resolved (127.0.0.1) to (127.0.0.1:0) without issuing call to Apple API\r\n"
@"[2021-07-30 15:55:57.893][2211743][debug][upstream] [external/envoy/source/common/upstream/logical_dns_cluster.cc:113] async DNS resolution complete for 127.0.0.1\r\n"
@"[2021-07-30 15:55:57.893][2211743][debug][upstream] [external/envoy/source/common/upstream/logical_dns_cluster.cc:159] DNS refresh rate reset for 127.0.0.1, refresh rate 5000 ms\r\n"
@"[2021-07-30 15:55:57.893][2211743][debug][main] [external/envoy/source/server/server.cc:209] flushing stats\r\n"
@"[2021-07-30 15:56:02.897][2211743][debug][upstream] [external/envoy/source/common/upstream/logical_dns_cluster.cc:105] starting async DNS resolution for 127.0.0.1\r\n"
@"[2021-07-30 15:56:02.897][2211743][debug][upstream] [external/envoy/source/common/network/apple_dns_impl.cc:148] DNS resolver resolve=127.0.0.1\r\n"
@"[2021-07-30 15:56:02.897][2211743][debug][upstream] [external/envoy/source/common/network/apple_dns_impl.cc:158] DNS resolver resolved (127.0.0.1) to (127.0.0.1:0) without issuing call to Apple API\r\n"
@"[2021-07-30 15:56:02.897][2211743][debug][upstream] [external/envoy/source/common/upstream/logical_dns_cluster.cc:113] async DNS resolution complete for 127.0.0.1\r\n"
@"[2021-07-30 15:56:02.897][2211743][debug][upstream] [external/envoy/source/common/upstream/logical_dns_cluster.cc:159] DNS refresh rate reset for 127.0.0.1, refresh rate 5000 ms\r\n"
@"[2021-07-30 15:56:02.897][2211743][debug][main] [external/envoy/source/server/server.cc:209] flushing stats\r\n"

Note: If there are privacy concerns, sanitize the data prior to sharing.

Call Stack:

If the Envoy binary is crashing, a call stack is required. Please refer to the Bazel Stack trace documentation.

NOTE: I will be OOO next week, I may have a delayed response.

mattklein123 commented 3 years ago

cc @junr03 @rojkov

rojkov commented 3 years ago

I could reproduce the problem with the latest grpcurl and Envoy. It seems grpcurl ignores the response header content-encoding: gzip and tries to interpret the compressed stream as a normal Protobuf message.

After I had grpc-encoding: gzip added to the response grpcurl started to fail with

$ grpcurl -H "accept-encoding: gzip" --plaintext -proto yages-schema.proto -import-path /home/rojkov/work/yages localhost:10000 yages.Echo/Ping
ERROR:
  Code: Unimplemented
  Message: grpc: Decompressor is not installed for grpc-encoding "gzip"

Looks like the gRPC clients have to support compression too.

mbaglole commented 3 years ago

@rojkov may have been a slight misunderstanding here.

I did further digging, and it's due to http compression on the response not being decompressed before the grpc client. If we apply http decompresion on the response and then have the request sent to the grpc client, it works as expected.