envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Envoy Filter Only Send one Message per Stream #34400

Closed thiagomata closed 3 months ago

thiagomata commented 5 months ago

Title: Envoy Filter Only Send one Message Per Stream

Description: We are using Envoy filters of the form:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
# ...
    patch:
      operation: MERGE
      value:
        typed_per_filter_config:
          envoy.filters.http.ext_proc:
            "@type": "type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExtProcPerRoute"

That filter calls our gRPC server, which measures some metrics, including the number of messages per stream. When we load test our gRPC server locally, using grpcurl for example, we see around 10k messages on the same stream over a 5-second load test. But when we measure the stream size of the actual gRPC calls made by the Envoy filter, every stream has size 2. Also, the number of EOFs matches the number of calls to the gRPC server. So there are only two messages in each stream, one of which is an EOF.

It looks like these calls are not making the most of gRPC's streaming capability, which allows the same gRPC request to carry multiple messages; instead, a new stream is created for each message. If streams were reused properly, we could considerably reduce the latency spent on connection and stream handshakes.
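As a rough illustration of why this matters, here is a toy cost model in pure Python. The `setup_ms` and `per_msg_ms` numbers are made up for illustration, not measured Envoy or gRPC figures; the point is only how per-stream setup cost scales with the two approaches:

```python
# Toy cost model: `setup_ms` is an assumed per-stream establishment cost
# (HTTP/2 stream + gRPC call overhead) and `per_msg_ms` an assumed
# per-message cost. Both values are illustrative assumptions.
setup_ms, per_msg_ms = 5.0, 0.04
n_messages = 10_000

# Reported behaviour: a fresh stream for every message.
stream_per_message = n_messages * (setup_ms + per_msg_ms)

# Desired behaviour: one long-lived stream carrying all messages.
shared_stream = setup_ms + n_messages * per_msg_ms

# With these assumed costs, the setup overhead dominates:
print(round(stream_per_message / shared_stream))  # 124
```

Under these assumptions the per-message stream setup dominates total cost, which is consistent with the order-of-magnitude gap measured below.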

thiagomata commented 4 months ago

As a reference, here is a simple performance comparison of the same gRPC server interacting with the same gRPC client:

Both tests use grpcurl as the client.

Many grpcurl requests, each sending one message

grpcurl -max-time "0.1" -plaintext -d '
  {
    "requestHeaders": {
      "attributes": {},
      "headers": {
        "headers": [
          {
            "key": ":authority",
            "value": "localhost:9090"
          },
          {
            "key": ":path",
            "value": "/example"
          },
          {
            "key": ":scheme",
            "value": "http"
          },
          {
            "key": ":method",
            "value": "GET"
          },
          {
            "key": "User-Agent",
            "value": "bot"
          }
        ]
      }
    }
  }
' "localhost:9090" envoy.service.ext_proc.v3.ExternalProcessor.Process
rm *.log
timeout 200 bash -c "seq 1000000000 | xargs -I --  ./scripts/grpc-health.sh >>  grpc-health01.log" &
timeout 200 bash -c "seq 1000000000 | xargs -I --  ./scripts/grpc-health.sh >>  grpc-health02.log" &
timeout 200 bash -c "seq 1000000000 | xargs -I --  ./scripts/grpc-health.sh >>  grpc-health03.log" &
timeout 200 bash -c "seq 1000000000 | xargs -I --  ./scripts/grpc-health.sh >>  grpc-health04.log" &
timeout 200 bash -c "seq 1000000000 | xargs -I --  ./scripts/grpc-health.sh >>  grpc-health05.log" &
timeout 200 bash -c "seq 1000000000 | xargs -I --  ./scripts/grpc-health.sh >>  grpc-health06.log" &
timeout 200 bash -c "seq 1000000000 | xargs -I --  ./scripts/grpc-health.sh >>  grpc-health07.log" &
timeout 200 bash -c "seq 1000000000 | xargs -I --  ./scripts/grpc-health.sh >>  grpc-health08.log" &
timeout 200 bash -c "seq 1000000000 | xargs -I --  ./scripts/grpc-health.sh >>  grpc-health09.log" &
timeout 200 bash -c "seq 1000000000 | xargs -I --  ./scripts/grpc-health.sh >>  grpc-health10.log" &
seq 30 | xargs -I -- bash -c "sleep 10 && cat grpc-health*.log | grep PROCESSED | wc -l && date"  
msgs date
2274 Wed 19 Jun 2024 15:19:49 AEST
4540 Wed 19 Jun 2024 15:19:59 AEST
6713 Wed 19 Jun 2024 15:20:09 AEST
8837 Wed 19 Jun 2024 15:20:19 AEST
10784 Wed 19 Jun 2024 15:20:29 AEST
12799 Wed 19 Jun 2024 15:20:39 AEST
14495 Wed 19 Jun 2024 15:20:49 AEST
16399 Wed 19 Jun 2024 15:20:59 AEST
18285 Wed 19 Jun 2024 15:21:10 AEST
20096 Wed 19 Jun 2024 15:21:20 AEST
22173 Wed 19 Jun 2024 15:21:30 AEST
24303 Wed 19 Jun 2024 15:21:40 AEST
26251 Wed 19 Jun 2024 15:21:50 AEST
28175 Wed 19 Jun 2024 15:22:01 AEST
29851 Wed 19 Jun 2024 15:22:11 AEST
31919 Wed 19 Jun 2024 15:22:21 AEST
33756 Wed 19 Jun 2024 15:22:31 AEST
35807 Wed 19 Jun 2024 15:22:42 AEST
37931 Wed 19 Jun 2024 15:22:52 AEST
39311 Wed 19 Jun 2024 15:23:02 AEST
msgs diff date
2274 15:19:49
4540 2266 15:19:59
6713 2173 15:20:09
8837 2124 15:20:19
10784 1947 15:20:29
12799 2015 15:20:39
14495 1696 15:20:49
16399 1904 15:20:59
18285 1886 15:21:10
20096 1811 15:21:20
22173 2077 15:21:30
24303 2130 15:21:40
26251 1948 15:21:50
28175 1924 15:22:01
29851 1676 15:22:11
31919 2068 15:22:21
33756 1837 15:22:31
35807 2051 15:22:42
37931 2124 15:22:52
39311 1380 15:23:02
total msgs total duration
37037 00:03:13
msg/second
192
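For reference, the 192 msg/second figure can be reproduced from the samples above: the total is the last sample minus the first one, over the 00:03:13 between those two timestamps:

```python
# Throughput of the one-message-per-request test, from the samples above.
first_sample, last_sample = 2274, 39311
duration_s = 3 * 60 + 13  # 15:19:49 -> 15:23:02

total_msgs = last_sample - first_sample
print(total_msgs)                      # 37037
print(round(total_msgs / duration_s))  # 192
```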

One grpcurl stream, sending many messages

#!/usr/bin/env bash
# grpcurl load messages
message='
  {
    "requestHeaders": {
      "attributes": {},
      "headers": {
        "headers": [
          {
            "key": ":authority",
            "value": "localhost:9090"
          },
          {
            "key": ":path",
            "value": "/example"
          },
          {
            "key": ":scheme",
            "value": "http"
          },
          {
            "key": ":method",
            "value": "GET"
          },
          {
            "key": "User-Agent",
            "value": "bot"
          }
        ]
      }
    }
  }
'
# ensure that each JSON object is on a single line;
# this avoids parse errors from partial messages when the timeout fires
line=$(echo "$message" | jq -c)
while true; do
  echo "$line"
done
rm *.log
timeout 200 ./scripts/grpcurl-load-messages.sh | \
  grpcurl -plaintext -d @ "localhost:9090"  envoy.service.ext_proc.v3.ExternalProcessor.Process >> \
  stream_result.log &
 seq 30 | xargs -I -- bash -c "sleep 10 && cat stream_result.log | grep 'PROCESSED' | wc -l && date"
msgs date
268251 Wed 19 Jun 2024 15:51:45 AEST
555624 Wed 19 Jun 2024 15:51:57 AEST
869601 Wed 19 Jun 2024 15:52:10 AEST
1210794 Wed 19 Jun 2024 15:52:24 AEST
1576059 Wed 19 Jun 2024 15:52:40 AEST
1983550 Wed 19 Jun 2024 15:52:57 AEST
2404755 Wed 19 Jun 2024 15:53:16 AEST
2879429 Wed 19 Jun 2024 15:53:35 AEST
3386229 Wed 19 Jun 2024 15:53:58 AEST
3918339 Wed 19 Jun 2024 15:54:21 AEST
4508434 Wed 19 Jun 2024 15:54:48 AEST
msgs diff date
268251 15:51:45
555624 287373 15:51:57
869601 313977 15:52:10
1210794 341193 15:52:24
1576059 365265 15:52:40
1983550 407491 15:52:57
2404755 421205 15:53:16
2879429 474674 15:53:35
3386229 506800 15:53:58
3918339 532110 15:54:21
4508434 590095 15:54:48
total msgs total duration
4542905 0:03:03 122x more messages
msg/second
24825 129x faster!
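The speedup figures follow from the two totals, taking the streaming run's full message count over its 0:03:03 duration and comparing it against the first test:

```python
# Comparison of the two runs, using the totals reported above.
single_total, single_rate = 37037, 192  # one-message-per-request test
stream_total = 4_542_905                # single-stream test
stream_duration_s = 3 * 60 + 3          # 0:03:03

stream_rate = stream_total / stream_duration_s
print(round(stream_rate))             # 24825
print(stream_total // single_total)   # 122  (x more messages)
print(int(stream_rate / single_rate)) # 129  (x faster)
```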

I have tested different combinations of numbers of client processes, but the results are quite similar: the streaming approach is much faster than the one-message-per-stream approach.

github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] commented 3 months ago

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.