Approximately 50ms latency increase observed for synchronous vs asynchronous calls

cprice404 commented 1 year ago

Describe the bug

We have a very low-latency gRPC service that we have written client code for in several different programming languages. When we load test it in most languages, we are able to observe a p99 latency of < 5ms for our basic APIs.

When calling our service using elixir-grpc, we observe the following:

When we load test via several concurrent "worker" Tasks, each of which issue a lot of requests in a tight loop, the latencies look similar to what we see in other programming languages.
The faster we issue the requests, the more likely they are to have good (low) latencies.
However, if we simplify the driver program to issue requests synchronously, one at a time, in a simple loop, then every request seems to incur some unknown overhead that increases the latency by about 50ms.

To Reproduce

I can provide a simple reproducer script if requested; I have one right now that can be run against our production gRPC service but requires a (free) auth token. Alternately I could write a small echo server gRPC service and provide both the client and the server code to repro. If someone is able to help look into this please let me know which you prefer and I will update the issue.

Expected behavior

Calling APIs synchronously in a simple loop should not exhibit worse (higher) client-side latencies than calling the same APIs asynchronously and concurrently.

Logs

N/A

Protos

Can provide these upon request depending on whether you would prefer to see a reproducer against our production gRPC service or a standalone client/server reproducer.

Versions:

OS: Amazon Linux 2023, Amazon Linux 2, macOS Ventura
Erlang: 25
Elixir: 1.14
mix.lock(grpc, gun, cowboy, cowlib): Repros with both elixir-grpc 0.5.0 and 0.6.0. current lock file includes gun 2.0.1, cowboy 2.9, cowlib 2.11

Additional context

If there is anyone who is able to help look into this please let me know what additional context would be useful and we will provide it!

sleipnir commented 1 year ago

This strikes me as a mailbox congestion issue that makes a lot of sense for Erlang and Elixir applications that benefit best from systems that can be parallelized.

polvalente commented 1 year ago

@cprice404 if you do have a script/livebook to replicate the issue, it would be helpful!

cprice404 commented 1 year ago

@polvalente thanks, we'll try to put a simple repro together.

cprice404 commented 1 year ago

I have not been able to reproduce this so far with a local insecure/plaintext/http grpc server.

So it may have something to do with the TLS setup. Providing a completely self-contained example including the server then becomes challenging because of certificates and DNS.

So, attached is a tarball that makes grpc requests to an actual deployed (development) instance of our service. The tarball contains both a python and an elixir client example.

To run the python example, from the root dir of the expanded tarball (requires pipenv):

cd python_client
./initialize.sh
./run-client.sh

You will see the python grpc client making a bunch of requests in a loop, with a fairly consistent latency being printed out (for me, about 10-12ms, but this will depend on your network).

Then, elixir (also from the tarball root):

cd elixir_client
mix deps.get
mix run ./client.exs

Does pretty much the exact same thing as the python client, but you will notice that there is an extra ~50ms reported in the latency for each request.

I tried to instrument a local copy of elixir-grpc/gun and the extra time appears to happen in gun.ex -> receive_data/2 -> receive_headers/4 -> await/3 -> :gun.await. But the fact that so far I have only seen this with TLS enabled makes me suspicious that either I have my TLS configured wrong, or that there is something causing the TLS handshake to re-occur between requests or something like that.

And again the other observation is that if I do a bunch of concurrent async requests instead of serial synchronous requests like this, that ~50ms penalty seems to happen much less frequently (but it still happens). standalone_ping_repro.tgz

cprice404 commented 1 year ago

I also tried changing my tls options to:

tls_options = [verify: :verify_peer, depth: 100, cacerts: :certifi.cacerts()]

In case it was some issue with the :tls_certificate_check library, but that didn't help.

cprice404 commented 1 year ago

I also wrote a simple program that times gun HTTPS requests in a loop, and I don't see this issue there - but that server is HTTP 1.1. Poking around in the gun docs I saw this:

https://ninenines.eu/docs/en/gun/2.0/guide/internals_tls_over_tls/#_stream_wide_tls_over_tls

Which makes me wonder if this is related to HTTP/2 TLS with gun.

polvalente commented 1 year ago

The difference is even more dramatic locally. The Elixir client is taking ~500ms while the Python client is taking ~200ms

polvalente commented 1 year ago

Some things I noticed running locally:

Changing the adapter to Mint doesn't affect too much the issue at hand.
using the macOS Activity Monitor, we can see that the Python client sends around as many packets as it receives, while the elixir client sends twice the number of packets as the number it receives.

I suspect somewhere in the code we're sending an extra request that Python is either not sending or merging into another request somehow

cprice404 commented 1 year ago

I suspect somewhere in the code we're sending an extra request that Python is either not sending or merging into another request somehow

or possibly sending the TLS certs/handshake packets more often than the python client, since so far I haven't been able to reproduce this without TLS. Just a guess though.

polvalente commented 1 year ago

It could be that somehow the TLS connection isn't being actually reused. What makes me wonder though is that I saw the same behavior in using the Mint adapter, which makes me suspect it's the library's fault and not the actual underlying client's.

cprice404 commented 1 year ago

@polvalente yeah I also observed the same thing with Mint but am not familiar enough with them to know if Mint uses a similar strategy for TLS.

However, this morning I set up a simple HTTP2 app hosted by nginx with SSL and ran my tests using plain http clients rather than grpc clients. For the python client I used httpx, and in the elixir code I just used gun directly. I got the same performance from both the python client and gun in this case (in other words, neither of the two clients exhibited the extra 50ms overhead that I am observing with elixir-grpc).

So this is one more data point suggesting that the issue lies somewhere in the elixir-grpc library.

Here's the basic script I was running for gun. Based on my attempt to step through the elixir-grpc gun.ex code, I think this is roughly the same life cycle.

open_opts = %{
      protocols: [:http2],
      transport: :tls,
      tls_opts: [
        verify: :verify_peer,
        cacertfile: "/path/to/minica/certs.pem"
      ],
    }
     {:ok, conn_pid} = :gun.open(String.to_charlist("myEc2Host"), 8000, open_opts)
    case :gun.await_up(conn_pid) do
      {:ok, :http2} -> Logger.info("Initialized HTTP2 connection")
      {:error, e} ->
        Logger.error("Failed to initialize HTTP2 connection: #{inspect(e)}")
        raise inspect(e)
    end

    for n <- 1..1_000_000 do
      start_time = :os.system_time(:milli_seconds)
      stream_ref = :gun.get(conn_pid, "/")
      headers = :gun.await(conn_pid, stream_ref)
      Logger.info("Headers: #{inspect(headers)}")
      body = :gun.await(conn_pid, stream_ref)
      Logger.info("Body: #{inspect(body)}")
      duration = :os.system_time(:milli_seconds) - start_time
      Logger.info("Finished request; duration: #{duration}")
    end

If it is looking like it's not the http client's issue, does anyone have any advice on next steps to debug what might be going wrong in the elixir-grpc library code?

cprice404 commented 1 year ago

It could be that somehow the TLS connection isn't being actually reused.

At the h2 stream level that may be possible. At the TCP/connection level I don't think that's the case because I have been monitoring my runs with lsof and it appears that only one connection gets opened for the duration of the program.

lessless commented 9 months ago

Some things I noticed running locally:

Changing the adapter to Mint doesn't affect too much the issue at hand.

using the macOS Activity Monitor, we can see that the Python client sends around as many packets as it receives, while the elixir client sends twice the number of packets as the number it receives.

I suspect somewhere in the code we're sending an extra request that Python is either not sending or merging into another request somehow

We are seeing a 3x increased latency in Elixir vs C# while talking to the local insecure gRPC server. Also using Mint and exactly with the same symptom of Elixir being more chatty https://github.com/NFIBrokerage/spear/issues/92

(maybe a red herring though)

polvalente commented 7 months ago

Sorry for the long delay. If changing clients does not seem to affect the overall performance, there seems to be some sort of protocol implementation difference on the layers above. If anyone wants to look into this, I'm happy to accept any help :)

elixir-grpc / grpc

Approximately 50ms latency increase observed for synchronous vs asynchronous calls #323