Open antondyad opened 1 year ago
How difficult would it be to test it using the C# client to have a primary baseline of ESDB itself (to some extent)?
Thanks for the suggestion, @yordis! It was an interesting experiment. Using the official C# library we get much faster stream reads on localhost:
both longer and shorter streams are read within approx. 5-10ms - considerably faster than the same streams with Spear 😞
Note: within the same dotnet run the first read is 10x slower (probably JIT and whatnot), but subsequent runs are very fast (5-10ms).
This seems to point to Spear being the bottleneck here. Any advice on how we can proceed?
I'm a bit busy at the moment so I haven't given this a proper look but 20-40ms does seem quite high. I'm also surprised it's cheaper to write than read.
In general I would recommend tuning the `chunk_size` option if you're using `Spear.stream!` - that is probably what is causing stream_with_300_events to take several times as long as stream_with_20_events. If you read all events in one batch it should take virtually the same time for both. If your events are large, though, a big batch can end up hurting performance, so it's something to measure with production-like workloads to find the right value.
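To make the chunk-size arithmetic concrete, here's a sketch of the round-trip count (`round_trips` is an illustrative helper; `chunk_size` is a real `Spear.stream!` option, but the default value of 128 used below is an assumption):

```elixir
# Each Spear.stream! page costs one ReadReq round-trip, so total latency
# grows with ceil(stream_length / chunk_size) unless one chunk covers
# the whole stream.
round_trips = fn stream_length, chunk_size ->
  div(stream_length + chunk_size - 1, chunk_size)
end

IO.inspect(round_trips.(300, 128), label: "300 events, chunk_size 128")  # 3 requests
IO.inspect(round_trips.(20, 128), label: "20 events, chunk_size 128")    # 1 request
IO.inspect(round_trips.(300, 500), label: "300 events, chunk_size 500")  # 1 request

# Against a live server this would look something like:
#   Spear.stream!(conn, "stream_with_300_events", chunk_size: 500) |> Enum.to_list()
```

With a single request per read, both streams should see roughly the same wall-clock time.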
I took a look at a network capture of a similar Spear.stream! |> Enum.into([])
locally and I find this surprising:
https://github.com/NFIBrokerage/spear/files/12797716/spear.tgz (tar + gzipped pcapng)
We send the HTTP/2 HEADERS from the ReadReq but it looks like we're waiting for the TCP ACK from EventStoreDB to send the body of the request. Excluding the time it takes for ACKs the request should be very fast (single-digit milliseconds). So I wonder if this is something that can be fixed by tuning TCP options. I'm not quite sure what is blocking on the ACK: Mint or gen_tcp/ssl, so this needs some further debugging.
We've captured the network traffic too to compare the C# implementation vs Spear. It does seem like the C# one is less chatty. In case it's useful for further investigation, here are our three dumps:
stream_with_300_events
stream!
one).

A few other (useful?) observations from our production setup:
- `stream!` time is typically quite long (hovering around 100ms)
- `append` takes hundreds of ms

Here's a bit more info from our field testing:
We've switched to `Spear.read_stream/3` instead of `Spear.stream!/4` - this has resulted in an approx. 3x reduction of the average time to load one stream (a single `read_stream` call, with a reasonable `:max_count`, is necessary).

@the-mikedavis, do you reckon it is possible to have similar performance characteristics between `stream!` and `read_stream`?
`stream!` and `read_stream` use essentially the same code path. The difference is that `stream!` uses `Stream.unfold/2` to repeatedly read from the server so that you can eventually read even very long streams. I suspect that you see lower total latency because `stream!` is reading multiple times, and if individual requests have high latency then reading multiple times will have higher total latency.

There is room for a small optimization in `stream!` to stop requesting from the server when it reads a chunk smaller than the requested chunk size. That would eliminate one extra request most of the time. I don't see that as being a true solution here, though; it's just working around the high latency of individual reads.
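The paging behavior described here can be sketched with `Stream.unfold/2` over a stubbed reader (this mimics the shape of `stream!`'s loop, not Spear's actual internals; `fake_read` stands in for one ReadReq round-trip):

```elixir
chunk_size = 128
stream_length = 300

# Stub for one ReadReq: returns up to chunk_size "events" from position pos.
fake_read = fn pos ->
  Enum.to_list(pos..min(pos + chunk_size - 1, stream_length - 1))
end

chunks =
  Stream.unfold({0, :continue}, fn
    {_pos, :halt} ->
      nil

    {pos, :continue} ->
      chunk = fake_read.(pos)
      # the optimization mentioned above: a short chunk means we've hit
      # the end of the stream, so skip the extra trailing read
      next = if length(chunk) < chunk_size, do: :halt, else: :continue
      {chunk, {pos + length(chunk), next}}
  end)
  |> Enum.to_list()

events = List.flatten(chunks)
requests = length(chunks)

IO.inspect(length(events))  # 300 events read
IO.inspect(requests)        # 3 round-trips instead of 4
```

Without the short-chunk check, the loop would issue a fourth, empty read past the end of the stream.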
@the-mikedavis Assuming that both C# and Elixir clients are using the same endpoint (https://developers.eventstore.com/clients/grpc/reading-events.html#reading-from-a-stream) I would expect them to have the same latency.
Do you think the C# client reads in a bigger batch by default?
I believe the C# client reads with effectively no limit while it's set rather low in Spear. You can try tuning that option, but I don't think it should have much effect since individual `read_stream` calls are slow. I suspect this might be solved by tuning some options in gen_tcp/ssl or changing the `Mint.HTTP2.stream`
call here to include the full request when possible. I should have some time later this week to test these ideas out.
Sorry for the long delay! Things have been hectic lately.
I took a look at those packet captures and looking with a filter of http2.header
I'm seeing similar numbers between Elixir and C# between the request and response:
Client | Req (sec) | Res (sec) | Time waiting (msec)
--- | --- | --- | ---
C# | 0.015503 | 0.086402 | 70.899
Elixir | 1.059172 | 1.093208 | 34.036
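For reference, the "Time waiting" column is just the gap between the two capture timestamps converted to milliseconds:

```elixir
# Recompute the waiting time from the request/response capture times above.
waiting_ms = fn req_s, res_s -> Float.round((res_s - req_s) * 1000, 3) end

IO.inspect(waiting_ms.(0.015503, 0.086402))  # C# row
IO.inspect(waiting_ms.(1.059172, 1.093208))  # Elixir row
```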
In the C# case the read took twice as long on the server but I see that occasionally happening locally too - it looks like the server has some sort of caching to speed up repeated reads so some take half as much time.
It's still looking to me like this latency is coming from the server and isn't really dependent on the client. Could you post a small reproduction repository or gist of the C# code you used in your test? I don't do much dotnet development so that would make it much easier for me to compare
Sorry for the late reply, @the-mikedavis. Here's the dotnet code I've used for my timings: dotnet_esdb_timing.zip
It basically does something as simple as this:
```csharp
var stream_name = "Document:134";
var settings = EventStoreClientSettings.Create("esdb://localhost:2113?tls=false");
var client = new EventStoreClient(settings);

var timer = new Stopwatch();
timer.Start();
var result = client.ReadStreamAsync(
    Direction.Forwards,
    stream_name,
    StreamPosition.Start
);
await foreach (var @event in result) {
    Console.WriteLine(@event.Event.EventType);
}
timer.Stop();
Console.WriteLine($"Elapsed: {timer.ElapsedMilliseconds} ms");
```
In the attached project this is actually done twice - the first run seems to be always quite slow (cold-start?).
I can reproduce the timing differences. With a similar Elixir script:
```elixir
Mix.install [{:spear, path: "."}, {:jason, "~> 1.0"}]

{:ok, conn} = Spear.Connection.start_link(connection_string: "esdb://localhost:2113?tls=false")
stream_name = "Document:134"

start1 = :erlang.monotonic_time()
conn
|> Spear.stream!(stream_name, from: :start, direction: :forwards)
|> Enum.each(fn event ->
  IO.puts event.type
end)
end1 = :erlang.monotonic_time()

start2 = :erlang.monotonic_time()
conn
|> Spear.stream!(stream_name, from: :start, direction: :forwards)
|> Enum.each(fn event ->
  IO.puts event.type
end)
end2 = :erlang.monotonic_time()

IO.inspect(:erlang.convert_time_unit(end1 - start1, :native, :microsecond) / 1000, label: "read no. 1")
IO.inspect(:erlang.convert_time_unit(end2 - start2, :native, :microsecond) / 1000, label: "read no. 2")
```
I see ~90ms for the first read with dotnet and ~2-3ms for the subsequent read. With Spear I see ~130ms for the first read and ~80ms for the second.
I'll push up the change I mentioned earlier:
There is room for a small optimization in stream! to stop requesting from the server when it reads a chunk smaller than the requested chunk size. That would eliminate one extra request most of the time.
which improves the numbers a little: 80ms for the first read and ~25ms for the second.
I took some new packet captures comparing these and the HTTP/2 requests themselves seem very fast - for both Spear and dotnet I see the network request taking not more than 2ms for the second read. So the slow-down must be somewhere in either Mint, gpb or Spear.
I was curious about the chatty WINDOW_UPDATE frames so I took a look into Mint. Mint sends a WINDOW_UPDATE both for the request stream (i.e. the ReadReq) and for the whole connection. That's fine and fits the spec, but it sends the WINDOW_UPDATE frames eagerly any time you handle a DATA frame in `Mint.HTTP2.stream/2`. It can also be wasteful because you might receive a big message with a bunch of DATA frames for the same stream and then a DATA frame or the trailing HEADERS frame with the End Stream flag set. For that case I don't think there's a need to send the WINDOW_UPDATE for the stream, just the connection.
I changed Mint to collect all of the window updates in a single call of Mint.HTTP2.stream/2
(or recv) and send single frames per-stream and one for the connection level: https://github.com/the-mikedavis/mint/commit/c77a13ae07a64680e2a5ac8027c05fced41ac87f. This didn't have any noticeable effect though, nor did commenting out the code to send WINDOW_UPDATE frames entirely. (I thought this might work since the dotnet packet capture contains no WINDOW_UPDATEs.)
Feel free to give #93 a try. This is not a very high priority problem for me so I will probably not look at this again for a little while. If you'd like to dig deeper I can recommend checking out packet captures with Wireshark and creating flamegraphs. Nothing stuck out to me as odd but you can look for yourself. Set up a script that reads from a reasonably long stream, have it print out its PID with `System.pid/0` and wait on `IO.gets/1`, set `export ERL_AFLAGS="+JPperf true"`, run the script, and measure with `perf record --pid '<pid of the script>' --freq 999 -g -- sleep 20`. Then you can view a flamegraph in something like https://github.com/KDAB/hotspot which can read `perf.data` files.
It would also be interesting to try this with another HTTP/2/gRPC client (for example gun, or maybe https://github.com/elixir-grpc/grpc); the protobuf definitions are here: https://github.com/EventStore/EventStore/blob/0af9906b39c9cd9fc6e301d5c4690b60b2fddcb1/src/Protos/Grpc/streams.proto. Or to eliminate protobuf (de)serialization altogether and just try to get the network request as fast as possible, to narrow down what exactly is slow.
Despite what I said above about putting this down for a while, I gave this another look today 😄
Here's a script of what I was thinking about above:
perf_test.exs
So we can time Mint and gpb independently. The results are a little surprising. This is against a stream with 20 events and a local eventstore container running without TLS:
```
$ elixir perf_test.exs
open connection: 12.558ms
ping/pong: 3.358ms
encode message: 18.89ms
ReadReq request: 12.586ms
ReadReq stream body: 0.013ms
ReadReq await responses: 11.335ms
decode response: 0.048ms
events: [
  "event_store.client.streams.ReadResp": {:event,
   {:"event_store.client.streams.ReadResp.ReadEvent", {...}, ...}},
  "event_store.client.streams.ReadResp": {:event,
   {:"event_store.client.streams.ReadResp.ReadEvent", ...}},
  "event_store.client.streams.ReadResp": {:event, {...}},
  "event_store.client.streams.ReadResp": {:event, ...},
  "event_store.client.streams.ReadResp": {...},
  ...
]
```
`encode message` (gpb's `encode_msg/2`) takes quite a bit longer than I thought it would. The `ReadReq request` (`Mint.HTTP2.request/5`) and `ReadReq await responses` (`Mint.HTTP.stream/2`) are also fairly slow and could be looked into separately.
I wouldn't be opposed to dropping the dependency on gpb (which we use via `event_store_db_gpb_protobufs`) in favor of something in-tree if it can boost performance reasonably well. We only handle a few slow-changing protobuf definitions from upstream, so we don't necessarily need a general-purpose protobuf encoder/decoder. I imagine this would take a fair amount of work to change, though.
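To gauge the scope of an in-tree encoder, here's a minimal sketch of two protobuf primitives such messages rely on (base-128 varints and length-delimited fields); this is illustrative only, not a proposal for Spear's actual implementation:

```elixir
defmodule MiniProto do
  import Bitwise

  # Base-128 varint: 7 bits per byte, MSB set on every byte but the last.
  def varint(n) when n >= 0 and n < 0x80, do: <<n>>
  def varint(n), do: <<bor(band(n, 0x7F), 0x80), varint(bsr(n, 7))::binary>>

  # Wire type 2 (length-delimited): field key, payload length, then the bytes.
  def len_field(field_num, bytes) when is_binary(bytes) do
    key = bor(bsl(field_num, 3), 2)
    varint(key) <> varint(byte_size(bytes)) <> bytes
  end
end

IO.inspect(MiniProto.varint(300), base: :hex)       # two bytes: 0xAC then 0x02
IO.inspect(MiniProto.len_field(1, "Document:134"))  # key 0x0A, length 12, name bytes
```

A full replacement would still need the other wire types and the generated message structs, which is where most of the work would be.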
@the-mikedavis, thank you for looking into that.
I'm quite surprised by the `ReadReq request` and `ReadReq await` performance. Surely that isn't because Mint is inherently slow, is it?
I took a look into the `Mint.HTTP2.request/5` part and there are some easy wins to speed that up: https://github.com/elixir-mint/mint/pull/423

I also looked at splitting the timing up between `receive/1` and `Mint.HTTP.stream/2`:
perf_test.exs
Results:
```
open connection: 7.469ms
receive message: 0.0ms
Mint.HTTP.stream/2: 2.182ms
receive message: 0.0ms
Mint.HTTP.stream/2: 0.005ms
ping/pong: 2.277ms
encode message: 17.555ms
ReadReq request: 2.433ms
ReadReq stream body: 0.016ms
receive message: 20.523ms
Mint.HTTP.stream/2: 0.07ms
receive message: 0.0ms
Mint.HTTP.stream/2: 0.076ms
receive message: 0.0ms
Mint.HTTP.stream/2: 0.015ms
ReadReq await responses: 20.738ms
decode response: 0.047ms
events: [
  "event_store.client.streams.ReadResp": {:event,
   {:"event_store.client.streams.ReadResp.ReadEvent", {...}, ...}},
  "event_store.client.streams.ReadResp": {:event,
   {:"event_store.client.streams.ReadResp.ReadEvent", ...}},
  "event_store.client.streams.ReadResp": {:event, {...}},
  "event_store.client.streams.ReadResp": {:event, ...},
  "event_store.client.streams.ReadResp": {...},
  ...
]
```
So it looks like the awaiting responses part is dominated by waiting for the message from the gen_tcp process. That would take a lot more digging to figure out but it does seem like a pretty long wait.
Ah I think I know what the issue is. Could you try setting TCP nodelay and see how that fares?
Spear.Connection.start_link(connection_string: "esdb://localhost:2113?tls=false", mint_opts: [transport_opts: [nodelay: true]])
We're sending pretty small packets for ReadReq and friends and the TCP socket might be buffering those for a while. I see this reduce the time of the second read (in the script above) to ~1ms.
If you can reproduce, I would definitely consider setting this by default (within Spear) since we're sending somewhat small packets.
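For reference, here's what the option does at the socket level, demonstrated on a loopback `gen_tcp` pair (no EventStoreDB needed): `:nodelay` sets TCP_NODELAY, disabling Nagle's algorithm so small writes go out immediately instead of being buffered until the previous segment is ACKed.

```elixir
# Open a loopback listener on an ephemeral port.
{:ok, listener} = :gen_tcp.listen(0, [:binary, active: false, reuseaddr: true])
{:ok, port} = :inet.port(listener)

# Connect to it with Nagle's algorithm disabled.
{:ok, sock} =
  :gen_tcp.connect({127, 0, 0, 1}, port, [:binary, active: false, nodelay: true])

# Confirm the kernel actually applied TCP_NODELAY to the socket.
{:ok, [nodelay: true]} = :inet.getopts(sock, [:nodelay])
IO.puts("TCP_NODELAY is set")

:gen_tcp.close(sock)
:gen_tcp.close(listener)
```

This is the same `nodelay: true` that `transport_opts` passes down through Mint to gen_tcp/ssl.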
@the-mikedavis I think that might be it. Just saw https://brooker.co.za/blog/2024/05/09/nagle.html a couple days ago and thought about `NODELAY` :)
Hi!
We're using Spear in production and are noticing that performance might be less than ideal. We're not sure where exactly the problem resides - Spear, EventStoreDB or the network.
Our flow is as follows:
- read a stream (`SpearClient.stream!`)
- append new events (`SpearClient.append`)

Our `SpearClient` module is literally as simple as this:

All this takes consistently approx. 100-120 ms in production, and almost as much locally. Which seems rather slow.
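A minimal sketch of what a thin `SpearClient` wrapper like the one described might look like (the named connection process and function shapes are assumptions, not the poster's actual module):

```elixir
defmodule SpearClient do
  @moduledoc "Thin wrapper around Spear using a named connection process."

  # Assumed: a Spear.Connection started elsewhere under this name, e.g.
  # {Spear.Connection, name: SpearClient.Conn, connection_string: "..."}
  @conn __MODULE__.Conn

  def stream!(stream_name, opts \\ []) do
    Spear.stream!(@conn, stream_name, opts)
  end

  def append(events, stream_name, opts \\ []) do
    Spear.append(events, @conn, stream_name, opts)
  end
end
```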
When isolated to just the Spear calls (no logic of our own), we see the following on localhost:
- `SpearClient.stream!("stream_with_300_events") |> Enum.into([])` - between 40 and 60 ms
- `SpearClient.stream!("stream_with_20_events") |> Enum.into([])` - between 20 and 40 ms

In production, `append` typically takes a bit less than `stream!` + `Enum.into([])`, but still hovers around 50 ms in the majority of cases. As a side note, relatively simple queries to PostgreSQL (also deployed on Google Cloud) only take around 3-5 ms on average.

Would it be possible to confirm that these timings are higher than expected? Is there anything else we could debug or try to speed things up?
Thanks in advance for any advice!