Open antondyad opened 1 year ago
How difficult would it be to test it using the C# client to have a primary baseline of ESDB itself (to some extent)?
Thanks for the suggestion, @yordis! It was an interesting experiment. Using the official C# library we get much faster stream reads on localhost:
both longer and shorter streams are read within approx. 5-10ms - considerably faster than the same streams with Spear 😞
Note: within the same dotnet run the first read is 10x slower (probably JIT and whatnot), but subsequent runs are very fast (5-10ms).
This seems to point to Spear being the bottleneck here. Any advice on how we can proceed?
I'm a bit busy at the moment so I haven't given this a proper look but 20-40ms does seem quite high. I'm also surprised it's cheaper to write than read.
In general I would recommend tuning the `chunk_size` option if you're using `Spear.stream!` - that is probably what is causing stream_with_300_events to take several times as long as stream_with_20_events. If you read all events in one batch it should take virtually the same time for both. If your events are large, though, a big batch can end up hurting performance, so it's something to measure with production-like workloads to find the right value.
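To make the chunk-size arithmetic concrete, here's a sketch of the round-trip count (`round_trips` is an illustrative helper; `chunk_size` is a real `Spear.stream!` option, but the default value of 128 used below is an assumption):

```elixir
# Each Spear.stream! page costs one ReadReq round-trip, so total latency
# grows with ceil(stream_length / chunk_size) unless one chunk covers
# the whole stream.
round_trips = fn stream_length, chunk_size ->
  div(stream_length + chunk_size - 1, chunk_size)
end

IO.inspect(round_trips.(300, 128), label: "300 events, chunk_size 128")  # 3 requests
IO.inspect(round_trips.(20, 128), label: "20 events, chunk_size 128")    # 1 request
IO.inspect(round_trips.(300, 500), label: "300 events, chunk_size 500")  # 1 request

# Against a live server this would look something like:
#   Spear.stream!(conn, "stream_with_300_events", chunk_size: 500) |> Enum.to_list()
```

With a single request per read, both streams should see roughly the same wall-clock time.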
I took a look at a network capture of a similar Spear.stream! |> Enum.into([])
locally and I find this surprising:
https://github.com/NFIBrokerage/spear/files/12797716/spear.tgz (tar + gzipped pcapng)
We send the HTTP/2 HEADERS from the ReadReq but it looks like we're waiting for the TCP ACK from EventStoreDB to send the body of the request. Excluding the time it takes for ACKs the request should be very fast (single-digit milliseconds). So I wonder if this is something that can be fixed by tuning TCP options. I'm not quite sure what is blocking on the ACK: Mint or gen_tcp/ssl, so this needs some further debugging.
We've captured the network traffic too to compare the C# implementation vs Spear. It does seem like the C# one is less chatty. In case it's useful for further investigation, here are our three dumps:
stream_with_300_events
stream!
one).

A few other (useful?) observations from our production setup:
- `stream!` time is typically quite long (hovering around 100ms)
- `append` takes hundreds of ms

Here's a bit more info from our field testing:
We've switched to `Spear.read_stream/3` instead of `Spear.stream!/4` - this has resulted in an approx. 3x reduction of the average time to load one stream (a single `read_stream` call, with a reasonable `:max_count`, is necessary).

@the-mikedavis, do you reckon it is possible to have similar performance characteristics between `stream!` and `read_stream`?
`stream!` and `read_stream` use essentially the same code path. The difference is that `stream!` uses `Stream.unfold/2` to repeatedly read from the server so that you can eventually read even very long streams. I suspect that you see lower total latency because `stream!` is reading multiple times, and if individual requests have high latency then reading multiple times will have higher total latency.

There is room for a small optimization in `stream!` to stop requesting from the server when it reads a chunk smaller than the requested chunk size. That would eliminate one extra request most of the time. I don't see that as being a true solution here, though; it's just working around the high latency of individual reads.
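The paging behavior described here can be sketched with `Stream.unfold/2` over a stubbed reader (this mimics the shape of `stream!`'s loop, not Spear's actual internals; `fake_read` stands in for one ReadReq round-trip):

```elixir
chunk_size = 128
stream_length = 300

# Stub for one ReadReq: returns up to chunk_size "events" from position pos.
fake_read = fn pos ->
  Enum.to_list(pos..min(pos + chunk_size - 1, stream_length - 1))
end

chunks =
  Stream.unfold({0, :continue}, fn
    {_pos, :halt} ->
      nil

    {pos, :continue} ->
      chunk = fake_read.(pos)
      # the optimization mentioned above: a short chunk means we've hit
      # the end of the stream, so skip the extra trailing read
      next = if length(chunk) < chunk_size, do: :halt, else: :continue
      {chunk, {pos + length(chunk), next}}
  end)
  |> Enum.to_list()

events = List.flatten(chunks)
requests = length(chunks)

IO.inspect(length(events))  # 300 events read
IO.inspect(requests)        # 3 round-trips instead of 4
```

Without the short-chunk check, the loop would issue a fourth, empty read past the end of the stream.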
@the-mikedavis Assuming that both C# and Elixir clients are using the same endpoint (https://developers.eventstore.com/clients/grpc/reading-events.html#reading-from-a-stream) I would expect them to have the same latency.
Do you think the C# client reads in a bigger batch by default?
I believe the C# client reads with effectively no limit while it's set rather low in Spear. You can try tuning that option, but I don't think it should have much effect since individual `read_stream` calls are slow. I suspect this might be solved by tuning some options in gen_tcp/ssl or changing the `Mint.HTTP2.stream`
call here to include the full request when possible. I should have some time later this week to test these ideas out.
Sorry for the long delay! Things have been hectic lately.
I took a look at those packet captures and looking with a filter of http2.header
I'm seeing similar numbers between Elixir and C# between the request and response:
Client | Req (sec) | Res (sec) | Time waiting (msec)
--- | --- | --- | ---
C# | 0.015503 | 0.086402 | 70.899
Elixir | 1.059172 | 1.093208 | 34.036
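For reference, the "Time waiting" column is just the gap between the two capture timestamps converted to milliseconds:

```elixir
# Recompute the waiting time from the request/response capture times above.
waiting_ms = fn req_s, res_s -> Float.round((res_s - req_s) * 1000, 3) end

IO.inspect(waiting_ms.(0.015503, 0.086402))  # C# row
IO.inspect(waiting_ms.(1.059172, 1.093208))  # Elixir row
```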
In the C# case the read took twice as long on the server but I see that occasionally happening locally too - it looks like the server has some sort of caching to speed up repeated reads so some take half as much time.
It's still looking to me like this latency is coming from the server and isn't really dependent on the client. Could you post a small reproduction repository or gist of the C# code you used in your test? I don't do much dotnet development so that would make it much easier for me to compare
Sorry for the late reply, @the-mikedavis. Here's the dotnet code I've used for my timings: dotnet_esdb_timing.zip
It basically does something as simple as this:
```csharp
var stream_name = "Document:134";
var settings = EventStoreClientSettings.Create("esdb://localhost:2113?tls=false");
var client = new EventStoreClient(settings);

var timer = new Stopwatch();
timer.Start();
var result = client.ReadStreamAsync(
    Direction.Forwards,
    stream_name,
    StreamPosition.Start
);
await foreach (var @event in result) {
    Console.WriteLine(@event.Event.EventType);
}
timer.Stop();
Console.WriteLine($"Elapsed: {timer.ElapsedMilliseconds} ms");
```
In the attached project this is actually done twice - the first run seems to be always quite slow (cold-start?).
I can reproduce the timing differences. With a similar Elixir script:
```elixir
Mix.install [{:spear, path: "."}, {:jason, "~> 1.0"}]

{:ok, conn} = Spear.Connection.start_link(connection_string: "esdb://localhost:2113?tls=false")
stream_name = "Document:134"

start1 = :erlang.monotonic_time()
conn
|> Spear.stream!(stream_name, from: :start, direction: :forwards)
|> Enum.each(fn event ->
  IO.puts event.type
end)
end1 = :erlang.monotonic_time()

start2 = :erlang.monotonic_time()
conn
|> Spear.stream!(stream_name, from: :start, direction: :forwards)
|> Enum.each(fn event ->
  IO.puts event.type
end)
end2 = :erlang.monotonic_time()

IO.inspect(:erlang.convert_time_unit(end1 - start1, :native, :microsecond) / 1000, label: "read no. 1")
IO.inspect(:erlang.convert_time_unit(end2 - start2, :native, :microsecond) / 1000, label: "read no. 2")
```
I see ~90ms for the first read with dotnet and ~2-3ms for the subsequent read. With Spear I see ~130ms for the first read and ~80ms for the second.
I'll push up the change I mentioned earlier:
There is room for a small optimization in stream! to stop requesting from the server when it reads a chunk smaller than the requested chunk size. That would eliminate one extra request most of the time.
which improves the numbers a little: 80ms for the first read and ~25ms for the second.
I took some new packet captures comparing these and the HTTP/2 requests themselves seem very fast - for both Spear and dotnet I see the network request taking not more than 2ms for the second read. So the slow-down must be somewhere in either Mint, gpb or Spear.
I was curious about the chatty WINDOW_UPDATE frames so I took a look into Mint. Mint sends a WINDOW_UPDATE both for the request stream (i.e. the ReadReq) and for the whole connection. That's fine and fits the spec, but it sends the WINDOW_UPDATE frames eagerly any time you handle a DATA frame in `Mint.HTTP2.stream/2`. It can also be wasteful because you might receive a big message with a bunch of DATA frames for the same stream and then a DATA frame or the trailing HEADERS frame with the End Stream flag set. For that case I don't think there's a need to send the WINDOW_UPDATE for the stream, just the connection.
I changed Mint to collect all of the window updates in a single call of Mint.HTTP2.stream/2
(or recv) and send single frames per-stream and one for the connection level: https://github.com/the-mikedavis/mint/commit/c77a13ae07a64680e2a5ac8027c05fced41ac87f. This didn't have any noticeable effect though, nor did commenting out the code to send WINDOW_UPDATE frames entirely. (I thought this might work since the dotnet packet capture contains no WINDOW_UPDATEs.)
Feel free to give #93 a try. This is not a very high priority problem for me so I will probably not look at this again for a little while. If you'd like to dig deeper I can recommend checking out packet captures with Wireshark and creating flamegraphs. Nothing stuck out to me as odd but you can look for yourself. Set up a script that reads from a reasonably long stream, have it print out its PID with `System.pid/0` and wait on `IO.gets/1`, set `export ERL_AFLAGS="+JPperf true"`, run the script, and measure with `perf record --pid '<pid of the script>' --freq 999 -g -- sleep 20`. Then you can view a flamegraph in something like https://github.com/KDAB/hotspot which can read `perf.data` files.
It would also be interesting to try this with another HTTP/2/gRPC client (for example gun, or maybe https://github.com/elixir-grpc/grpc); the protobuf definitions are here: https://github.com/EventStore/EventStore/blob/0af9906b39c9cd9fc6e301d5c4690b60b2fddcb1/src/Protos/Grpc/streams.proto. Or to eliminate protobuf (de)serialization altogether and just try to get the network request as fast as possible, to narrow down what exactly is slow.
Despite what I said above about putting this down for a while, I gave this another look today 😄
Here's a script of what I was thinking about above:
perf_test.exs
So we can time Mint and gpb independently. The results are a little surprising. This is against a stream with 20 events and a local eventstore container running without TLS:
```
$ elixir perf_test.exs
open connection: 12.558ms
ping/pong: 3.358ms
encode message: 18.89ms
ReadReq request: 12.586ms
ReadReq stream body: 0.013ms
ReadReq await responses: 11.335ms
decode response: 0.048ms
events: [
  "event_store.client.streams.ReadResp": {:event,
   {:"event_store.client.streams.ReadResp.ReadEvent", {...}, ...}},
  "event_store.client.streams.ReadResp": {:event,
   {:"event_store.client.streams.ReadResp.ReadEvent", ...}},
  "event_store.client.streams.ReadResp": {:event, {...}},
  "event_store.client.streams.ReadResp": {:event, ...},
  "event_store.client.streams.ReadResp": {...},
  ...
]
```
`encode message` (gpb's `encode_msg/2`) takes quite a bit longer than I thought it would. The `ReadReq request` (`Mint.HTTP2.request/5`) and `ReadReq await responses` (`Mint.HTTP.stream/2`) are also fairly slow and could be looked into separately.
I wouldn't be opposed to dropping the dependency on gpb (which we use via `event_store_db_gpb_protobufs`) in favor of something in-tree if it can boost performance reasonably well. We only handle a few slow-changing protobuf definitions from upstream, so we don't necessarily need a general-purpose protobuf encoder/decoder. I imagine this would take a fair amount of work to change, though.
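To gauge the scope of an in-tree encoder, here's a minimal sketch of two protobuf primitives such messages rely on (base-128 varints and length-delimited fields); this is illustrative only, not a proposal for Spear's actual implementation:

```elixir
defmodule MiniProto do
  import Bitwise

  # Base-128 varint: 7 bits per byte, MSB set on every byte but the last.
  def varint(n) when n >= 0 and n < 0x80, do: <<n>>
  def varint(n), do: <<bor(band(n, 0x7F), 0x80), varint(bsr(n, 7))::binary>>

  # Wire type 2 (length-delimited): field key, payload length, then the bytes.
  def len_field(field_num, bytes) when is_binary(bytes) do
    key = bor(bsl(field_num, 3), 2)
    varint(key) <> varint(byte_size(bytes)) <> bytes
  end
end

IO.inspect(MiniProto.varint(300), base: :hex)       # two bytes: 0xAC then 0x02
IO.inspect(MiniProto.len_field(1, "Document:134"))  # key 0x0A, length 12, name bytes
```

A full replacement would still need the other wire types and the generated message structs, which is where most of the work would be.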
@the-mikedavis, thank you for looking into that.
I'm quite surprised by the `ReadReq request` and `ReadReq await` performance. Surely that isn't because Mint is inherently slow, is it?
I took a look into the `Mint.HTTP2.request/5` part and there are some easy wins to speed that up: https://github.com/elixir-mint/mint/pull/423

I also looked at splitting the timing up between `receive/1` and `Mint.HTTP.stream/2`:
perf_test.exs
Results:
```
open connection: 7.469ms
receive message: 0.0ms
Mint.HTTP.stream/2: 2.182ms
receive message: 0.0ms
Mint.HTTP.stream/2: 0.005ms
ping/pong: 2.277ms
encode message: 17.555ms
ReadReq request: 2.433ms
ReadReq stream body: 0.016ms
receive message: 20.523ms
Mint.HTTP.stream/2: 0.07ms
receive message: 0.0ms
Mint.HTTP.stream/2: 0.076ms
receive message: 0.0ms
Mint.HTTP.stream/2: 0.015ms
ReadReq await responses: 20.738ms
decode response: 0.047ms
events: [
  "event_store.client.streams.ReadResp": {:event,
   {:"event_store.client.streams.ReadResp.ReadEvent", {...}, ...}},
  "event_store.client.streams.ReadResp": {:event,
   {:"event_store.client.streams.ReadResp.ReadEvent", ...}},
  "event_store.client.streams.ReadResp": {:event, {...}},
  "event_store.client.streams.ReadResp": {:event, ...},
  "event_store.client.streams.ReadResp": {...},
  ...
]
```
So it looks like the awaiting responses part is dominated by waiting for the message from the gen_tcp process. That would take a lot more digging to figure out but it does seem like a pretty long wait.
Ah I think I know what the issue is. Could you try setting TCP nodelay and see how that fares?
Spear.Connection.start_link(connection_string: "esdb://localhost:2113?tls=false", mint_opts: [transport_opts: [nodelay: true]])
We're sending pretty small packets for ReadReq and friends and the TCP socket might be buffering those for a while. I see this reduce the time of the second read (in the script above) to ~1ms.
If you can reproduce, I would definitely consider setting this by default (within Spear) since we're sending somewhat small packets.
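For reference, here's what the option does at the socket level, demonstrated on a loopback `gen_tcp` pair (no EventStoreDB needed): `:nodelay` sets TCP_NODELAY, disabling Nagle's algorithm so small writes go out immediately instead of being buffered until the previous segment is ACKed.

```elixir
# Open a loopback listener on an ephemeral port.
{:ok, listener} = :gen_tcp.listen(0, [:binary, active: false, reuseaddr: true])
{:ok, port} = :inet.port(listener)

# Connect to it with Nagle's algorithm disabled.
{:ok, sock} =
  :gen_tcp.connect({127, 0, 0, 1}, port, [:binary, active: false, nodelay: true])

# Confirm the kernel actually applied TCP_NODELAY to the socket.
{:ok, [nodelay: true]} = :inet.getopts(sock, [:nodelay])
IO.puts("TCP_NODELAY is set")

:gen_tcp.close(sock)
:gen_tcp.close(listener)
```

This is the same `nodelay: true` that `transport_opts` passes down through Mint to gen_tcp/ssl.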
@the-mikedavis I think that might be it. Just saw https://brooker.co.za/blog/2024/05/09/nagle.html a couple days ago and thought about `NODELAY` :)
Hi!
We're using Spear in production and are noticing that performance might be less than ideal. We're not sure where exactly the problem resides - Spear, EventStoreDB or the network.
Our flow is as follows:
- read a stream (`SpearClient.stream!`)
- append new events (`SpearClient.append`)

Our `SpearClient` module is literally as simple as this:

All this takes consistently approx. 100-120 ms in production, and almost as much locally. Which seems rather slow.
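A minimal sketch of what a thin `SpearClient` wrapper like the one described might look like (the named connection process and function shapes are assumptions, not the poster's actual module):

```elixir
defmodule SpearClient do
  @moduledoc "Thin wrapper around Spear using a named connection process."

  # Assumed: a Spear.Connection started elsewhere under this name, e.g.
  # {Spear.Connection, name: SpearClient.Conn, connection_string: "..."}
  @conn __MODULE__.Conn

  def stream!(stream_name, opts \\ []) do
    Spear.stream!(@conn, stream_name, opts)
  end

  def append(events, stream_name, opts \\ []) do
    Spear.append(events, @conn, stream_name, opts)
  end
end
```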
When isolated to just the Spear calls (no logic of our own), we see the following on localhost:
- `SpearClient.stream!("stream_with_300_events") |> Enum.into([])` - between 40 and 60 ms
- `SpearClient.stream!("stream_with_20_events") |> Enum.into([])` - between 20 and 40 ms

In production, `append` typically takes a bit less than `stream!` + `Enum.into([])`, but still hovers around 50 ms in the majority of cases. As a side note, relatively simple queries to PostgreSQL (also deployed on Google Cloud) only take around 3-5 ms on average.

Would it be possible to confirm that these timings are higher than expected? Is there anything else we could debug or try to speed things up?
Thanks in advance for any advice!