NFIBrokerage / spear

A sharp EventStoreDB v20+ client backed by Mint :yum:
https://hex.pm/packages/spear
Apache License 2.0
85 stars 14 forks source link

persistent subscription ack failure when writing many events on same connection #87

Closed dvic closed 1 year ago

dvic commented 1 year ago

We've encountered an issue where persistent subscriptions would experience slowdowns because it seems like acknowledgments are not being processed (message would be retried). We've been able to reproduce it in [1]. The fix for us has been to use a separate connection for writing and for connecting to the persistent subscription. Not sure if this is expected behaviour (or an appropriate fix). Is this something that's a known issue?

[1] script demonstrating issue: https://gist.github.com/dvic/a2177efd0d95fc6ff755b2650928cbca (setting separate_clients = true) fixes the issue

the-mikedavis commented 1 year ago

Thanks for the reproduction case! I believe the acks are exhausting the connection-level window. (Ack messages are so small it doesn't seem likely they would ever exhaust the request-level window.)

https://github.com/NFIBrokerage/spear/blob/e923d074f4132d07df9ba59fcf4ff158590bf0af/lib/spear/connection.ex#L211-L229

We're pushing the message directly without checking window sizes. That can fail with %Mint.HTTPError{reason: {:exceeds_window_size, ...}, ...} and when that fails, we silently ignore the failure (see the {:error, conn, reason} branch of that with). Acks are sent as GenServer casts under the hood so the ack is silently discarded. Publishing many events on the same connection should be perfectly fine. In this case though it's helping exhaust the connection's window.

I should have some time in the next few days to fix this and publish a patch release

the-mikedavis commented 1 year ago

I pushed up a change in https://github.com/NFIBrokerage/spear/commit/f34baac2ee3c8a848f0c78b9f4e39502ca8d1e19 that resolves this for me locally. I'll test it out a bit more and if it all looks good I'll make a release soon

dvic commented 1 year ago

I pushed up a change in f34baac that resolves this for me locally. I'll test it out a bit more and if it all looks good I'll make a release soon

Awesome! Thanks for the quick fix 👍!

the-mikedavis commented 1 year ago

Ok f34baac is now released in v1.3.1. Thanks again for the reproduction case and report!