EventStore / EventStore-Client-Dotnet

Dotnet Client SDK for the Event Store gRPC Client API written in C#

Feature idea: AppendToStreamOrResync API instead of WrongExpectedVersionException #312

Open bartelink opened 6 years ago

bartelink commented 6 years ago

See EventStore/EventStore#1626 for some context as to the sort of conflict resolution scenario I'm attempting to implement

TL;DR I have a conflict resolution loop in reaction to WrongExpectedVersionException, but I need to back off and/or read from the master node to resolve the conflict, which I feel could be improved on

The ideal API would, in the event of a conflict, return the conflicting events without throwing, e.g. something like:

class WriteOrConflictResult : WriteResult {
    /// `null` if no conflict
    public EventData[] ConflictingEvents { get; }
}
interface IEventStoreConnection {
   ...
   Task<WriteOrConflictResult> AppendToStreamOrResyncAsync(string stream, long expectedVersion, IEnumerable<EventData> events);
}

The normal timeouts etc. would be enforced as usual, with the associated exceptions being thrown as they would for any other cases - the only real difference is that one would replace a

catch (WrongExpectedVersionException)
    // read forwards from expectedVersion, making sure it's from the master node and/or backing off long enough for the slave to have caught up
    // loop to retry

with

if (result.ConflictingEvents != null) 
     // use the conflicting events together with what you already knew to decide whether/how to resolve the conflict and/or retry and/or back off
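To make that concrete, here is a hedged sketch of the resolution flow with the proposed API. `AppendToStreamOrResyncAsync`, `WriteOrConflictResult` and the `Resolve` helper are all hypothetical names from this proposal; none of them exist in the current client:

```csharp
// Hypothetical usage sketch of the proposed API - not an existing client method.
var expectedVersion = lastKnownVersion; // caller's view of the stream
while (true)
{
    var result = await conn.AppendToStreamOrResyncAsync(stream, expectedVersion, events);
    if (result.ConflictingEvents == null)
        break; // write succeeded; result also carries the usual WriteResult info

    // Use the conflicting events together with what you already knew to decide
    // whether/how to resolve the conflict, and/or retry, and/or back off.
    events = Resolve(result.ConflictingEvents, events); // hypothetical resolution step
    expectedVersion += result.ConflictingEvents.Length;
}
```

The point being that the conflicting events arrive in-band with the failed append, so the loop never needs a separate (potentially stale) read.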

DB-364

shaan1337 commented 6 years ago

This has been discussed with @bartelink and internally with the team. Any suggestions/PRs are welcome.

gregoryyoung commented 6 years ago

Is the goal here to remove the secondary read operation over the network, to reduce latency etc? To be fair, providing this behaviour with the current API (taking the network hit) is pretty trivial (about 10 lines of code). The possible issue I see with this is how to define the return value in terms of protobufs (hint: I don't think it's possible as a single operation).

Would just a new method that handled the making of multiple calls be good enough here?

bool AppendToStreamOrResync(EventData[] toWrite, out EventData[] missing) or something similar?
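For reference, the "about 10 lines" wrapper over the current API might look roughly like this. This is a sketch against the v5 TCP client surface from memory, and it illustrates the crux of the issue: on a DoNotRequireMaster connection the re-read may hit a stale slave:

```csharp
// Sketch: emulate AppendToStreamOrResync with a catch + re-read.
// Caveat: on a DoNotRequireMaster connection the re-read can be served
// by a slave that has not yet caught up, so the result may be incomplete.
async Task<ResolvedEvent[]> AppendToStreamOrResyncAsync(
    IEventStoreConnection conn, string stream, long expectedVersion, EventData[] events)
{
    try
    {
        await conn.AppendToStreamAsync(stream, expectedVersion, events);
        return null; // no conflict
    }
    catch (WrongExpectedVersionException)
    {
        var slice = await conn.ReadStreamEventsForwardAsync(
            stream, expectedVersion + 1, 4096, resolveLinkTos: false);
        return slice.Events; // the events the caller was missing
    }
}
```

As Greg notes, this is trivial to write; what it cannot do from the client side is guarantee the re-read reflects the master's state in a single round trip.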

bartelink commented 6 years ago

Aside from needing the general WriteResult info, such an API would be fine (assuming a long expectedVersion param too). I guess one would call it TryAppendToStreamAsync

The central issue is that I'm on a DoNotRequireMaster connection, so doing a read myself using the same connection is not enough (if there was a "this time I absolutely need it to be from master" flag on a read API, that'd be fine too, but it seems wasteful to do a two-hop operation immediately after what has frequently already been a two-hop op already).

I'm not married to representing the ConflictingEvents as a nullable value; essentially it represents "whatever an equivalent Read call would yield" - the client side could map an empty array to returning true and null.

gregoryyoung commented 6 years ago

Why are you writing on a DoNotRequireMaster connection? There are very few reasons for wanting to do this and I am curious which one you are running into.

shaan1337 commented 6 years ago

@gregoryyoung The reason is a use case with a high read-to-write ratio using the same connection, if I correctly understood @bartelink from previous conversations.

gregoryyoung commented 6 years ago

@shaan1337 I get that, but you can also just open two connections. It's not like we will complain about having gasp 10 connections open. We have clients that run > 1000 connections to a node :)

bartelink commented 6 years ago

Yes - Shaan is correct; as mentioned in EventStore/EventStore#1626, this usage is very (too) read heavy, which is why it happens to be configured like this

My interim solution will indeed be to maintain two connections - for general client polling, a not-require-master connection does the job fine, and the master connection can then be used in the cases of a) writes b) resyncs like this.

I happen to have an abstraction which can do this under the covers neatly; I appreciate that such a scheme would not be good general advice.

shaan1337 commented 6 years ago

@gregoryyoung I agree (that's the approach @bartelink also suggested as a short term solution). Another reason for the proposed API call is developer convenience: getting the conflicting events in a single call, without having to worry about handling the exception and re-reading (it could be more than one operation in the background)

gregoryyoung commented 6 years ago

@shaan1337 two connections is the solution....

let me explain a bit.

If you use a connect-to-any connection and you are writing to it, the node that receives the write will forward it to the master anyways. This is done internally, so even if the client cannot route to the master, the node it talks to obviously can (it has replication channels etc and will forward to the master). More on this below.

I am going to use my shitty ascii art drawing skills to try to explain...

        master
       /      \
  slave 1    slave 2
                ^
                |
             client

If you write to slave 2 in this scenario you obviously take the network hit client->slave 2. However slave 2 will then route your request to the master, who will then write your request and forward it to the slaves, waiting for n/2 acks. Once a quorum has been reached (here at least 1 slave acks, which makes 2/3) it will then respond that the write was successful, which will then return to slave 2, who will then return the result to you. This is why prefer-master exists, btw - it removes 2 network hops.
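The ack-counting described above reduces to simple majority arithmetic. A minimal illustration (this is just the math, not EventStore's actual replication code):

```csharp
// Illustration only: majority quorum for a cluster of n nodes.
// The master's own write counts as one "ack", so a 3-node cluster
// needs the master plus 1 slave ack (2 of 3) before confirming.
static int RequiredAcks(int clusterSize) =>
    clusterSize / 2 + 1;

// RequiredAcks(3) == 2, RequiredAcks(5) == 3
```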

"I happen to have an abstraction which can do this under the covers neatly; I appreciate that such a scheme would not be good general advice."

Actually it's recommended! Why not use the slaves for fast reads where a possibly missing event is not an issue (e.g. it gets handled with expectedVersion etc) and distribute the load? :+1:

"The central issue is that I'm on a DoNotRequireMaster connection, so doing a read myself using the same connection is not enough (if there was a "this time I absolutely need it to be from master" flag on a read API, that'd be fine too, but it seems wasteful to do a two-hop operation immediately after what has frequently already been a two-hop op already)."

You can see why it's called a connection... Why not compose a master + a slave connection? You definitely want to write to the master anyways (see forwarding ^^^)... For reads, prefer a slave, but if it fails try the master (this should be rare but worth taking metrics on). Which node is master is exposed via the API; at any given point the only node ASSURED to have all data is the master in a cluster of 3 (at least one other node has it, but you don't know which one).
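The composition Greg describes could be sketched as a thin wrapper holding both connections. `ComposedConnection` is a hypothetical helper, not part of the client; the two underlying connections are assumed to have been built with the PerformOnMasterOnly and PerformOnAnyNode node preferences respectively:

```csharp
// Hypothetical composition: reads prefer any node, writes and resyncs use the master.
class ComposedConnection
{
    readonly IEventStoreConnection _master; // built with PerformOnMasterOnly settings
    readonly IEventStoreConnection _any;    // built with PerformOnAnyNode for cheap reads

    public ComposedConnection(IEventStoreConnection master, IEventStoreConnection any)
    {
        _master = master;
        _any = any;
    }

    // General polling: kindest to the cluster, may be slightly stale.
    public Task<StreamEventsSlice> ReadAsync(string stream, long start, int count) =>
        _any.ReadStreamEventsForwardAsync(stream, start, count, resolveLinkTos: false);

    // Writes always go via the master connection, avoiding the forwarding hops.
    public Task<WriteResult> AppendAsync(string stream, long expectedVersion, EventData[] events) =>
        _master.AppendToStreamAsync(stream, expectedVersion, events);

    // On conflict, re-read from the master, the one node assured to be current.
    public Task<StreamEventsSlice> ResyncAsync(string stream, long fromVersion) =>
        _master.ReadStreamEventsForwardAsync(stream, fromVersion + 1, 4096, resolveLinkTos: false);
}
```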

Does this make sense?

gregoryyoung commented 6 years ago

meh ascii art is hard to get right. it looked better here, I promise!

bartelink commented 6 years ago

OK, this all makes sense, thanks for taking the time to lay it out. But... (don't hit me!) I'm still interested in saving the hop as an optimization, for when I'm managing a pair of connections and the conflict is down to competing writers (e.g. duelling writers and/or retries under load, assuming they are well behaved and will actually work in an idempotent fashion).

Additionally, it enables a clean single-connection mode for something that does only occasional writes and/or is not latency sensitive, or for various (admittedly rare) topologies where one would be less inclined to PerformOnMasterOnly for whatever reason ... and I don't need to do a catch!

Finally, IMO having such an API as a first class thing also helps for pedagogical purposes, i.e. being able to route around desires to ram stuff in with ExpectedVersion.Any etc.

gregoryyoung commented 6 years ago

"or various (admittedly rare) topologies where one would be less inclined to PerformOnMasterOnly"

As mentioned, it's going to happen on the master anyways or else it won't happen! The real question is who decides who the master is (client or server): when you write to a slave, the slave will forward. The client also has this information via gossip, but it might be a few seconds old. Where this really matters is when we have geographic distribution (say Africa/US/Australia).

"Finally, IMO having such an API as a first class thing also helps for pedagogical purposes, i.e. being able to route around desires to ram stuff in with ExpectedVersion.Any etc."

That's why you get a WrongExpectedVersionException. I cannot tell you until the time I look to put the events on disk whether or not there is a conflict. In most circumstances we have fine-grained streams (for example, a stream per order on a web retailer) and it is not an issue. For streams that write fast, generally you should try to designate a single writer or give up on us handling consistency/de-dupe for you (ExpectedVersion.Any). On a side note, dedupe also works with a sliding in-memory cache, so if you are within that window it will still dedupe

"Finally, IMO having such an API as a first class thing also helps for pedagogical purposes, i.e. being able to route around desires to ram stuff in with ExpectedVersion.Any etc."

There is ... Read -> Write, assuming you have multiple servers. If only a single server, just cache and increment. This is also the most common use case ... read events -> hydrate object -> call behaviour -> write events. You would be amazed how fast a server can be when its data is all cached in memory, and as a bonus, if you have 2 versions they can each cache their own understanding....

I know this probably does not answer all of your questions. Can you clarify a bit more?

gregoryyoung commented 6 years ago

As far as I know this is exactly what prefer-master does (and I was one of the two people who built it, but I may be misunderstanding you): if it finds a master change, it will shift the connection to the new master. What issues are you seeing using the option? If it's not following the master properly, this is something we should be looking at. Generally I prefer to write to the cluster as a whole (yes, additional network hit, but shitsgiven--); the prefer-master option will connect to the master if possible, and it follows gossip, so if the master moves it will move your connection.

I might be a bit dense but I am not sure what further you actually want beyond this?

shaan1337 commented 6 years ago

@gregoryyoung thanks, it makes sense to maintain 2 connections. Might it then be worthwhile to implement an inbuilt abstraction in the client API to maintain the 2 connections - one for reading from any slave and one for writing to the master (and occasionally forcing reads to be done from the master if required in some cases)? If it's a common pattern, it could be helpful.

Regarding saving one additional hop, I think @bartelink is referring to returning the conflicting event data in the Expected Version exception itself instead of having to do a read again. It seems that the easiest way to do it without introducing any breaking change would be to add a new message to the protocol.

bartelink commented 6 years ago

Yes, as Shaan says - the point is that in the case where you're not connected to the master, there is no easy way to get the current state without random backoff and/or having to open a second connection. IOW, instead of "well, I was just over there and there are 2 new ones; here's an Exception", it can say "I was just over there - here's what you need to catch up". In this single-connection case, and/or if you're not tracking the master, the only option you have otherwise is random backoff until the slave you're talking to catches up and/or a master switch is tracked via gossip. Having a clean solution can make a major difference to the P99 latencies of an overall read-write (read-to-resync, rewrite) x N loop under load.

I agree that twin connections is a good solution for many cases, but it would be nice to have an alternative that does not involve that much busywork for cases where the extra throughput/efficiency is not worth that effort.

gregoryyoung commented 6 years ago

@shaan1337 as far as I am aware, this is currently how it works.

@bartelink if you maintain a connection to a quorum of servers you should always have up to date information. Can you explain more the exact issues you run into with this? With any such system you will need more than one connection if you want to be reliable as well. If you prefer to give up reliability a subscription to a single node is feasible.

bartelink commented 6 years ago

TL;DR of the 'issues I run into': this is tweaking something that can already be made to work, as it more directly models things I need to cater for with lots of concurrent requests when aiming for low latency on reads, writes and resyncs. Even though most of my answer is to move to twin connections, adding the proposed API would provide two new operating modes that I think are interesting:

1. single connection with PerformOnAnyNode - minimal code, enabling one to trade on-average-slower writes for cheap reads, without having to write (or take the perf hit of) a retry loop with backoffs in the case of a WrongExpectedVersionException
2. optimal twin connections - optimal reads, writes and resyncs, no need to write a backoff loop for resync, and no latency/roundtrip hit either

TL;DR of the impl: it's the same logic as AppendToStreamAsync, but when it hits a conflict, it immediately does a read of the events since the expectedVersion and includes those in the payload that would ordinarily be the contents of the WrongExpectedVersionException, affording the client everything it needs to retry without extra roundtrips, without maintaining connections one otherwise might not, without having to consider which connection to use, and without backoffs.

When in doubt, draw a table ;)

| Connections | Reads | Writes | Conflict likelihood | Resync | Notes |
|---|---|---|---|---|---|
| PerformOnAnyNode (opening scenario / status quo) | Can be stale, but kindest to cluster | Will likely be multi-hop | Higher | Exponential backoff loop | Scales well if you can afford slower writes and slow resyncs |
| PerformOnAnyNode with proposed AppendOrResync (smallest increment that solves problem) | Can be stale, but kindest to cluster | Will likely be multi-hop | Higher | You already have it | No extra connection maintenance, no exponential backoffs, min connections, cheap resyncs in event of conflict |
| PerformOnMasterOnly (deemed too expensive) | Always read your writes(?) | Always one hop | Low | Easy - just read with same connection | Predictable if you can afford it; the default, and a good one |
| One of each (doing this, will work fine) | Can choose between being nice or fully in sync | Always use master, no forwarding | Low | Extra (single) roundtrip to read using master connection | More code, more connections, nearly optimal |
| One of each with proposed AppendOrResync (still asking for this, ideal solution) | Can choose between being nice or fully in sync | Always use master, no forwarding | Low | You already have it | More code, more connections, but optimal in all respects |

@gregoryyoung Aside: can you confirm that a connection with PerformOnMasterOnly should always read its writes except in cases where there is a Master transition during the period?

gregoryyoung commented 6 years ago

@bartelink connected to master you should always be able to read your own writes

oskardudycz commented 2 years ago

I'll try to have a look at how Go and Rust clients are doing the handling. I think that's a more generic issue that we should solve across all gRPC clients. p.s. I doubt that we'll support that in TCP, but it might be a good "up for grabs" when we have a general conclusion.

bartelink commented 2 years ago

I agree with it being gRPC-only - would love to see some solution for this, as it removes a lot of messy workaround coding (having to maintain or open a separate connection if you happen to be transacting against a Follower node that's behind, in a scenario where you need to provide a predictable/guaranteed request latency etc) that's necessary if you're not in a position to conclude "🙈 I'll simply use Leader connections for everything".

oskardudycz commented 2 years ago

I agree that this is a valid use case, we need to discuss that internally and see if/how we can support that.