dotnet / orleans

Cloud Native application framework for .NET
https://docs.microsoft.com/dotnet/orleans
MIT License
9.94k stars 2.02k forks

Single Writer Multiple Readers Grain #447

Open nehmebilal opened 9 years ago

nehmebilal commented 9 years ago

We are currently working on adding support for the single writer multiple readers (SWMR) scenario in OrleansTemplates.

The grain interface code looks like the following:

    [SingleWriterMultipleReaders(ReadReplicaCount = 10)]
    public interface IHelloGrain : IGrainWithStringKey
    {
        [ReadOnly]
        Task<string> ReadSomething();

        Task WriteSomething(string something);
    }

Note: The following is out of date. This feature has been implemented and is now part of OrleansTemplates. Take a look at the code generation documentation for more information about the SWMR pattern.

Our code generation (not done yet) will create an IHelloGrainReader and an IHelloGrainWriter. As you can guess, the IHelloGrainReader exposes the read interface and the IHelloGrainWriter exposes the write interface. To benefit from the SWMR scaling, client code must use the IHelloGrainReader and IHelloGrainWriter but not the IHelloGrain directly.

The IHelloGrainReader is a stateless worker that simply forwards read calls to read replicas. It uses a topology (consistent hash) to distribute the loads among read replicas. Because it's a stateless worker, it automatically scales.
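As a rough sketch of that routing (hypothetical names and key scheme; the actual OrleansTemplates generated code differs), the reader could map a session id to a replica like this:

```csharp
// Illustrative sketch only: replica routing inside the stateless-worker reader.
// HelloGrainReader and the "/replica/N" key scheme are assumptions, not the
// actual generated code. A real implementation would use a process-stable hash
// rather than String.GetHashCode().
[StatelessWorker]
public class HelloGrainReader : Grain, IHelloGrainReader
{
    private const int ReadReplicaCount = 10; // matches the attribute in the proposal

    public Task<string> ReadSomething(string sessionId)
    {
        // A stable hash maps the session id onto a fixed replica index,
        // so the same session always reads from the same replica.
        int replica = (sessionId.GetHashCode() & int.MaxValue) % ReadReplicaCount;
        var target = GrainFactory.GetGrain<IHelloGrain>(
            this.GetPrimaryKeyString() + "/replica/" + replica);
        return target.ReadSomething();
    }
}
```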

The IHelloGrainWriter is a normal Grain (not a stateless worker). When it receives a write request, it forwards the call to all read replicas. We chose to execute the same write request on all read replicas rather than executing it once and then replicating the State. Please let us know if you have any concerns with that. One assumption we made here is that write requests are deterministic (i.e. they produce the same results when executed on multiple replicas).

To ensure that a client reads their own writes, and to speed up write requests, we require client code to pass in a session id. The IHelloGrainReader and IHelloGrainWriter look as follows:

    public interface IHelloGrainReader
    {
        Task<string> ReadSomething(string sessionId);
    }

    public interface IHelloGrainWriter
    {
        Task WriteSomething(string sessionId, string something);
    }

When the IHelloGrainWriter receives a write request, it initiates the write request on all read replicas but only awaits the execution of the request on the replica corresponding to the given session Id. The topology guarantees that we always hit the same replica for a given session id.
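A sketch of that write path (illustrative only; the names and the hypothetical replica-key scheme are assumptions, not the actual generated code):

```csharp
// Illustrative sketch of the writer's fan-out. The write is initiated on every
// replica, but only the session's own replica is awaited.
public async Task WriteSomething(string sessionId, string something)
{
    var pending = new List<Task>();
    for (int i = 0; i < ReadReplicaCount; i++)
    {
        var replica = GrainFactory.GetGrain<IHelloGrain>(
            this.GetPrimaryKeyString() + "/replica/" + i);
        // Initiate the write on every replica...
        pending.Add(replica.WriteSomething(something));
    }

    // ...but only await the replica this session reads from, so the caller
    // is unblocked as soon as read-your-writes is guaranteed for it.
    int sessionReplica = (sessionId.GetHashCode() & int.MaxValue) % ReadReplicaCount;
    await pending[sessionReplica];
}
```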

The topology is immutable; it doesn't change (until we implement auto-scaling), and each activation of IHelloGrainReader and IHelloGrainWriter has a copy of it.

We did some initial testing and the speed-up was almost linear for randomly generated session ids. We of course assume that clients will use different session ids.

We would love to hear your feedback so we can adjust the design, API, implementation, etc.

veikkoeeva commented 9 years ago

I was thinking of digging deeper into this, but IRL got in the way. Maybe a dumb question (but to get things going): would implementing this scenario like this benefit, internally in the Orleans codebase, the scenario of multiple StatelessWorker stream readers as in the linked issues?

nehmebilal commented 9 years ago

I would say it's the other way around: being able to subscribe to streams from StatelessWorker would (potentially) make the implementation of Single Writer Multiple Readers easier.

gabikliot commented 9 years ago

I don't see any connection between StatelessWorker and replication. Even if a StatelessWorker were able to subscribe to a stream, I don't see how this could be used to help implement the replication, since there is no way to broadcast a msg to all StatelessWorkers.

And I also don't see how having replication can help implement StatelessWorker consuming from a stream.

I think those two, StatelessWorker and replication, are completely unrelated.

nehmebilal commented 9 years ago

I don't see how this could be used to help implement the replication. This is since there is not way to broadcast a msg to all StatelessWorkers.

Wouldn't it be enough to publish the message to the stream so that all subscribers receive it?

I am not familiar with streams yet, and maybe it's better not to merge the other discussion into this one. @veikkoeeva is trying to trap us :)

Let's keep this discussion about the API proposed here and what is possible today with Orleans.

gabikliot commented 9 years ago

Wouldn't be enough to publish the message to the stream and then all subscribers will receive it?

Stream subscription is per grain, not per activation. Therefore, every msg published to a stream will be received by one activation of every grain that is subscribed, not by multiple activations.

I agree about not mixing up streams, stateless workers and this discussion about what is possible today with Orleans w.r.t. replication.

veikkoeeva commented 9 years ago

Ack. I'll refrain from bringing this up until I make myself a bit more time to pick up the concepts better. :)

nehmebilal commented 9 years ago

We ended up implementing a slightly different version of the pattern (see here) and we published it to OrleansTemplates.

Please feel free to let us know what you think if you get the chance to try it out.

yevhen commented 9 years ago

@nehmebilal hmm, interesting implementation. Have you tried to measure the difference with just having a single grain which allows only read methods to interleave (by using [AlwaysInterleave] attribute)?

nehmebilal commented 9 years ago

@yevhen , I actually haven't even heard about [AlwaysInterleave]! Is it described anywhere in the docs?

yevhen commented 9 years ago

@nehmebilal not sure if it is documented. Which is a pity, of course.

You can see it in action in Orleankka's sample here. Git clone repo, run Nake.bat from cmd line. Open solution, run sample :smile:

[AlwaysInterleave] is a way to make only certain (marked) methods reentrant. In Orleankka there are no methods, only messages, so you allow certain messages to interleave by specifying their type.

[Reentrant(Type)] is an alias I'm using in Orleankka.

nehmebilal commented 9 years ago

@yevhen Thanks for the info! I'll look into it.

gabikliot commented 9 years ago

[AlwaysInterleave] was intended for internal runtime usage. We added it to solve some reentrancy issues in Orleans streaming, that could not have been solved otherwise (I can provide more details if someone is interested).

[AlwaysInterleave] is not really suitable to implement read-only method on a grain that also has write methods. [AlwaysInterleave] will interleave read methods with the write method as well, which is not the semantics write methods expect (they expect an exclusive access to the grain).

A better approach, in my opinion, is to write TPL utility/helper functions that provide any interleaving pattern you want. For example, take a look at AsyncSerialExecutor here: https://github.com/dotnet/orleans/blob/master/src/Orleans/Async/AsyncSerialExecutor.cs It can be used to provide non-reentrant execution for reentrant grains. Similarly, one can "easily" build any other pattern of reentrancy/interleaving from TPL yourself (interleave that, don't interleave this). In that regard, TPL is simply awesome and super powerful. Writing asynchronous, multithreaded abstractions with it is very easy, since it composes so well. In Orleankka's case, I think this makes even more sense, since this implementation will be more portable and is in any case hidden inside the Orleankka layer.
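A sketch of that idea (assuming `AsyncSerialExecutor.AddNext(Func<Task>)` from the file linked above; the grain itself is illustrative): mark the grain [Reentrant] so reads interleave freely, and funnel all writes through the executor so they never overlap each other.

```csharp
// Illustrative: reads interleave (Reentrant), writes are serialized via
// the AsyncSerialExecutor helper linked above.
[Reentrant]
public class PositionGrain : Grain, IPositionGrain
{
    private readonly AsyncSerialExecutor writes = new AsyncSerialExecutor();
    private int x, y;

    public Task SetPosition(int newX, int newY)
    {
        // No two writes ever interleave, even though the grain is Reentrant.
        // Both assignments happen with no await between them, so a reader
        // never observes a torn update.
        return writes.AddNext(() =>
        {
            x = newX;
            y = newY;
            return TaskDone.Done;
        });
    }

    public Task<KeyValuePair<int, int>> GetPosition()
    {
        // Reads are free to interleave with each other and with writes.
        return Task.FromResult(new KeyValuePair<int, int>(x, y));
    }
}
```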

Regardless, seeing the comparison of perf. between this approach (multiple distributed grains) and just a regular single grain marked as Reentrant would be very interesting. You might be surprised how much a single Reentrant grain can handle.

yevhen commented 9 years ago

@gabikliot

[AlwaysInterleave] will interleave read methods with the write method as well, which is not the semantics write methods expect (they expect an exclusive access to the grain)

Nope, that's exactly what is expected from Single Writer (Multiple Readers) pattern. You can read Martin Thompson's (of LMAX fame) post on the subject here. Excerpt:

Now, what if you could design a system whereby any item of data, or resource, is only mutated by a single writer/thread? ... It is OK if multiple threads, or other execution contexts, read the same data.

Now regarding performance. @gabikliot @sergeybykov I would really appreciate if you review my code here, which tests (shows) Single Writer in action, since that what I present to people and I don't want to confuse them with code which doesn't actually work (ye, async is hard). Heck, you can even run it :smile:

If it's ok, I believe it could be used as the basis for benchmark :grimacing:

Regardless, seeing the comparison of perf. between this approach (multiple distributed grains) and just a regular single grain marked as Reentrant would be very interesting. You might be surprised how much a single Reentrant grain can handle.

Spoiler: In 2 seconds, I did 2K reads while serving 2 slow writes, and that was run in debug mode with super-slow .NET binary serialization (so switching to release mode and native serialization might boost perf even more). Actually, I was expecting this from such great technology as Orleans. So no surprises for me here :smirk:

gabikliot commented 9 years ago

Regarding write/read semantics: usually, what people mean by those is that while a write executes it has exclusive access to the data, while multiple readers can execute in parallel on multiple threads. So readers can interleave with other readers, but not with writers. This is the semantics of a reader-writer lock, for example.

In Orleans, I think we want the same. Otherwise, what you can have is the following anomaly: 1) a read starts and reads data X, then does some async call whose logic depends on X; 2) a writer comes in, mutates X and maybe another variable Y; 3) the read resumes, thinks X and Y are unchanged, and does something based on Y; 4) since Y was changed, the logic in the reader is wrong.

This happens since the reader expected to execute against an immutable snapshot/copy of the state and this was violated by the concurrent write. I think the quote you provided does not contradict what I said. In what I described “multiple threads, or other execution contexts, CAN INDEED read the same data”. They just don’t do it concurrently with writes.
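The anomaly can be sketched as follows (hypothetical method and helpers, not code from this thread):

```csharp
// Hypothetical sketch: a read that resumes after an await may observe
// state mutated by an interleaved write.
[AlwaysInterleave]
public async Task<string> ReadReport()
{
    int snapshotX = State.X;              // (1) read X
    await LookupAsync(snapshotX);         // turn boundary: a write can run here (2)
    // (3) the read resumes assuming X and Y are unchanged...
    return snapshotX + "/" + State.Y;     // (4) ...but Y may have been mutated,
                                          // so snapshotX and State.Y are inconsistent
}
```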

As for perf: thanks for confirming! The numbers look in the same ballpark as what we expect. We usually quote an unofficial, but expected, lower bound of 1000 calls per Reentrant grain per second (this is when the rest of the system is not under load; of course, when you have 1000s of grains doing the same, you will be limited by silo throughput). This is a lower bound. We are usually able to do even more, but in the same order of magnitude.

yevhen commented 9 years ago

Sorry, but I don't agree with you.

In the scenario that you're describing, two reads are actually done, not one. In that case the second read cannot assume anything about the state staying unchanged in between.

Also, my understanding of the semantics is that reads see (and treat) state as a whole value, not just parts. So in your scenario, reads are supposed to read the state fully to be able to make correct decisions. Otherwise your definition below:

reader expected to execute against an immutable snapshot/copy of the state

... doesn't make sense, since you copy/replicate state also as a coherent, fully consistent whole.

I don't understand how copying (replicating) just X, without Y can help me, as a reader, to proceed with my task, if it requires both X and Y, and also requires them to be mutually consistent. As a reader, I don't even view them as X and Y - I do view them as {X, Y}.

gabikliot commented 9 years ago

In the scenario that you're describing 2 reads are actually done

No. In the way I described it, there is one read grain method invocation. The local vars are read multiple times.

The logical snapshot (it does not have to be a physically copy) of the grain state includes both X and Y. It includes all grain variables.

If your grain methods do not have async calls in them (like in Akka) then there is no problem, since interleaving of grain turns (partial executions of a grain method) is not possible; there is only one turn (as in Akka, where only one turn is possible). But in regular vanilla Orleans, multiple turns are of course possible.


Anyway, I don't see a point in arguing about either of those. I think the current Orleans semantics, with the current method invocation semantics and state semantics, are very clear, and what I wrote above describes them. If Orleankka has different semantics, that is totally fine. No problem with that. All I wanted to draw attention to is that using [AlwaysInterleave] in Orleans for read-write grains is wrong. It may be OK in Orleankka, and that is totally fine with me.

yevhen commented 9 years ago

We probably have a different understanding of what to consider a read (and what/whom to consider a reader).

I view everything in terms of Command-Query Separation (CQS) by Bertrand Meyer, so I view reads as pure queries. They do not produce any observable side-effects and they are repeatable. The scenario which you describe above as:

3) Read resumes, thinks X is unchanged and so is Y and does something now based on Y, 4) Since Y was changed the logic in the reader is wrong.

means the reader produces some side effect (otherwise it couldn't be wrong :smile:) while it's inside a resource, which from the outside is observed as a mutation. That's not the definition of a pure query.

It's completely ok for the reader to produce side-effects based on resource state, but that happens outside of the resource. So the reader grabs some state from the resource and then does something based on what it got.

Reading state means data shipping, while what you're describing is actually function shipping, which is usually done for the purpose of more efficient mutation of state (data locality).

So no, I don't agree that this is somehow peculiar to Orleankka, Orleans or Akka. It's rather a modeling issue or a matter of having a wrong (different?) perspective :smile:

yevhen commented 9 years ago

The reader is someone on the outside. The beholder :smile:

gabikliot commented 9 years ago

Yes, it's a modeling question. We model what it means to read state differently. I am referring to the general definition of a replicated in-memory stateful service, as here: https://en.wikipedia.org/wiki/State_machine_replication. Or the general notion of database serializability. My reads definitely can produce observable side effects.

And so do the reads in @nehmebilal implementation, at least as far as I understood.

I never said peculiar. I am also not saying which one is better/correct. They are just different.

When Orleans supports Event Sourcing (hopefully soon), there will be a place to talk about different read state models. But as of now, Orleans itself supports only the "State machine replication" model.

nehmebilal commented 9 years ago

[AlwaysInterleave] will interleave read methods with the write method as well, which is not the semantics write methods expect (they expect an exclusive access to the grain)

Ok let's say I have the following in my Grain:

    public Task SetPosition(int x, int y)
    {
        State.x = x;
        State.y = y;
        return TaskDone.Done;
    }

    [AlwaysInterleave]
    public Task<KeyValuePair<int, int>> GetPosition()
    {
        return Task.FromResult(new KeyValuePair<int, int>(State.x, State.y));
    }

In this case the write is not atomic. If write and read calls are interleaved, does that mean it's possible that GetPosition() will return a partially updated position (x has been updated but y not yet)?

sergeybykov commented 9 years ago

This is safe. You are only potentially in trouble if in between State.x = x; and State.y = y; you do await foo.Bar(); and hence implicitly release the thread between the two assignments.
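That unsafe variant would look like this (with the hypothetical `foo.Bar()` from the comment above):

```csharp
// Unsafe sketch: the await releases the turn between the two assignments,
// so an interleaved GetPosition() can see the new x with the old y.
public async Task SetPosition(int x, int y)
{
    State.x = x;
    await foo.Bar();   // turn boundary: reads may interleave here
    State.y = y;
}
```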

nehmebilal commented 9 years ago

Which is a valid use case; a write method can involve one or more awaits. I guess that means that using [AlwaysInterleave] requires some understanding of the interleaving mechanics and adds some constraints on what you can do in a write method.

gabikliot commented 9 years ago

Exactly. As I said, I would refrain from using [AlwaysInterleave] at all in applications. It was added for runtime usage, and using it for write-read semantics will create confusion. Instead, if you want to do the exclusive-writer shared-readers pattern on a single activation (not distributed, like you did in the Templates), I recommend using the Reentrant attribute and TPL-based wrapper functions, as I described above. I can provide more details on how this can be easily done.

nehmebilal commented 9 years ago

Makes sense. The purpose here was distributed readers, but I agree that comparing the two approaches in the non-distributed case would be beneficial. I'll try to find time for it, but if anyone else has the bandwidth to do it, feel free!

yevhen commented 9 years ago

@nehmebilal

Which is a valid use case; a write method can involve one or more awaits. I guess that means that using [AlwaysInterleave] requires some understanding of the interleaving mechanics and adds some constraints on what you can do in a write method.

Yes. In my case, in an event-sourced grain, the state is updated only after an await and never partially. While a mutator method can have awaits (and that's fine), it updates state only after the latest one.

I don't know, but all the scary things, guys, that you're describing here, like the scenario given by @gabikliot and the one given by @sergeybykov, are just bad design (and implementation) to me.

I do understand that you guys optimize for Joe the developer, so he won't shoot himself in the foot, but that comes with a price: the stuff starts being more inconvenient to use, or it comes with a drop in performance :imp:

Why not just document [AlwaysInterleave]? Why is it ok to have [Reentrant] so thoroughly documented (and by the way, that feature is the best way for Joe to shoot himself in the foot, and also knees, arms, chest and head, since shots may interleave in unpredictable ways), but shy away from documenting and talking about [AlwaysInterleave]?

I view [Reentrant] as just a shortcut. It is semantically equal to marking every method with [AlwaysInterleave]. So if all your methods may interleave, you can just put [Reentrant] on the class instead of doing the mundane task of attributing each and every method.
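That claimed equivalence can be sketched like this (illustrative types; attribute placement follows the usage shown earlier in this thread):

```csharp
// Sketch: these two grains would have the same interleaving behavior.
[Reentrant]
public class CounterA : Grain, ICounter
{
    public Task Increment() { /* ... */ return TaskDone.Done; }
    public Task<int> Get() { /* ... */ return Task.FromResult(0); }
}

public class CounterB : Grain, ICounter
{
    [AlwaysInterleave]
    public Task Increment() { /* ... */ return TaskDone.Done; }

    [AlwaysInterleave]
    public Task<int> Get() { /* ... */ return Task.FromResult(0); }
}
```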

Anyway, having a way to control interleaving in a more granular fashion is incredibly useful, and [AlwaysInterleave] is great stuff. I'm using it extensively. I depend on it. When I show it to people (who are also well versed in ES/CQS/DDD) it makes them so excited they almost start crying - that's how much they want and need it! :grinning:

As always, my 2c (which nobody really cares about), and I'm off of this discussion :smile:

nehmebilal commented 9 years ago

@yevhen

As always, my 2c (which nobody really cares about), and I'm off of this discussion :smile:

That's a lie :smiley:

Why not just document [AlwaysInterleave]? Why is it ok to have [Reentrant] so thoroughly documented (and by the way, that feature is the best way for Joe to shoot himself in the foot, and also knees, arms, chest and head, since shots may interleave in unpredictable ways), but shy away from documenting and talking about [AlwaysInterleave]?

I view [Reentrant] as just a shortcut. It is semantically equal to marking every method with [AlwaysInterleave]. So if all your methods may interleave, you can just put [Reentrant] on the class instead of doing the mundane task of attributing each and every method.

That sounds reasonable to me! Maybe we should undocument [Reentrant]? :smile:

I think [Reentrant] is needed as a solution to deadlocks (grains waiting on each other), but [AlwaysInterleave] sounds like a better solution in that case: instead of marking all methods as [AlwaysInterleave] (i.e. making the grain [Reentrant]), why not just mark the one involved in the deadlock [AlwaysInterleave]?

I realize that this is not related to SWMR, @yevhen maybe open a dedicated issue?

Anyway, having a way to control interleaving in a more granular fashion is incredibly useful, and [AlwaysInterleave] is great stuff. I'm using it extensively. I depend on it. When I show it to people (who are also well versed in ES/CQS/DDD) it makes them so excited they almost start crying - that's how much they want and need it! :grinning:

Hmm, as far as I know, interleaving doesn't mean parallel execution; what is making them so excited?

sergeybykov commented 9 years ago

We used to have a [ReadOnly] attribute that one could put on a grain method. It would allow that method to interleave with other ReadOnly methods on the grain. We removed it because it didn't actually work after a major overhaul of the scheduler/dispatcher, but it's of course possible to bring it back.

Is that what we actually want here instead of making [AlwaysInterleave] public? Would that be sufficient for all the scenarios discussed above but safer than [AlwaysInterleave]?

yevhen commented 9 years ago

@sergeybykov no, it looks like you still don't understand the use case. In fact, reads which may interleave with writes are exactly what I (and many other people) need. With [ReadOnly] which may interleave only with other [ReadOnly], my test here will not work. I bet you haven't even tried to read it, even though I've posted links to it 3 times already in this discussion ;)

Simple use case: eBay. I'm selling the iPhone 7. I have 3000 devices in stock. I want to durably store the reservation (which user, and what quantity bought) and also decrease the number of devices in stock after every order. The number of devices still in stock is shown in the UI, so when you load/reload the page you will always see the latest number. It will already be stale at the moment we render it in the UI, but we have a requirement to show the latest value and it's just "best effort".

Now, each write (reservation) can take around 50 ms, so I can take at most 20 reservations per second. Which is more than ok, as usually no more than 3-5 reservations per second are being made. I have plenty of headroom.

When you go to the cloud, your write latency can be 3-10 times higher. This is especially true with Azure. I can write sequentially at most 2-5 batches per second in a single WATS partition. That makes for 200-400 ms of latency. So with these numbers I can now take 2-5 reservations per second, which is still ok but nevertheless freaking slow (it's a cloud, baby :).

Unfortunately, the auction is viewed a lot more often than reservations are made. Let's say it's an order (or 2) of magnitude more. Like 200 parallel views per second.

Now the question: how many views (reads) of the item quantity will I be able to serve using the proposed [ReadOnly] attribute, taking into account that writes are also happening at the rate and with the latency specified above?

yevhen commented 9 years ago

I don't want writes to interleave with other writes. But reads interleaving with writes - is ok.

sergeybykov commented 9 years ago

@yevhen Sorry, I'm trying to catch up with multiple threads of activity after I suddenly got two out of school children to take care of, while also trying to prepare slides for a bunch of academics tomorrow at the Faculty Summit. :-) Thanks for explaining the gist of the issue to me.

So in this scenario would something like [ReadOnly(bool InterleaveWithWrites=false)] be sufficient? By default it would mean interleave only with ReadOnly methods, but you could override the default behavior and also interleave with writes by passing true for InterleaveWithWrites.

nehmebilal commented 9 years ago

@yevhen Interesting problem!

Just to make sure I understand correctly:

Yes. In my case, in an event-sourced grain, the state is updated only after an await and never partially. While a mutator method can have awaits (and that's fine), it updates state only after the latest one.

This means that in your case updating the state is never interleaved with anything else (it's atomic)? And you're not just talking about persisting the state; updating the state in memory is also atomic (from an interleaving perspective)?

yevhen commented 8 years ago

@nehmebilal yes, the in-memory update is atomic. There are no awaits between state updates.

During command execution, the side-effect is only calculated; it is applied in memory only after being successfully committed to storage (after the await returns).

See how the domain code looks here. An event is a computed side-effect. So the write goes in phases:

  1. Compute side-effect.
  2. Store it durably (append to log).
  3. Apply to in-memory state
  4. Return

But writes cannot be interleaved with other writes, only with reads.
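A sketch of that flow in an event-sourced grain (all names here, like `Reserve`, `Reserved`, and `journal`, are illustrative; see the Orleankka sample for the real code):

```csharp
// Illustrative phased write: compute, persist, then apply atomically.
public async Task Handle(Reserve cmd)
{
    // 1. Compute the side-effect; in-memory state is untouched.
    var e = new Reserved(cmd.User, cmd.Quantity);

    // 2. Append it durably; reads may interleave with this await,
    //    but they still see the old, consistent state.
    await journal.Append(e);

    // 3. Apply to in-memory state with no awaits in between,
    //    so the update is never observed partially.
    inStock -= e.Quantity;

    // 4. Return.
}
```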

P.S. I suggest you spend 10 minutes playing with the event-sourcing example to get a better understanding of the paradigm.

yevhen commented 8 years ago

Another interesting property of this model is failure management.

@gabikliot wrote in #61 here:

In that new option we could also make it even more managed, for example by automatically rolling back the state in case of failures.

With event sourcing, you don't need to make a snapshot of in-memory state before attempting any write operation, which means faster perf and less copying. Not so with the default 'in-place update & then flush' model, which is also the default in Orleans (Grain<TState>).

No surprise that #343 is still the most-wanted and most-commented issue.

gabikliot commented 8 years ago

I don't understand why we are discussing this here again. We already agreed that Event Sourcing in Orleans is a great additional option and we are actively (although slowly, due to other higher-priority things) working towards it. It will allow an alternative programming model for state management. It won't be exclusive, but additional to what we have now.

But as of now, in the current code base, there is still no ES. You proposed to use AlwaysInterleave, and all I said is that in the current code base, with the way we currently manage state, it will not work.

nehmebilal commented 8 years ago

@gabikliot: But as of now, in the current code base, there is still no ES. You proposed to use AlwaysInterleave, and all I said is that in the current code base, with the way we currently manage state, it will not work.

True, but isn't it true that [Reentrant] is in the same boat?

yevhen commented 8 years ago

@gabikliot hey, no worries. I just enumerated some of the qualities of ES which are interesting in the context of this discussion. I referenced you just as an example. No offense intended, please :smiley:

nehmebilal commented 8 years ago

@yevhen

P.S. I propose you to spend 10 mins playing with event-sourcing example to get better understanding of the paradigm.

Will do.

gabikliot commented 8 years ago

@nehmebilal, what do you mean Reentrant is in the same boat? Reentrant cannot be used, as is, to do the single writer multiple readers pattern, just like AlwaysInterleave cannot. Reentrant with additional TPL helper wrappers can.

gabikliot commented 8 years ago

@yevhen, there is absolutely no offense. You commented on how one can use AlwaysInterleave in Orleans. I replied that one cannot use it, in the current code, to do single writer multiple readers. I had to clarify that so as not to confuse potential future readers about what works and how. The discussion about the virtues of Event Sourcing is interesting, and I am actually already convinced that it will be a great addition. But it is unrelated to using AlwaysInterleave in the current code base.

nehmebilal commented 8 years ago

@gabikliot

what do you mean Reentrant is in the same boat? Reentrant can not be used, as is, to do single writer multiple readers pattern, just like AlwaysInterleave cannot. Reentrant with additional helper TPL wrappers can.

Yes, but nevertheless, [Reentrant] can be used and is documented. The point is that [Reentrant] makes all methods [AlwaysInterleave], which increases the risk of partial state updates. If I have a method that needs to interleave with other methods (for example, to avoid deadlocks), isn't it safer to use [AlwaysInterleave] on only that method rather than making the whole Grain reentrant?

One thing to mention about both [Reentrant] and [AlwaysInterleave] with respect to SWMR: neither allows read calls to be executed in parallel.

yevhen commented 8 years ago

@nehmebilal

One thing to mention about both [Reentrant] and [AlwaysInterleave] with respect to SWMR: neither allows read calls to be executed in parallel.

Yes! So the model with replication could potentially be more scalable.

gabikliot commented 8 years ago

Yes, of course Reentrant can be used! Not only can it be used, it is used pretty widely in a number of scenarios we are aware of where you need better per-grain perf. I just said that it cannot be used as is for single writer multiple readers. But for general interleaved execution it is super useful, and that is why it is documented well.

Now to the question about the general usability of AlwaysInterleave. Indeed, it allows more general partial interleaving patterns, and not just all-or-nothing like Reentrant. The reason we did not document it is that we thought it would be a bit too confusing: people can grasp "nothing is reentrant" or "everything is reentrant", but "this part is and that part is not" is more difficult to grasp. But we might be wrong about that. Maybe it will not be complicated for people, and maybe we should document and promote it more. I personally do not have a strong opinion in either direction.

sergeybykov commented 8 years ago

@nehmebilal @yevhen So would a method level [ReadOnly(bool InterleaveWithWrites=false)] attribute be sufficient in both of your cases to express the desire to interleave read-only methods either 1) between themselves or 2) with other ('write') methods?

Would we also need a method level [Reentrant] attribute to mark methods that can interleave with any other method?

I'm trying to narrow down the programming model aspects of this.

nehmebilal commented 8 years ago

If we're going that route, why not make it even more customizable? I am thinking of interleaving groups. In the following you have two interleaving groups, "R" and "W", but you can have as many as you want.

    [Interleave("R")]
    Task Method1();

    [Interleave("W")]
    Task Method2();

    [Interleave("R", "W")]
    Task Method3();

    [Interleave("R")]
    Task Method4();

    Task Method5();

That way, if, say, Method3 reads a part of the State that Method2 is not modifying, the two can be interleaved.

nehmebilal commented 8 years ago

Of course this adds some complexity, but if you as a user don't want the complexity, just use one group for all read methods.

sergeybykov commented 8 years ago

This is an interesting way to generalize the read/write method approach. I think we would need an intuitive default. Would it make sense for [Interleave] with no arguments to mean "interleave with any interleaving group" or "interleave with the default (no argument) group"? It seems to me the latter is safer.

Now, to reduce the cognitive load, shouldn't we reuse [Reentrant] instead of introducing [Interleave]? Expanding [Reentrant] to the method level seems more logical to me.

nehmebilal commented 8 years ago

Would it make sense for [Interleave] with no arguments to mean "interleave with any interleaving group" or "interleave with the default (no argument) group"? It seems to me the latter is safer.

Yep latter sounds safer and more explicit.

Now, to reduce the cognitive load, shouldn't we reuse [Reentrant] instead of introducing [Interleave]? Expanding [Reentrant] to the method level seems more logical to me.

:+1:

jdom commented 8 years ago

Haven't read the entire thread, but a quick scan didn't turn up any follow-up on this comment:

One assumption we made here is that write requests are deterministic (i.e. they produce the same results when executed on multiple replicas).

I think that assumption is not always valid... deterministic is not the same as idempotent... the write operation might perform some non-idempotent change in some external service that might not work fine with the other replicas repeating their write operation. Or even something as basic as using the current machine's environment (such as the clock) is neither idempotent nor deterministic...

nehmebilal commented 8 years ago

@jdom You're right. The description was actually out of date (I added a note to the issue). If you read a few comments down from the top you'll find that the algorithm has been modified and is now described here.

Plasma commented 8 years ago

This design (get/set state) seems robust and simple, but there is the caveat that if the state is quite large (e.g. many megabytes) then partial updates to the state may be very slow, as the entire state is replicated each time.