Off-by-one error in consumer starting position?

eventide-project / consumer

Event Sourcing and Microservices Stack for Ruby

http://eventide-project.org

Other

1 stars 3 forks source link

Off-by-one error in consumer starting position? #10

Closed mimperatore closed 4 years ago

mimperatore commented 4 years ago

Hello! I've been exploring eventide, and have really been liking what I've seen to date.

I was playing around with https://github.com/eventide-examples/account-basics and it seems that when I kill and restart start_service.sh (after having produced some events with ruby script/produce_messages.rb), the consumer seems to reprocess messages already processed, even if I set the consumer's position_update_interval to 0.

Looking at the code, I was wondering whether there's an off-by-one error here: https://github.com/eventide-project/consumer/blob/66ce414a82842743da6c07e5e29bc5ee522aa819/lib/consumer/consumer.rb#L161

Should starting_position for the subscription be 1 more than what was last saved in the position store?

I was hoping the consumer could have a guarantee that it receives the same message exactly once. Perhaps Eventide can do this, but I haven't yet figured how to do it properly.

ntl commented 4 years ago

Hi! Thanks for stopping by.

I was hoping the consumer could have a guarantee that it receives the same message exactly once.

It's important to remember that the laws of distributed systems guarantee (among other things) that messages will sometimes get handled more than once. Although some message transports like Amazon SQS suggest they can do exactly-once delivery, their fine print is very clear: while they include mechanisms that can help prevent messages from being handled more than once, their prevention isn't airtight (nor could it be).

So, with Eventide, we build idempotence into our message handlers, so that they only process messages once. To do so, they must rely on the account's previously written events (like Deposited and Withdrawn) in order to determine if a command message has already been handled. This is done by projecting the events onto an entity, i.e. event sourcing.

You are correct that the Account Basics example indeed processes messages more than once; it's an example meant to show the very basics, and as a result, doesn't have idempotence (i.e. if it handles the same message twice, it writes two events, which is wrong). For a more fleshed out example that's close to production ready, see our Account Component. It's message handlers are idempotent. Warning: it's a fair bit more complex than the Basics example, as idempotence always is. Fortunately, when we work with evented, autonomous systems, there are only a few idempotence patterns that need to be learned, and they're all contained in that Account Component example.

We have instructional materials that cover all the idempotence techniques that are generally necessary with Eventide, but they are, at this point in time, only available as a 3-4 day training course that we (the project principals) put on. There is an effort underway to digitize them and allow for fully self directed learning, but unfortunately we aren't there yet. In the meantime, feel free to reach out to us in our Slack community.

Regarding the particular issue you raised, you are not seeing an off-by-one error in the consumer implementation. Our consumers can certainly be configured to record their latest position after every message they dispatch, but that isn't the intended use. Consumers only record their position so that they don't have to start from the very beginning of time when they get restarted. Since consumer handlers are expected to take care of idempotence on their own, operational consumers can safely reprocess a bit of backlog after restarting. The advantage of this approach is that there is not an extra write after every message, only after every few hundred or thousand (or whatever the interval is set to).

But don't try that with the Account Basics example -- as you noticed, it's handlers cannot determine if they've already seen a message they're handling.

Hope this helps!

mimperatore commented 4 years ago

@ntl, thanks for confirming that idempotent event handlers are necessary. The approach shown at https://github.com/eventide-examples/account-component/blob/master/lib/account_component/handlers/commands/transactions.rb#L30 is easy enough to do, and does indeed help in keeping the DB writes down.

ntl commented 4 years ago

Good stuff. Worth bearing in mind, that component also makes use of an idempotence key, which is used to handle cases where the same Deposit or Withdraw command message is actually written more than once (which is different than the case you described, where a restarting consumer handles a single message again).

mimperatore commented 4 years ago

Thanks for the extra tips...

component also makes use of an idempotence key

What do you mean by that? Can you link to the source code in the Account Component example that takes care of this?

cases where the same Deposit or Withdraw command message is actually written more than once

Is this for situations where one doesn't use optimistic concurrency control, or are you referring to other situations where commands get written multiple times?

sbellware commented 4 years ago

@mimperatore

Can you link to the source code in the Account Component example that takes care of this?

Any mention of gobal_position or sequence in the account component demonstrates the use of an idempotence key. The sequence attribute is an alias for global_position. The global_position can be used as an idempotence key for messages that use the message store as a queue (or transport).

The global_position attribute (and the sequence alias) is a member of the Messaging::Message::Metadata class: http://docs.eventide-project.org/user-guide/messages-and-message-data/metadata.html.

Is this for situations where one doesn't use optimistic concurrency control, or are you referring to other situations where commands get written multiple times?

It's not really a concurrency scenario. It happens when two services are interacting. Borrowing from Eventide's example components, for example, when Funds Transfer component collaborates with the Account component.

When Funds Transfer restarts, and hasn't heard back from Account about messages that are still in-progress, then Funds Transfer will re-send messages to Account.

In a previous message, @ntl said that the component makes use of an idempotence key. Specifically, it a handler that make use of the idempotence key, and they do so in order to determine wether a message should be processed or ignored because it's a duplicate.

There are two kinds of duplicate messages. One is recycled messages, which is a scenario of a component re-starting and re-processing some amount of messages to make sure that all messages had been processed when the component terminated. The other is re-issued messages, which is the scenario I described here where a collaborating component restarts and re-sends a previously-sent message. Note also that "sending" (or re-sending) a message in this context just means writing it to the message store.

Does that make sense? Let us know what needs to be discussed in greater detail to make more clear.

mimperatore commented 4 years ago

@sbellware Thanks for the additional clarifications.

When Funds Transfer restarts, and hasn't heard back from Account about messages that are still in-progress, then Funds Transfer will re-send messages to Account.

I'm wondering whether re-issued messages could be avoided? Given these are written to the message store, couldn't the re-issuing be avoided by seeing if they have already been written?

ntl commented 4 years ago

I'm wondering whether re-issued messages could be avoided? Given these are written to the message store, couldn't the re-issuing be avoided by seeing if they have already been written?

Such an approach could not offer any guarantees. A query that determines a message hasn't yet been written could be out-of-date by the time that the ensuing message gets written. The writing of event messages based on whether past events have (or have not yet) been written benefits from a necessary concurrency protection that would be lacking in FundsTransfer / Account scenario you're quoting.

From a broader user experience perspective, it would be unnatural to burden clients (as FundsTransfer is a client of Account) with the requirement to only send a message once. This is analogous to a web form that says, "do not double-click the Submit button, otherwise the form will get processed twice." i.e. in both cases, a burden gets foisted onto the user that should be taken care of behind the scenes. The laws of distributed systems already invalidate this approach, but it's worth also pointing out the perspective of the user, as well. The AccountComponent would be a less usable thing if all clients had to take extra (and technically insufficient) precautions to avoid sending it messages more than once.

mimperatore commented 4 years ago

I'm not suggesting a web-based client can/should avoid sending the request multiple times. As you mentioned, this is largely unavoidable. However (and forgive me that I don't fully grasp how FundsTransfer and Account work in your example, as I haven't had the time to digest all the code), I had assumed the FundsTransfer was a back-end service that could indeed determine whether the message had been written to the datastore already, through some sort of idempotency mechanism. However, I can't quite think of one myself, and from what I'm gathering from you, this isn't something you'd strive to do anyway because "the laws of distributed systems already invalidate this approach". I suppose once I become more familiar with such laws, I may have fewer questions such as these! :-). For now, thanks for sharing the additional thoughts & comments, @ntl.

sbellware commented 4 years ago

@mimperatore Funds Transfer does subscribes to Account's events. It does this to find out whether it (Funds Transfer) can proceed with next steps in the Funds Transfer process.

For example, Funds Transfer requests that Account perform a withdrawal. Account will get around to it sooner or later, and when Account does process that transaction, Account writes a Withdrawn event to its own stream.

Funds Transfer consumes the Account streams in order to detect when Account has done some work that Funds Transfer might be interested in. Funds Transfer is definitely interested in when Account processes deposits and withdrawals because Funds Transfer is a multi-step process that instructs Account to do some work, and then waits on a response from Account before proceeding.

But it takes time for the withdrawal request to go from start to finish: Being sent from Funds Transfer, landing in Account's input queue, making it to the head of the queue, finally being processed, and resulting in a Withdrawn event being written to Account's streams.

Funds Transfer will only know that the withdrawal message has been processed once account has processed it and written a Withdrawn event.

During the time that the withdrawal request is waiting in line in Account's input queue, the Funds Transfer component could restart. When it restarts, it reprocesses some number of messages in its own input queue. Some of those messages include instructions to start a funds transfer process. And each funds transfer process begins with sending a withdrawal request to Account.

Funds Transfer will not re-send withdrawal requests to Account for withdrawals that Account has already processed. But there are withdrawal requests in Account's queue that:

1-) Have already been send to Account 2-) Are already in Account's input queue 3-) But have not been processed by Account yet

For those messages that are still in Account's input queue and are awaiting processing, Funds Transfer has no idea whether it itself has sent the requests to Account. And therefor, Funds Transfer has no choice but to send the withdrawal request again.

Funds Transfer doesn't know if it's the first time it has sent the withdrawal request to Account, or the tenth time it has sent the request to Account. And Funds Transfer can't know. The only means that Funds Transfer has to determine whether Account has already processed a withdrawal request is when Account has completed processing the request and has committed a Withdrawn event to disk. But in the mean time, while messages are in-process, there's no way to know.

sbellware commented 4 years ago

PS: This pattern will exist in any system where two components are coordinating on some work. It's not just a side effect specifically of the Funds Transfer and Account coordination. It's just a common property of message-based, distributed systems.