cofoundry-cms / cofoundry

Cofoundry is an extensible and flexible .NET Core CMS & application framework focusing on code first development
https://www.cofoundry.org
MIT License

Durable messages / transaction handling #418

Open JornWildt opened 3 years ago

JornWildt commented 3 years ago

Sorry for spamming you with ideas and comments.

I was looking at this piece of code from https://www.cofoundry.org/docs/framework/data-access/transactions:

    public async Task ExecuteAsync(DeletePageCommand command, IExecutionContext executionContext)
    {
        ...

        using (var scope = _transactionScopeFactory.Create(_dbContext))
        {
            ...
            scope.QueueCompletionTask(() => OnTransactionComplete(command));

            await scope.CompleteAsync();
        }
    }

    private Task OnTransactionComplete(DeletePageCommand command)
    {
        _pageCache.Clear(command.PageId);

        return _messageAggregator.PublishAsync(new PageDeletedMessage()
        {
            PageId = command.PageId
        });
    }

Now, what happens if the message handler fails? Let's say I'm building an index based on messages like the one above - or deleting related data when a page/custom entity/something else is deleted. By the time the message handler is invoked, the core operation has already completed and committed, so if my message handler fails, I lose an update or delete, and the external index gets out of sync.

I would rather have synchronous message handlers like these be handled inside the same transactional boundary as the core operation - in which case the core operation and the message handler either fail or commit together.

I do realize the issue you have with caches - you don't want the caches to be modified if the transaction commit fails after updating the cache, since caches usually don't share the same transactional behavior (you cannot easily roll back a change to an in-memory cache).

JornWildt commented 3 years ago

The solution for this problem, I believe, is to use "durable messages" stored in persisted queues. In this case, publishing a message is a transactional operation bound to the same transaction as the main operation. Message handlers are then invoked asynchronously - and if they fail, they will be retried by the message queueing logic.

Caches can then be updated as they are today - after the main operation has completed and committed - using the existing synchronous message publishing system, while secondary systems with other transactional boundaries can be handled asynchronously later on.
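
This is essentially the "transactional outbox" pattern. A minimal sketch of the idea, assuming an EF Core DbContext and an invented OutboxMessage entity (none of these names are Cofoundry API):

    using System;
    using System.Text.Json;
    using System.Threading.Tasks;

    // Invented entity: one row per pending message, written in the same
    // transaction as the business change.
    public class OutboxMessage
    {
        public int OutboxMessageId { get; set; }
        public string MessageType { get; set; }
        public string Payload { get; set; }
        public DateTime CreatedDate { get; set; }
    }

    public async Task ExecuteAsync(DeletePageCommand command)
    {
        var page = await _dbContext.Pages.FindAsync(command.PageId);
        _dbContext.Pages.Remove(page);

        // The outbox row is saved by the same SaveChangesAsync call as the
        // delete, so both commit or roll back together.
        _dbContext.OutboxMessages.Add(new OutboxMessage
        {
            MessageType = nameof(PageDeletedMessage),
            Payload = JsonSerializer.Serialize(new PageDeletedMessage { PageId = command.PageId }),
            CreatedDate = DateTime.UtcNow
        });

        await _dbContext.SaveChangesAsync();
    }

A separate background worker then polls the outbox table, publishes each row to the queue and deletes it; if publishing fails, it simply retries on the next poll.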

JornWildt commented 3 years ago

Where I work we use Rebus (https://github.com/rebus-org/Rebus) for exactly these scenarios.

But there are unfortunately still some issues, as the Rebus message queue runs in its own transaction - and then, again, we run into the issue of the core operation completing and committing while, for some odd reason, Rebus fails to commit its messages. Maybe newer versions of Rebus make it possible to share a single transaction.
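
For what it's worth, a rough sketch of what newer Rebus versions offer with RebusTransactionScope (as I understand it, this only batches the outgoing sends; it still doesn't share the database transaction):

    using Rebus.Transport;

    // Outgoing messages are buffered until CompleteAsync; if the surrounding
    // code throws first, nothing is ever sent.
    using (var scope = new RebusTransactionScope())
    {
        await _bus.Publish(new PageDeletedMessage { PageId = pageId });
        await scope.CompleteAsync();
    }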

HeyJoel commented 3 years ago

you cannot easily roll back a change to an in-memory cache

Actually, this feature is more to do with co-ordinating multiple commands. If I have two commands that run in a parent transaction, e.g. "Delete X" and "Update Y", and X succeeds but Y fails, then without this feature the message handler for X would have been triggered before the data was rolled back - and the message handler for X may also depend on Y being completed. So here we are ensuring all message handlers are called only after the transaction has completed.
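
To make that concrete, a simplified sketch of the co-ordination (the command names and the ICommandExecutor usage here are just for illustration):

    public async Task ExecuteAsync(DeleteXAndUpdateYCommand command, IExecutionContext executionContext)
    {
        using (var scope = _transactionScopeFactory.Create(_dbContext))
        {
            // The child command queues its completion task (cache clear,
            // message publish); nothing runs yet.
            await _commandExecutor.ExecuteAsync(new DeleteXCommand(), executionContext);

            // If this throws, the whole transaction rolls back and the
            // completion task queued by DeleteXCommand is never executed.
            await _commandExecutor.ExecuteAsync(new UpdateYCommand(), executionContext);

            // Queued completion tasks run only after the outermost scope
            // commits successfully.
            await scope.CompleteAsync();
        }
    }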

The natural next question is: why don't we just run those message handlers inside the transaction? As you have pointed out, those handlers may contain operations that can't be rolled back - clearing a cache is one, but updating a Lucene index or calling out to an HTTP endpoint is another. So if we ran the handlers inside the transaction, we could get into a situation where the transaction rolls back, but code that could not partake in the transaction has already executed.

Also, as mentioned in issue 419, transactions require thought and should not be kept open longer than necessary. By including those message handlers in the transaction we make it easier for developers to unknowingly create hard-to-debug problems like deadlocks.

A way to opt in to the transaction for message handlers could be a way to go, but it could make the API quite confusing; I'd have to give that a bit of thought. My impression is that most other CMSs that work with events or "hooks" don't consider a transaction at all, and obviously those that only work via webhooks (e.g. headless) cannot. However, we are at an advantage here in that we can use transactions, so perhaps it should be considered.

An asynchronous queue or bus is a good idea, but often I would want handlers to complete synchronously - cache clearing is a good example where I need that to occur synchronously to avoid the cache being stale if the user immediately re-queries the system. So an asynchronous queue is useful, but not the whole solution.

There's nothing stopping you from using the existing message handler system to publish a message to an asynchronous queuing solution (e.g. Rebus). There are plenty of good solutions out there, but I imagine you'd prefer something out of the box and integrated.
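
For example, something along these lines could relay the in-process message to a durable queue (a sketch assuming Cofoundry's IMessageHandler<TMessage> and Rebus's IBus; the class name is invented):

    using System.Threading.Tasks;
    using Rebus.Bus;

    // Subscribes to the in-process Cofoundry message and forwards it to a
    // durable Rebus queue, where failing consumers are retried automatically.
    public class PageDeletedMessageRelay : IMessageHandler<PageDeletedMessage>
    {
        private readonly IBus _bus;

        public PageDeletedMessageRelay(IBus bus)
        {
            _bus = bus;
        }

        public Task HandleAsync(PageDeletedMessage message)
        {
            return _bus.Publish(message);
        }
    }

Note that this relay still runs after the commit, so a failed publish would not roll the core operation back - which is exactly the gap discussed in this thread.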

I would like to build asynchronous command execution into the Cofoundry CQS system, and in some of my client projects I have already built plugins for this, but it needs a bit more thought before including it, particularly with regard to scale-out and integration with 3rd party messaging systems. An integrated SQL-only solution will only scale so far.

JornWildt commented 3 years ago

If I have two commands that run in a parent transaction, e.g. "Delete X" and "Update Y", and X succeeds but Y fails, then without this feature the message handler for X would have been triggered before the data was rolled back - and the message handler for X may also depend on Y being completed.

Yes, but if both execute in the same transaction - rather than after "Delete X" has been committed for real into the database - then the failing "Update Y" would throw an exception that would eventually hit the outermost transaction scope, which would then roll back both "Delete X" and "Update Y", so that neither of the changes would ever have been visible.

Maybe this has something to do with the preferred way of handling exceptions? If a failing "Update Y" does not result in an exception being thrown then, yes, certainly, "Delete X" would be committed without the related "Update Y" ... which is why, in my world, a failure should always result in an exception, which ensures a database rollback.

By including those message handlers in the transaction we make it easier for developers to unknowingly create hard to debug problems like deadlocks.

Yes, but by leaving them out of the transaction, you make it impossible to emit durable messages that roll back together with the main operation should the queuing fail - which means you get the main database update, but lose the signal to the remaining loosely coupled parts of the application.

An asynchronous queue or bus is a good idea, but often I would want handlers to complete synchronously - cache clearing is a good example where I need that to occur synchronously to avoid the cache being stale if the user immediately re-queries the system.

Well, that kind of depends on your business case :-) But that is a looooong discussion to enter.

There's nothing stopping you from using the existing message handler system to publish a message to an asynchronous queuing solution (e.g. Rebus),

Well, strictly speaking, there is something stopping me, which is why I'm bugging you about this. Because the synchronous message system is triggered after committing the transaction, I cannot force the main operation to roll back if the message sending fails (the network could fail at that exact instant).

But, honestly, I'm not going to need it. There is very little chance of me building something that needs it. I was just pointing out a potential issue here.

I would like to build asynchronous command execution into the Cofoundry CQS system, and in some of my client projects I have already built plugins for this, but it needs a bit more thought before including it, particularly with regard to scale-out and integration with 3rd party messaging systems.

You may have heard of Udi Dahan, but if not, try to look up what he has to say. He runs a company that produces NServiceBus - which Rebus is quite inspired by.

JornWildt commented 3 years ago

But, speaking of transactions ... maybe I am wrong anyway. What would happen in this scenario?

// ApiController

async Task DoStuffAsync()
{
  using (var scope = _transactionScopeManager.Create(_myDbContext))
  {
    var cmd = new MyCommand();
    await ApiResponseHelper.RunCommandAsync(cmd);
    await scope.CompleteAsync();
  }
}

// Command handler
public async Task ExecuteAsync(MyCommand command, IExecutionContext executionContext)
{
  using (var scope = _transactionScopeFactory.Create(_dbContext))
  {
    scope.QueueCompletionTask(() => OnTransactionComplete(command));
    await scope.CompleteAsync();
  }
}

private Task OnTransactionComplete(MyCommand command)
{
  // ... EMIT MESSAGE ...
}

As you can see, I have two transaction scopes - one in the controller and one in the command handler. The OnTransactionComplete handler is executed outside the second scope, but due to the ambient transaction from the first scope, it should actually be executed inside the same database transaction. Is that correct? If so, it means I can control the transaction handling in a way that solves my issues.

JornWildt commented 3 years ago

And now that I look at this piece:

  using (var scope = _transactionScopeManager.Create(_myDbContext))
  {
    var cmd = new MyCommand();
    await ApiResponseHelper.RunCommandAsync(cmd);
    await scope.CompleteAsync();
  }

I notice again different philosophies at play - in my world, I would rather have the scope auto-completing as long as no exception is thrown, which in turn means the code could be more concise:

  // Assume transactions are always committed, unless something inside the transaction scope throws an exception
  await _transactionScopeManager.ExecuteWithTransaction(_myDbContext, async () =>
  {
    var cmd = new MyCommand();
    await ApiResponseHelper.RunCommandAsync(cmd);
  });
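
Such a helper could presumably be layered over the existing API as an extension method - a sketch (ExecuteWithTransactionAsync is an invented name, not part of Cofoundry):

    using System;
    using System.Threading.Tasks;
    using Microsoft.EntityFrameworkCore;

    public static class TransactionScopeManagerExtensions
    {
        // Auto-completes the scope when the delegate returns; an exception
        // propagates before CompleteAsync is reached, so the transaction
        // rolls back.
        public static async Task ExecuteWithTransactionAsync(
            this ITransactionScopeManager transactionScopeManager,
            DbContext dbContext,
            Func<Task> actions)
        {
            using (var scope = transactionScopeManager.Create(dbContext))
            {
                await actions();
                await scope.CompleteAsync();
            }
        }
    }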

But I am rambling - feel free to ignore me if you think I'm going too far.

HeyJoel commented 3 years ago

Lots to reply to here...

I would rather have the scope auto-completing as long as no exception is thrown

Our behavior mimics that of System.Transactions.TransactionScope for consistency, so it should be familiar to those already used to System.Transactions, except with more sensible defaults. System.Transactions.TransactionScope does not auto-complete, so we don't either.
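
For comparison, the System.Transactions pattern we mirror - if Complete() is never called, disposal rolls the transaction back:

    using System.Transactions;

    using (var scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
    {
        // ... do the work ...

        // Without this explicit call, Dispose rolls the transaction back;
        // the scope never auto-completes.
        scope.Complete();
    }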

I looked at your code here, and I should point out that with EF, DbContext.SaveChangesAsync() automatically wraps any code that updates the database in a transaction, so you don't need to use ITransactionScopeManager here. The same is true for any Cofoundry command: you don't need to wrap individual command execution in a transaction; ITransactionScopeManager is only required when coordinating multiple commands.
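
For example (illustrative entity names only), no explicit scope is needed here because EF wraps the single SaveChangesAsync call in its own transaction:

    // Both changes are committed atomically by the implicit transaction
    // that EF creates for this one SaveChangesAsync call.
    _dbContext.Pages.Remove(page);
    _dbContext.PageVersions.RemoveRange(versions);
    await _dbContext.SaveChangesAsync();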

In your example above there's not much benefit to using transactions because there's only one command, but I think what you're trying to do here is wrap the event messages in an outer transaction? That won't have an effect, because the ambient transaction queues up the completion tasks and will execute them after the transaction completes.

It's worth clarifying at this point that at the SQL Server level there isn't such a thing as "nested transactions", as you can only have one transaction per connection. Nested transactions are really just an application-level concept where the ambient transaction is managed by ITransactionScopeManager, in the same way System.Transactions.TransactionScope does it.

Anyway, the issue that remains here is that you want completion tasks to be executed inside the transaction, and we don't do that for legitimate reasons, even though in your case they may not apply. To resolve the issue we'd need to provide an option to change the behavior of ITransactionScopeManager to run completion tasks inside the transaction.

Alternatively we could provide a second way of handling messages that does run inside the transaction, but I'm not sure on the design of that; I'd need to think about it.

Some workarounds:

- Override ITransactionScopeManager with your own implementation that changes the behavior. Default implementation is here.

JornWildt commented 3 years ago

Our behavior mimics that of System.Transactions.TransactionScope for consistency, so it should be familiar to those already used to System.Transactions, except with more sensible defaults. System.Transactions.TransactionScope does not auto-complete, so we don't either.

Sure. I probably shouldn't have stated it as "I would rather ...". Thanks for clarifying.

but I think what you're trying to do here is wrap the event messages in an outer transaction?

Yes.

That won't have an effect because the ambient transaction queues up the completion tasks and will execute them after the transaction completes.

Ah, okay, didn't think of that.

Override ITransactionScopeManager with your own implementation that changes the behavior. Default implementation is here.

Good point. I could even go a bit further and just add a new interface like ITransactionScopeManager_v2, which itself implements ITransactionScopeManager and adds a new QueueCompletionTask_InTransaction method - while still inheriting from or building on the original transaction scope manager.
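
Something like this, as a rough sketch (all names invented; ITransactionScope is assumed to be the type returned by Create()):

    using System;
    using System.Threading.Tasks;

    public interface ITransactionScopeManager_v2 : ITransactionScopeManager
    {
        // Like scope.QueueCompletionTask(), but the queued task executes
        // inside the transaction, immediately before commit, so a failure
        // rolls the whole operation back.
        void QueueCompletionTask_InTransaction(ITransactionScope scope, Func<Task> actions);
    }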

But as stated before, I'm most likely not going to need it - I was just trying to figure out what I was missing here and what behavior I need to rethink. So thanks for taking the time to reply.