abpframework / abp

Open-source web application framework for ASP.NET Core! Offers an opinionated architecture to build enterprise software solutions with best practices on top of the .NET. Provides the fundamental infrastructure, cross-cutting-concern implementations, startup templates, application modules, UI themes, tooling and documentation.
https://abp.io
GNU Lesser General Public License v3.0
12.77k stars 3.41k forks source link

Publishing Distributed Events as Transactional #6126

Closed gentledepp closed 2 years ago

gentledepp commented 3 years ago

Issue: abp.io seems to create a two-phase commit distributed transaction when using distributed events

Abp.io latest source code sends messages (domainevents which should be called integrationevents in DDD I guess) when committing a transaction.

Thus, it can happen, that

  1. the EF transaction succeeds
  2. but the message bus is not reachable (for a longer time so that even using Polly does not help)

In that case, the events will never be sent!.

If I understand your code base correctly, you do send the distributed events within the EF Core transaction: https://github.com/abpframework/abp/blob/32402a99b07ea994efe1880e0655eff3c73e2166/framework/src/Volo.Abp.EntityFrameworkCore/Volo/Abp/EntityFrameworkCore/AbpDbContext.cs#L165-L167

At least, you send it to the message bus and if that fails, the transaction is also not committed. However, isn't that exactly what causes a two phase commit? https://github.com/abpframework/abp/blob/6136fab6acc81ea1b3b65d34f3978451b6f64a4b/framework/src/Volo.Abp.Ddd.Domain/Volo/Abp/Domain/Entities/Events/EntityChangeEventHelper.cs#L48-L55

Shouldn't this be avoided?

Solution: The outbox pattern

The solution is using the outbox pattern: Instead of directly sending the events to the event bus (e.g. RabbitMQ), they are

Note: There is someone who wrote a 3 series blog article about this very problem and in part 3 he even developed a sample solution which is available on github: https://github.com/mizrael/DDD-School/

Please have a look!

(Here are part 1 and part 2 for reference)

gentledepp commented 3 years ago

well, to be honest. I did go through the documentation and did not find any hint as to why abp implements this "everything within one transaction" approach.

If at least any one of the core team could comment on that? Maybe we do overlook something here and it totally makes sense... kind of... 🤷‍♂️

After all: If this is not well thought through, you will end up with an inconsistent state across microservices so, in my view, this is a critical design flaw

@hikalkan could you please get us someone to comment on this? (One line should suffice - thanks!)

hikalkan commented 3 years ago

Hi all,

The solution is using the outbox pattern: Instead of directly sending the events to the event bus (e.g. RabbitMQ), they are serialized to the database (-> within the same transaction) later consumed by a polling consumer (polling consumer pattern)

This is what we were planning. But currently, it is not implemented.

RabbitMqDistributedEventBus implements the related work. If you want to do it now, you can create your own IDistributedEventBus implementation that adds events to a database table, then a background worker that polls from the table and publishes to rabbitmq. All the other ABP systems will work as expected.

gentledepp commented 3 years ago

well, integration events are somewhat the backbone of micro services. Without this implemented, abp.io can only be used for building a modular monolith for now. But thanks for clarifying this!

geffzhang commented 3 years ago

https://github.com/EasyAbp/Abp.EventBus.CAP integrated the CAP with the ABP framework,

CAP is a library based on .Net standard, which is a solution to deal with distributed transactions, has the function of EventBus, it is lightweight, easy to use, and efficient.

In the process of building an SOA or MicroService system, we usually need to use the event to integrate each service. In the process, simple use of message queue does not guarantee reliability. CAP adopts local message table program integrated with the current database to solve exceptions that may occur in the process of the distributed system calling each other. It can ensure that the event messages are not lost in any case.

You can also use CAP as an EventBus. CAP provides a simpler way to implement event publishing and subscriptions. You do not need to inherit or implement any interface during subscription and sending process.

leonkosak commented 3 years ago

Well, I have situation that RabbitMQ-based distributed event bus publishes every second message only. I tried with UoW with requiresnew and transactional on true but nothing helps. Is this somehow related problem?

leonkosak commented 3 years ago

Anyone? Can someone confirm that this issue will solve the problem that I described with distributed event bus? Thank you.

mrs01dev commented 3 years ago

Please consider using https://github.com/MassTransit/MassTransit for communication with Service Bus ( RabbitMQ, Azure Service Bus, ActiveMQ, and Amazon SQS/SNS )

hikalkan commented 3 years ago

I was re-checking the CAP library. However, it doesn't support multi-tenancy with separate databases (https://github.com/dotnetcore/CAP/issues/699) and a single modular application with multiple databases (https://github.com/dotnetcore/CAP/issues/783). It also requires to use its custom database transaction approach.

Out problem is much harder, because we have multi-tenancy (with separate db support), multiple db can be used by different modules in the same process. We also want to make it seamlessly works with the current IDistributedEventBus.

hikalkan commented 3 years ago

Implementing this feature with supporting the following scenarios seems complicated:

We should support all these scenarios. So, I postpone this to the next major version, 5.0, to work on it.

leonkosak commented 3 years ago

We have slightly specific requirements for Pub/Sub mechanism and I have so questions if this is possible achieve.

Is somehow possible to use Pub/Sub mechanism in our scenarios, because currently we use HttpClient calls to "simulate" Pub/Sub mechanism for distributed event bus. The main problem here is that HttpClient should wait "subscribed" applications to finish its own work and therefore initial request from Internet gateway (BFF) can last 10sec+.

Using Background workers is not solution, because there are multiple chained-calls inside microservice architecture and if each part have 1 second to start processing, this takes a lot of time. Whole "Pub/Sub chained-calls" are time critical and therefore waiting background worker in each standalone API application to start process newly accepted Eto is not solution for us.

Does anyone have suggestions for us how to solve Pub/Sub on our cases? @hikalkan, @maliming, @geffzhang? Thank you.