apache / eventmesh

EventMesh is a new generation serverless event middleware for building distributed event-driven applications.
https://eventmesh.apache.org/
Apache License 2.0
1.6k stars 638 forks source link

[Enhancement] Implement retry strategy based on more message middleware #4556

Closed pandaapo closed 4 months ago

pandaapo commented 10 months ago

Search before asking

Enhancement Request

Currently, EventMesh supports two retry strategies for the case of message consumption failure: a retry strategy using HashedWheelTimer in memory (default) and a retry strategy based on external storage RocketMQ. There are still retry strategies based on other message brokers waiting to be implemented.

对于消息消费失败的情况,目前EventMesh支持在内存中使用HashedWheelTimer的重试策略(默认)和基于外部存储RocketMQ的重试策略。还有基于其他消息中间件的重试策略等待实现。

Describe the solution you'd like

task list

Welcome to claim the task you are interested in and create corresponding sub issue.

Are you willing to submit PR?

Code of Conduct

VishalMCF commented 10 months ago

@pandaapo @yanrongzhen @mxsm For the task above I want to ask the approach. Will only creating a new module (and implementing the RetryStrategy class) under the events-retry module be sufficient for all use cases (Kafka, pulsar, redis, rabbitMQ)? I am asking this because I saw the code of the rocketMQRetryStrategyImpl and it does not contain any logic specifically related to RocketMQ.

pandaapo commented 10 months ago

@VishalMCF RocketMQRetryStrategyImpl utilizes RocketMQ's retry queue feature. It sends messages to a queue with the prefix "% RETRY%", and RocketMQ will automatically retry. So it's not enough to create a new module under the eventmesh-retry module. You need to use the corresponding message middleware to implement the retry logic in the new module.

RocketMQRetryStrategyImpl 利用了RocketMQ的重试队列特性,将消息发送到前缀为“%RETRY%”的队列,RocketMQ会自动进行重试。所以并不是在eventmesh-retry模块下创建一个新模块就够了,您需要在新模块中利用相应的消息中间件实现重试逻辑。

HarshSawarkar commented 8 months ago

Hey @pandaapo , I would like to work on this issue if no one is currently working on it.

@VishalMCF RocketMQRetryStrategyImpl utilizes RocketMQ's retry queue feature. It sends messages to a queue with the prefix "% RETRY%", and RocketMQ will automatically retry. So it's not enough to create a new module under the eventmesh-retry module. You need to use the corresponding message middleware to implement the retry logic in the new module.

RocketMQRetryStrategyImpl 利用了RocketMQ的重试队列特性,将消息发送到前缀为“%RETRY%”的队列,RocketMQ会自动进行重试。所以并不是在eventmesh-retry模块下创建一个新模块就够了,您需要在新模块中利用相应的消息中间件实现重试逻辑。

In this context, does the corresponding message middleware refer to CloudEvent?

pandaapo commented 8 months ago

In this context, does the corresponding message middleware refer to CloudEvent?

@HarshSawarkar No, "message middleware" refers to messaging service, like Kafka, RocketMQ (implemented), RabbitMQ, Pulsar or Redis.

HarshSawarkar commented 7 months ago

Hi @pandaapo for the pulsar message middleware there is no pluging for logs. I tried to add SLF4j logger dependencies in the build.gradle but it doesn't seem to resolve the dependency of log for eventmesh-retry-pulsar module. Ho wcan thie be achieved? I am attaching ss for your reference
image

Pil0tXia commented 7 months ago

@HarshSawarkar

The Slf4j artifact is introduced in eventmesh-common/build.gradle. You may need to depend eventmesh-common module and inject logger with @Slf4j annotation.

HarshSawarkar commented 7 months ago

@HarshSawarkar

The Slf4j artifact is introduced in eventmesh-common/build.gradle. You may need to depend eventmesh-common module and inject logger with @Slf4j annotation.

I have made the changes. Can you please review the PR? Here's the link to it https://github.com/apache/eventmesh/pull/4769

jevinjiang commented 5 months ago

@pandaapo Is it feasible to create a retry topic while creating the topic, so that consumers can subscribe to it? The principle of RocketMQ should also be similar

pandaapo commented 5 months ago

@pandaapo Is it feasible to create a retry topic while creating the topic, so that consumers can subscribe to it? The principle of RocketMQ should also be similar

@jevinjiang Firstly, let me explain the purpose of this issue. It is to enable handing over the retry of messages to external messaging service, both to reduce the pressure on the EventMesh runtime service and to reuse the mature retry ability or potential ability of these services. So the scenario you propose is not in line with this issue.

Coming back to the discussion of your solution itself, it amounts to referencing RocketMQ's retry idea to implement similar functionality in EventMesh. Regardless of whether it is completely memory-based or based on external storage, there must be some complexity, such as if the retry still fails, is it necessary to add another dead letter topic? If it is based on memory, it is still adding EventMesh runtime service running pressure. If it is based on external storage, it is equivalent to wasting the ability or potential ability that external storage can provide.

首先我说明一下这个issue的目的,是要实现将对消息的重试工作交给外部消息中间件,一是可以减轻EventMesh runtime服务的运行压力,二是重用这些中间件成熟的重试能力或潜在能力。所以您提的方案是不符合该issue的。

再来讨论下您的方案本身,相当于参考RocketMQ的重试思路,在EventMesh中实现类似的功能。不管是完全基于内存的,还是基于外部Storage,肯定有一定的复杂性,比如如果重试还是失败,是不是还要再加个死信topic?如果是基于内存,相当于还是增加了EventMesh runtime服务的运行压力;如果是基于外部Storage,相当于浪费了外部Storage本身能提供的能力或潜在能力。

jevinjiang commented 5 months ago

@pandaapo For message queues that do not provide a retry strategy, re-add the original topic after the event execution fails, causing the event to be reprocessed. For message queues with retry strategies, use the corresponding retry strategies. 那对于没有提供重试策略的消息队列,event执行失败后重新add原来topic,使得event重新处理. 对于有重试策略的消息队列,则使用相应的重试策略.

jevinjiang commented 5 months ago

@pandaapo For message queues that do not provide a retry strategy, re-add the original topic after the event execution fails, causing the event to be reprocessed. For message queues with retry strategies, use the corresponding retry strategies. 那对于没有提供重试策略的消息队列,event执行失败后重新add原来topic,使得event重新处理. 对于有重试策略的消息队列,则使用相应的重试策略.

Of course, I am just providing a general idea. If the idea is approved, I will add some details (such as the maximum number of retries, callback after failure)

jevinjiang commented 5 months ago

@pandaapo Is it feasible to create a retry topic while creating the topic, so that consumers can subscribe to it? The principle of RocketMQ should also be similar

In fact, my original intention was for a message queue without a retry mechanism.

pandaapo commented 5 months ago

@pandaapo For message queues that do not provide a retry strategy, re-add the original topic after the event execution fails, causing the event to be reprocessed. For message queues with retry strategies, use the corresponding retry strategies. 那对于没有提供重试策略的消息队列,event执行失败后重新add原来topic,使得event重新处理. 对于有重试策略的消息队列,则使用相应的重试策略.

What do you mean by "message queues with or without retry strategies"? Does it mean that some external message middleware does not have ready-made retry features, or is it something else?

您说的“有或没有提供重试策略的消息队列”是指什么?是指某种外部消息中间件有没有现成的重试特性,还是其他什么意思?

jevinjiang commented 5 months ago

@pandaapo For message queues that do not provide a retry strategy, re-add the original topic after the event execution fails, causing the event to be reprocessed. For message queues with retry strategies, use the corresponding retry strategies. 那对于没有提供重试策略的消息队列,event执行失败后重新add原来topic,使得event重新处理. 对于有重试策略的消息队列,则使用相应的重试策略.

What do you mean by "message queues with or without retry strategies"? Does it mean that some external message middleware does not have ready-made retry features, or is it something else?

您说的“有或没有提供重试策略的消息队列”是指什么?是指某种外部消息中间件有没有现成的重试特性,还是其他什么意思?

yeah, such as kafka、redis( redisson topic), no middleware provides consumption retries.

pandaapo commented 5 months ago

Let's clarify some things: the retries that this issue talks about are internal to EventMesh, and are not a retry feature available to the user; EventMesh automatically retries messages that fail internally. Users can only set it to retry in memory (default, implemented, increase the pressure on EventMesh), retry in RocketMQ (implemented), and retry in other external messaging middleware (the purpose of this issue).

Do you have any misunderstandings about these? If there is no misunderstanding, in the case of "some external messaging middleware doesn't have an existing retry feature", that's what the issue wants developers to do, to use this external messaging middleware, and to develop a feature for retrying in this middleware (reduce the burden on EventMesh).

我们先明确一些事情:该issue谈论的重试是EventMesh内部的机制,不是向用户提供的重试特性。EventMesh对于失败的消息,内部会自动进行的重试。用户只能设置:在内存中重试(默认的、已实现、会增加EventMesh运行压力)、在RocketMQ中重试(已实现)、在其他外部消息中间件中重试(该issue的目的)。 请问对这些您有没有误解?如果没有误解,对于“某种外部消息中间件没有现成的重试特性”的情况,也是该issue希望开发者能完成的事啊,利用这种外部消息中间件,开发在该中间件中进行重试的功能(减轻EventMesh负担)。

jevinjiang commented 5 months ago

@pandaapo For message queues that do not provide a retry strategy, re-add the original topic after the event execution fails, causing the event to be reprocessed. For message queues with retry strategies, use the corresponding retry strategies. 那对于没有提供重试策略的消息队列,event执行失败后重新add原来topic,使得event重新处理. 对于有重试策略的消息队列,则使用相应的重试策略.

emmmm.... This idea of ​​mine is to solve this problem. ? ? ?

jevinjiang commented 5 months ago

Sorry, I don't understand. I've been telling you my ideas for solving the problem.

pandaapo commented 5 months ago

You don't need to say sorry, Perhaps I didn't understand your meaning. In my opinion, your plan still requires the heavy participation of EventMesh runtime to support the retry process.

您没有必要道歉,也许是我没有理解您的意思。在我看来,您的方案依然需要EventMesh runtime重度参与和支撑重试过程。

jevinjiang commented 5 months ago

My original intention is that for the two retry strategies of kafka and redis, because these two middlewares do not provide consumption retry strategies, my plan is as follows:

The first kind: Simulate rocketmq's retry mechanism, create a retry topic when creating a topic, use the same consumer subscription, and keep retrying until the maximum number of repetitions is reached.

The second kind: In these two retry strategies, The failed event will re-call producer.publish again and put it into the original topic.

pandaapo commented 5 months ago

My original intention is that for the two retry strategies of kafka and redis, because these two middlewares do not provide consumption retry strategies, my plan is as follows:

The first kind: Simulate rocketmq's retry mechanism, create a retry topic when creating a topic, use the same consumer subscription, and keep retrying until the maximum number of repetitions is reached.

The second kind: In these two retry strategies, the failed event is directly added to the end of the original topic.

The fact is that I didn't understand your meaning before. I think the first plan is more feasible, while the second plan, in my opinion, will cause message confusion in the message queue and duplicate consumption. For "create a retry topic when creating a topic", it is better to delay the creation until the retry is needed.

jevinjiang commented 5 months ago

My original intention is that for the two retry strategies of kafka and redis, because these two middlewares do not provide consumption retry strategies, my plan is as follows:

The first kind: Simulate rocketmq's retry mechanism, create a retry topic when creating a topic, use the same consumer subscription, and keep retrying until the maximum number of repetitions is reached.

The second kind: In these two retry strategies, the failed event is directly added to the end of the original topic.

The fact is that I didn't understand your meaning before. I think the first plan is more feasible, while the second plan, in my opinion, will cause message confusion in the message queue and duplicate consumption.

For "create a retry topic when creating a topic", it is better to delay the creation until the retry is needed.

Thanks for your suggestion, I will implement it soon.

pandaapo commented 5 months ago

Thanks for your suggestion, I will implement it soon.

Thanks for your future contribution. Please create corresponding new sub issue(s) then.

pandaapo commented 4 months ago

I found that the original design of the retry module seems to have some issues, but I'm not sure yet. I need to close this issue first.