apache / rocketmq

Apache RocketMQ is a cloud native messaging and streaming platform, making it simple to build event-driven applications.
https://rocketmq.apache.org/
Apache License 2.0

[Enhancement] Add message gray strategy solution, compatible with RocketMQ 4.x and RocketMQ 5.x #8468

Open syhleo opened 1 month ago

syhleo commented 1 month ago

Before Creating the Enhancement Request

Refer to RIP-72. Detail link: Yuque

Summary

This proposal provides an extensible message grayscale solution for grayscale message publishing on both RocketMQ 4.x and RocketMQ 5.x. It is compatible with the POP and Push consumption modes. Whether rebalancing happens on the client or on the server, the solution implements message grayscale in a lightweight way. The solution has been widely applied and verified in our projects, which confirms its reliability, security, and stability.

Motivation

At present, many business systems limit the impact of a version release through a grayscale release process: almost every service release strictly goes through a grayscale release first and goes fully online only after verification. During a grayscale release, gRPC and HTTP calls can be precisely controlled through microservice governance and gateway request headers.
However, without a message grayscale scheme, message delivery remains uncontrolled even when the current link is a grayscale link: a message may land in a partition that the gray cluster listens on, or in one that the online cluster listens on. As a result, when consumption logic changes, developers have to add a lot of compatibility logic to their code. Even then, that logic only ensures that new traffic does not affect production; it does not guarantee that grayscale traffic reaches exactly the grayscale consumer clients, so strict grayscale verification is not possible.
The industry lacks a complete message grayscale solution; existing MQ grayscale solutions do not fully solve the problems of message isolation and handover when switching. It is therefore particularly important to design a lightweight message grayscale scheme that is easy to adopt and can be upgraded in a controlled way.

Describe the Solution You'd Like

  1. Add client configuration: add enableGraySwitch and grayTag. enableGraySwitch indicates whether message grayscale is enabled. grayTag indicates whether the client is a grayscale client, and takes effect only when enableGraySwitch is enabled. These two configuration items make it easy for a RocketMQ consumer to determine whether a client is a grayscale client.
  2. Use the clientId generation mechanism: if the client is a grayscale client (enableGraySwitch and grayTag are both true), its clientId contains the @gray identifier.
  3. Producer message sending strategy: the producer client sends a message to the grayscale partition or the non-grayscale partition depending on whether the grayscale identifier is present. The SelectMessageQueueByGray policy is implemented for this purpose, so that producers send each message to the intended MessageQueue.
  4. New rebalance strategy: design a new rebalance strategy that extends AllocateMessageQueueAveragely and is compatible with both RocketMQ 4.x and 5.x and with different consumption modes (such as POP and Push). The core idea is that, during rebalancing, grayscale clients are distinguished from non-grayscale clients by whether the clientId string contains the @gray identifier, so that grayscale clients consume from grayscale queues and non-grayscale clients consume from non-grayscale queues.
  5. Compatibility with POP consumption mode: the scheme also covers POP consumption mode. Queues are still divided into grayscale and non-grayscale queues according to the grayscale identifier, and the defining feature of POP consumption is retained: multiple normal POP consumers can consume the same normal queue, and likewise multiple gray POP consumers can consume the same gray queue. This ensures that every consumer client can consume messages.
  6. Handover when switching: when no gray consumer client remains (for example, the gray version has been verified and released online on the consumer side, or the gray consumers have gone offline abnormally), the messages are immediately taken over by the normal consumer clients so that no message is lost.
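The allocation idea in items 4 and 6 can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from the proposal: the class GrayAwareAllocator, the use of integer queue ids, and the convention that queue ids from grayStart upward form the gray partition are all hypothetical. It covers the client-side rebalance case only and does not model POP consumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of gray-aware queue allocation; not the proposal's implementation. */
final class GrayAwareAllocator {
    static final String GRAY_MARK = "@gray";

    /**
     * Returns the queue ids assigned to currentCid.
     * Assumption: queue ids in [grayStart, queueCount) form the gray partition.
     */
    static List<Integer> allocate(String currentCid, List<String> allCids,
                                  int queueCount, int grayStart) {
        List<String> grayCids = new ArrayList<>();
        List<String> normalCids = new ArrayList<>();
        for (String cid : allCids) {
            (cid.contains(GRAY_MARK) ? grayCids : normalCids).add(cid);
        }
        Collections.sort(grayCids);
        Collections.sort(normalCids);

        // Handover: with no gray client online, normal clients share every queue.
        if (grayCids.isEmpty()) {
            return averagely(currentCid, normalCids, range(0, queueCount));
        }
        // Otherwise each group only shares the queues of its own partition.
        return currentCid.contains(GRAY_MARK)
                ? averagely(currentCid, grayCids, range(grayStart, queueCount))
                : averagely(currentCid, normalCids, range(0, grayStart));
    }

    /** Contiguous even split; the first (size % n) clients get one extra queue. */
    static List<Integer> averagely(String cid, List<String> cids, List<Integer> queues) {
        List<Integer> result = new ArrayList<>();
        int index = cids.indexOf(cid);
        if (index < 0 || queues.isEmpty()) {
            return result;
        }
        int base = queues.size() / cids.size();
        int extra = queues.size() % cids.size();
        int start = index * base + Math.min(index, extra);
        int count = base + (index < extra ? 1 : 0);
        for (int i = 0; i < count; i++) {
            result.add(queues.get(start + i));
        }
        return result;
    }

    static List<Integer> range(int from, int to) {
        List<Integer> ids = new ArrayList<>();
        for (int i = from; i < to; i++) {
            ids.add(i);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> cids = List.of("10.0.0.1@a", "10.0.0.2@b", "10.0.0.3@c@gray");
        for (String cid : cids) {
            System.out.println(cid + " -> " + allocate(cid, cids, 8, 6));
        }
    }
}
```

Sorting the client ids before splitting keeps every client's view of the allocation consistent, which is the same reason the averaging strategy works when each client computes its own share.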


Describe Alternatives You've Considered

/

Additional Context

Gray scale partition

This solution is a lightweight message grayscale scheme that is easy to adopt and can be upgraded in a controlled way. It is compatible with RocketMQ 4.x and RocketMQ 5.x.

Easy access. Business parties only need to set a few client configuration items, such as enableGraySwitch and grayTag, to seamlessly adopt RocketMQ's grayscale publishing capability and enable full-link grayscale publishing. For details, refer to org.apache.rocketmq.example.gray.
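As a rough illustration of how the two settings could drive the clientId, here is a minimal sketch. Only the names enableGraySwitch, grayTag, and the @gray identifier come from the proposal; the class GrayClientSettings, the method buildClientId, and the choice to append @gray after an ip@instanceName base are assumptions made for this example and may differ from the actual patch.

```java
/** Hypothetical holder for the two client settings named in the proposal. */
final class GrayClientSettings {
    static final String GRAY_MARK = "@gray";

    final boolean enableGraySwitch;
    final boolean grayTag;

    GrayClientSettings(boolean enableGraySwitch, boolean grayTag) {
        this.enableGraySwitch = enableGraySwitch;
        this.grayTag = grayTag;
    }

    /** grayTag only takes effect when enableGraySwitch is on. */
    boolean isGrayClient() {
        return enableGraySwitch && grayTag;
    }

    /** Assumption: the mark is appended to an "ip@instanceName" style id. */
    String buildClientId(String clientIp, String instanceName) {
        String base = clientIp + "@" + instanceName;
        return isGrayClient() ? base + GRAY_MARK : base;
    }

    public static void main(String[] args) {
        System.out.println(new GrayClientSettings(true, true).buildClientId("10.0.0.3", "demo"));
        System.out.println(new GrayClientSettings(true, false).buildClientId("10.0.0.1", "demo"));
    }
}
```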

syhleo commented 1 month ago

Related documentation: Yuque


syhleo commented 1 month ago

In practice, full-link grayscale publishing is a very common scenario, and the message queue, as one of the links, should support it too. This scheme gives RocketMQ 4.x and RocketMQ 5.x native support for message grayscale publishing. Compared with existing solutions in the industry, it requires no additional shadow Topic/Group and no changes to a grayscale tag, UserProperty, and so on. (PS: When the business is relatively large, implementing grayscale by adding a topic or group has two problems. First, and critically, messages may be missed when grayscale verification is switched to prod, which is unacceptable. Second, there is cost: doubling every topic and group is an expense that cannot be ignored.) The solution uses a gray partition to solve message isolation and handover when switching at low cost, with little intrusion into RocketMQ. Adding a few client configuration items and plugging in the grayscale policies is enough for businesses to seamlessly adopt RocketMQ grayscale publishing.
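To make the gray partition idea concrete on the producer side, the following is a minimal sketch of picking a queue inside one partition of a single topic. It is not the SelectMessageQueueByGray policy from the proposal: the class GrayQueuePicker, the integer queue ids, and the convention that ids from grayStart upward are gray are assumptions for illustration. In a real client this logic would presumably sit behind RocketMQ's MessageQueueSelector interface.

```java
/** Hypothetical sketch of producer-side queue selection within a gray partition. */
final class GrayQueuePicker {
    /**
     * Picks a queue id for a message, round-robin inside its partition.
     * Assumption: ids in [grayStart, queueCount) are gray, ids in [0, grayStart) are normal.
     */
    static int pick(boolean grayMessage, int queueCount, int grayStart, long sequence) {
        int from = grayMessage ? grayStart : 0;
        int to = grayMessage ? queueCount : grayStart;
        int size = to - from;
        if (size <= 0) {
            throw new IllegalArgumentException("partition is empty");
        }
        // floorMod keeps the offset non-negative even if the sequence overflows.
        return from + (int) Math.floorMod(sequence, (long) size);
    }

    public static void main(String[] args) {
        for (long seq = 0; seq < 4; seq++) {
            System.out.println("gray=" + pick(true, 8, 6, seq)
                    + " normal=" + pick(false, 8, 6, seq));
        }
    }
}
```

Because both partitions live in the same topic, no shadow topic or group is created; only the range of queue ids differs between gray and normal traffic.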

syhleo commented 4 weeks ago

The solution has been widely applied and verified in our projects, which confirms its reliability, security, and stability.

ZShUn commented 2 weeks ago

How do you prevent non-grayscale consumers from consuming gray messages when a message retry occurs?

syhleo commented 2 weeks ago

How do you prevent non-grayscale consumers from consuming gray messages when a message retry occurs?

For a retry topic, the average allocation policy is used (retry messages are sent back to the broker internally, and the queue they are written to is random).

ZShUn commented 2 weeks ago

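So in this scenario messages may cross over. For example, the prod instance consumes queues 0, 1, 2, 3 and the gray instance consumes queues 4, 5, 6, 7. If consumption on the gray instance triggers a message retry, the message goes back to queue 0 by default, which causes the prod instance to consume a gray message.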
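The concern can be illustrated with a small sketch. It assumes, as the thread suggests, that retry queues are split evenly over all clients regardless of the @gray mark, and that the retried message lands in queue 0 of a retry topic with a single queue. The class RetryCrossoverDemo and the client ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of who ends up owning a retry queue under an even split. */
final class RetryCrossoverDemo {
    /** Owner of queueId when queueCount queues are split evenly over sorted client ids. */
    static String ownerOf(int queueId, int queueCount, List<String> clientIds) {
        if (clientIds.isEmpty() || queueId < 0 || queueId >= queueCount) {
            throw new IllegalArgumentException("no clients or queue id out of range");
        }
        List<String> sorted = new ArrayList<>(clientIds);
        Collections.sort(sorted);
        int base = queueCount / sorted.size();
        int extra = queueCount % sorted.size();
        int start = 0;
        for (int i = 0; i < sorted.size(); i++) {
            int count = base + (i < extra ? 1 : 0);
            if (queueId < start + count) {
                return sorted.get(i);
            }
            start += count;
        }
        throw new IllegalStateException("unreachable for a valid queue id");
    }

    public static void main(String[] args) {
        List<String> clients = List.of("10.0.0.2@app@gray", "10.0.0.1@app");
        // Assumption: the retry topic has one queue, id 0.
        System.out.println("retry queue 0 owner: " + ownerOf(0, 1, clients));
    }
}
```

In this sketch the owner of the retry queue is decided by sort order alone, so a message that failed on a gray consumer can be redelivered to a non-gray one, which is the crossover described above.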