apache / rocketmq

Apache RocketMQ is a cloud native messaging and streaming platform, making it simple to build event-driven applications.
https://rocketmq.apache.org/
Apache License 2.0

After a cluster node goes offline, messages are found to be re-consumed #2181

Closed MrYueQ closed 2 years ago

MrYueQ commented 4 years ago

BUG REPORT

There are 4 instances in the cluster. After one instance went offline, the monitoring data shows that messages are being re-consumed.

system info

> uname -rsm
Linux 3.10.0-862.14.4.el7.x86_64 x86_64

rocketmq info

# cluster mode
2m-2s-sync
# version
Installed Packages
Name        : rocketmq
Arch        : x86_64
Version     : 4.4.0
Release     : 1.el7
Size        : 13 M
Repo        : installed
From repo   : conf
Summary     : Apache RocketMQ
License     : GPL

pull msg info

2020-07-19 10:45:46 INFO PullMessageThread_11 - the request offset: 30266 over flow badly, broker max offset: 30265, consumer: /consumer01
2020-07-19 10:45:46 INFO PullMessageThread_12 - the request offset: 30268 over flow badly, broker max offset: 30267, consumer: /consumer02
2020-07-19 10:45:46 INFO PullMessageThread_16 - the request offset: 30292 over flow badly, broker max offset: 30291, consumer: /consumer03
2020-07-19 10:45:46 WARN PullMessageThread_11 - PULL_OFFSET_MOVED:correction offset. topic=example_topic, groupId=example_group_info, requestOffset=30266, newOffset=0, suggestBrokerId=0
2020-07-19 10:45:46 WARN PullMessageThread_16 - PULL_OFFSET_MOVED:correction offset. topic=example_topic, groupId=example_group_info, requestOffset=30292, newOffset=0, suggestBrokerId=0
2020-07-19 10:45:46 WARN PullMessageThread_12 - PULL_OFFSET_MOVED:correction offset. topic=example_topic, groupId=example_group_info, requestOffset=30268, newOffset=0, suggestBrokerId=0

consumer msg info

2020-07-19 10:45:52 WARN ConsumerManageThread_11 - [NOTIFYME]update consumer offset less than store. clientHost=client_addr, key=key_consume, queueId=5, requestOffset=0, storeOffset=30272
2020-07-19 10:45:52 WARN ConsumerManageThread_14 - [NOTIFYME]update consumer offset less than store. clientHost=client_addr, key=key_consume, queueId=15, requestOffset=0, storeOffset=30268
2020-07-19 10:45:52 WARN ConsumerManageThread_14 - [NOTIFYME]update consumer offset less than store. clientHost=client_addr, key=key_consume, queueId=12, requestOffset=0, storeOffset=30292
2020-07-19 10:45:52 WARN ConsumerManageThread_19 - [NOTIFYME]update consumer offset less than store. clientHost=client_addr, key=key_consume, queueId=14, requestOffset=0, storeOffset=30266

(monitoring metric image)

francisoliverlee commented 4 years ago

@MrYueQ RocketMQ does not guarantee no-repeat messages. You can make your consumption idempotent.

MrYueQ commented 4 years ago

Hello. Two questions:

  1. When a node drops offline and rejoins the cluster, in Pull mode the offset is reset to 0 instead of being reset to the broker offset.
  2. When this kind of problem happens, is it possible to manually specify the offset of the queues under a given topic? Even though the downstream is idempotent, the volume of duplicated messages is so large that downstream consumption cannot keep up, which triggers customer complaints. (A possible command for this is sketched below.)
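
One way to do this, assuming the mqadmin tool that ships with the broker, is the resetOffsetByTime command, which moves a consumer group's offsets on a topic to the positions at a given time (the name server address here is an assumption):

# moves example_group_info's offsets on example_topic to the current max offsets;
# -s also accepts a timestamp instead of "now"
sh mqadmin resetOffsetByTime -n 127.0.0.1:9876 -g example_group_info -t example_topic -s now
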
francisoliverlee commented 4 years ago

@MrYueQ With pull-and-consume you must commit the offset yourself. When you restart your pull consumer, first fetch the latest committed offset from the broker, then pull from that offset to consume.
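
A minimal sketch of that restart logic against the 4.x client API, reusing the group and topic names from the logs above (the name server address is an assumption):

import java.util.Set;

import org.apache.rocketmq.client.consumer.DefaultMQPullConsumer;
import org.apache.rocketmq.client.consumer.PullResult;
import org.apache.rocketmq.client.consumer.PullStatus;
import org.apache.rocketmq.common.message.MessageQueue;

public class PullFromBrokerOffset {
    public static void main(String[] args) throws Exception {
        DefaultMQPullConsumer consumer = new DefaultMQPullConsumer("example_group_info");
        consumer.setNamesrvAddr("127.0.0.1:9876"); // assumption: local name server
        consumer.start();

        Set<MessageQueue> queues = consumer.fetchSubscribeMessageQueues("example_topic");
        for (MessageQueue mq : queues) {
            // fromStore=true reads the offset committed on the broker,
            // not a locally cached value
            long offset = consumer.fetchConsumeOffset(mq, true);
            if (offset < 0) {
                offset = 0; // no offset committed yet for this queue
            }

            PullResult result = consumer.pull(mq, "*", offset, 32);
            if (result.getPullStatus() == PullStatus.FOUND) {
                // process result.getMsgFoundList() here, then commit the new offset
                consumer.updateConsumeOffset(mq, result.getNextBeginOffset());
            }
        }
        consumer.shutdown();
    }
}

Starting from the broker-side offset this way avoids the requestOffset=0 resets seen in the logs, as long as the consumer commits an offset after each successfully processed batch.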

MrYueQ commented 4 years ago

Yes, I agree with what you described. But the log shows the following:

2020-07-19 10:45:46 INFO PullMessageThread_16 - the request offset: 30292 over flow badly, broker max offset: 30291, consumer: /consumer03
2020-07-19 10:45:46 WARN PullMessageThread_11 - PULL_OFFSET_MOVED:correction offset. topic=example_topic, groupId=example_group_info, requestOffset=30266, newOffset=0, suggestBrokerId=0

Is this because the consumer did not set something like auto_offset_reset: (earliest | latest | none)?

francisoliverlee commented 4 years ago

if "you want to customize your logic" || streaming use Pull Consumer if "lazy to do mq things" try DefaultPushConsumer

ps: how to handle logic like "PULL_OFFSET_MOVED:correction", check push consumer's codes.
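
For reference, a minimal push-consumer sketch, again reusing the group and topic names from the logs (the name server address is an assumption). Its setConsumeFromWhere setting is the closest analogue to the auto_offset_reset knob asked about above:

import org.apache.rocketmq.client.consumer.DefaultMQPushConsumer;
import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyStatus;
import org.apache.rocketmq.client.consumer.listener.MessageListenerConcurrently;
import org.apache.rocketmq.common.consumer.ConsumeFromWhere;

public class PushConsumerExample {
    public static void main(String[] args) throws Exception {
        DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("example_group_info");
        consumer.setNamesrvAddr("127.0.0.1:9876"); // assumption: local name server
        consumer.subscribe("example_topic", "*");

        // only takes effect when the group has no committed offset on the broker yet
        consumer.setConsumeFromWhere(ConsumeFromWhere.CONSUME_FROM_LAST_OFFSET);

        consumer.registerMessageListener((MessageListenerConcurrently) (msgs, context) -> {
            // process msgs; the client commits offsets back to the broker automatically
            return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
        });
        consumer.start();
    }
}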

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 365 days with no activity. It will be closed in 3 days if no further activity occurs.

github-actions[bot] commented 2 years ago

This issue was closed because it has been inactive for 3 days since being marked as stale.