apache / rocketmq-client-go

Apache RocketMQ go client
https://rocketmq.apache.org/
Apache License 2.0
1.3k stars 416 forks

The offsets in the processQueue are not removed correctly #927

Closed 0daypwn closed 1 year ago

0daypwn commented 1 year ago

The issue tracker is ONLY used for the go client (feature requests for RocketMQ itself need to follow the RIP process). Before you raise a new issue, please check whether the same report already exists.

Alternately (especially if your communication is not a bug report), you can send mail to our mailing lists. We welcome any friendly suggestions, bug fixes, collaboration, and other improvements.

Please ensure that your bug report is clear and complete. Otherwise, we may be unable to understand it or to reproduce it, either of which would prevent us from fixing the bug. We strongly recommend that the report (bug report or feature request) include some hints as to the following:

BUG REPORT

  1. Please describe the issue you observed:

    • What did you do (the steps to reproduce)? A producer sends messages very fast, and a consumer consumes messages very fast.

    • What did you expect to see?

    • What did you see instead? Some cached offsets in a process queue may not be removed correctly, so the consumer offset can't be updated to the broker. When this happens many times, it may block consumption of the queue.

  2. Please tell us about your environment:

    • What is your OS?

    • What is your client version? v2.1.1

    • What is your RocketMQ version?

  3. Other information (e.g. detailed explanation, logs, related issues, suggestions on how to fix, etc):

processQueue put message order:

  1. (pq.putMessage)put messages to channel pq.msgCh
  2. (pq.putMessage)lock pq.mutex
  3. (pq.putMessage)put messages to map pq.msgCache
  4. (pq.putMessage)unlock pq.mutex

consume message order:

  1. (pq.getMessages)get messages from channel pq.msgCh
  2. consumerInner
  3. if consume success, do remove message
  4. (pq.removeMessage)lock pq.mutex
  5. (pq.removeMessage)remove messages from map pq.msgCache
  6. (pq.removeMessage)unlock pq.mutex

In high-concurrency scenarios, the steps of the two sequences may interleave like this:

  1. (pq.putMessage)put messages to channel pq.msgCh
  2. (pq.getMessages)get messages from channel pq.msgCh
  3. consumerInner
  4. if consume success, do remove message
  5. (pq.removeMessage)lock pq.mutex
  6. (pq.removeMessage)remove messages from map pq.msgCache. At this point, the offset is not yet in the map, so nothing is removed.
  7. (pq.removeMessage)unlock pq.mutex
  8. (pq.putMessage)lock pq.mutex
  9. (pq.putMessage)put messages to map pq.msgCache. Nothing will ever delete this entry again.
  10. (pq.putMessage)unlock pq.mutex
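
The interleaving above can be replayed deterministically with a small toy model. This is only a sketch: `toyQueue`, `sendToChannel`, `addToCache`, `replayBuggyOrder`, and `replayCacheFirstOrder` are hypothetical names made up for illustration, not code from rocketmq-client-go, and only the channel and the offset map are modelled. The second replay inserts into the cache before sending to the channel, which is one possible way to close the window; it is not necessarily the fix the project adopted.

```go
package main

import (
	"fmt"
	"sync"
)

// toyQueue is a hypothetical, simplified stand-in for processQueue.
// It models only the channel and the offset cache described in this report.
type toyQueue struct {
	mutex    sync.Mutex
	msgCh    chan []int64       // batches of offsets handed to the consumer
	msgCache map[int64]struct{} // offsets believed to be in flight
}

func newToyQueue() *toyQueue {
	return &toyQueue{
		msgCh:    make(chan []int64, 16),
		msgCache: make(map[int64]struct{}),
	}
}

// sendToChannel models step 1 of putMessage.
func (q *toyQueue) sendToChannel(offsets []int64) { q.msgCh <- offsets }

// addToCache models steps 2-4 of putMessage (lock, insert, unlock).
func (q *toyQueue) addToCache(offsets []int64) {
	q.mutex.Lock()
	defer q.mutex.Unlock()
	for _, o := range offsets {
		q.msgCache[o] = struct{}{}
	}
}

// removeMessage models the consumer-side cleanup after a successful consume.
func (q *toyQueue) removeMessage(offsets []int64) {
	q.mutex.Lock()
	defer q.mutex.Unlock()
	for _, o := range offsets {
		delete(q.msgCache, o) // a no-op if the offset is not cached yet
	}
}

// cached returns how many offsets are still held in the cache.
func (q *toyQueue) cached() int {
	q.mutex.Lock()
	defer q.mutex.Unlock()
	return len(q.msgCache)
}

// replayBuggyOrder replays the reported interleaving step by step:
// the consumer finishes before the producer side reaches the cache insert.
func replayBuggyOrder() int {
	q := newToyQueue()
	batch := []int64{100, 101, 102}
	q.sendToChannel(batch) // putMessage: put messages to channel
	got := <-q.msgCh       // getMessages: consumer receives the batch
	q.removeMessage(got)   // removeMessage: nothing to delete yet
	q.addToCache(batch)    // putMessage: cache insert arrives too late
	return q.cached()
}

// replayCacheFirstOrder inserts into the cache before the channel send,
// so the consumer can never observe a batch that is not cached yet.
func replayCacheFirstOrder() int {
	q := newToyQueue()
	batch := []int64{100, 101, 102}
	q.addToCache(batch)
	q.sendToChannel(batch)
	got := <-q.msgCh
	q.removeMessage(got)
	return q.cached()
}

func main() {
	fmt.Println("stale offsets, channel first:", replayBuggyOrder())
	fmt.Println("stale offsets, cache first:", replayCacheFirstOrder())
}
```

In the channel-first replay the three offsets stay in the map after the consumer has finished, which matches the stuck-offset symptom described above. In the cache-first replay the map ends up empty.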