Open jdfrozen opened 1 year ago
So when this Key_Shared subscription has a lot of consumers and some of them are slow, messages that are being dispatched may turn out to have a stickyKeyHash that belongs to a slow consumer. These messages are then added to MessagetoReplay, and with a large backlog this causes the problem.
Add the JVM startup parameter "-XX:+HeapDumpBeforeFullGC".
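To make the growth pattern described above concrete, here is a minimal toy model. It is not Pulsar's dispatcher code: the class name, the modulo "hash", and the rule that consumer 0 is the slow one are all assumptions made for illustration. It only shows that when a fixed share of keys maps to a consumer that cannot accept messages, the number of remembered positions grows linearly with the backlog that is read.

```java
import java.util.TreeSet;

/** Toy model only; names and logic are hypothetical, not Pulsar's implementation. */
class ReplayBacklogSketch {

    /**
     * Reads totalMessages positions. Each position is assigned to a consumer by a
     * stand-in hash (position % consumerCount). Consumer 0 is assumed to be slow and
     * accepts nothing, so every position assigned to it is stored for later replay.
     * Returns how many positions end up in the replay set.
     */
    static int simulate(int totalMessages, int consumerCount) {
        TreeSet<Long> replaySet = new TreeSet<>();
        for (long position = 0; position < totalMessages; position++) {
            int targetConsumer = (int) (position % consumerCount); // stand-in for stickyKeyHash lookup
            boolean targetIsSlow = targetConsumer == 0;            // assumption: consumer 0 is slow
            if (targetIsSlow) {
                replaySet.add(position); // cannot dispatch now, remember it in memory
            }
            // messages for the other consumers are dispatched and need no tracking here
        }
        return replaySet.size();
    }

    public static void main(String[] args) {
        System.out.println("positions held for replay: " + simulate(1_000_000, 10));
    }
}
```

In this model one slow consumer out of ten leaves a tenth of everything read sitting in memory, which is the kind of growth a heap dump would show as a very large replay collection.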
The Key_Shared mode is a fairly strict subscription type. It is very sensitive to the consumption (acknowledgement) rate because it has to preserve message order. When consumers are added to the subscription, the key hash assignment has to be recalculated, and the positions of some new messages have to be kept in broker memory to avoid breaking delivery order (each key is delivered to only one consumer at a time).
Therefore, this is expected behaviour. You can check why some of your consumers can't catch up, or consider whether you can use another subscription mode such as SHARED.
But anyway. You are right. We should have a limit on this container's memory usage to avoid one topic affecting the whole broker. :)
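As a rough illustration of why adding a consumer forces the broker to track extra state, the sketch below splits a hash space evenly between consumers and counts how many hash values change owner when the consumer count changes. The hash range size, the even-split rule, and all names are assumptions for this example; Pulsar's actual consumer selectors may assign ranges differently.

```java
/** Simplified model of key-hash ownership; not Pulsar's selector implementation. */
class KeyHashAssignmentSketch {

    static final int HASH_RANGE = 65536; // hypothetical size of the hash space

    /** Owner index under an even split of [0, HASH_RANGE) across consumerCount consumers. */
    static int owner(int stickyKeyHash, int consumerCount) {
        return (int) ((long) stickyKeyHash * consumerCount / HASH_RANGE);
    }

    /** Number of hash values whose owner differs between the two consumer counts. */
    static int countReassigned(int consumersBefore, int consumersAfter) {
        int moved = 0;
        for (int hash = 0; hash < HASH_RANGE; hash++) {
            if (owner(hash, consumersBefore) != owner(hash, consumersAfter)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        System.out.println("hashes reassigned going from 3 to 4 consumers: "
                + countReassigned(3, 4));
    }
}
```

Every reassigned hash is a key whose earlier messages may still be unacknowledged on the previous owner, so delivery to the new owner has to wait for them if ordering is to hold.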
I tested and verified that setting set-max-unacked-messages-per-subscription as low as 1000 avoids full GC. For the verification I used the namespace-level policy ("pulsar-admin namespaces get-max-unacked-messages-per-subscription"). We want to set the policy at the topic level. We are using version 2.7.4. Is the topic-level policy stable enough?
Hi, @jdfrozen
2.7.x is a kinda old version, I am unsure if it can work properly. But you can give it a try. :)
The issue had no activity for 30 days, mark with Stale label.
One of the root causes behind this issue is described in #23200 . It's addressed by #23231 and #23226. I believe that the OOM issue got mitigated already by #17804.
Version
2.7.x
Minimal reproduce step
1. A large backlog of Key_Shared subscription messages
2. The subscription has multiple consumers
What did you expect to see?
broker functioning
What did you see instead?
1. Frequent broker GC
2. Broker full GC
3. Broker OOM
This is broker gc monitoring
Add the JVM startup parameter "-XX:+HeapDumpOnOutOfMemoryError". When a full GC occurs, the heap dump is analyzed with MAT.
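Before waiting for the next incident, it can help to confirm that the heap-dump flag really reached the JVM. The following is a small sketch using the standard java.lang.management API; the class name and helper method are made up for this example, and it has to run inside the JVM whose arguments you want to inspect.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper: checks whether a given flag is among the JVM input arguments. */
class HeapDumpFlagCheck {

    /** Returns true if the exact flag string is present in the argument list. */
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM at startup (for example -Xmx..., -XX:...)
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String flag = "-XX:+HeapDumpOnOutOfMemoryError";
        System.out.println(flag + " present: " + hasFlag(jvmArgs, flag));
    }
}
```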
Anything else?
Root cause: redeliveryMessages contains a large number of messages
PersistentStickyKeyDispatcherMultipleConsumers.java