apache / rocketmq

Apache RocketMQ is a cloud native messaging and streaming platform, making it simple to build event-driven applications.
https://rocketmq.apache.org/
Apache License 2.0
21.17k stars 11.67k forks

A large number of FlowMonitor threads in RocketMQ #6226

Closed. iamssx closed this issue 1 year ago

iamssx commented 1 year ago

We started a RocketMQ cluster with the default configuration. After it runs for about a month, the JVM crashes with an OOM error. We tried adjusting the Linux memory parameters, but that did not help. We then captured a JVM dump and the thread stacks, and found a large number of FlowMonitor threads in a waiting state, and the count keeps increasing. We suspect the growing number of FlowMonitor threads is the cause of the JVM crash.

Please review the thread analysis and the JVM crash information, and tell us the real cause of the crash and how to fix it. Thanks!

Environment: RocketMQ 5.0.0, Linux (Red Hat 7.9), 48 cores / 755 GB memory

/usr/local/jdk/bin/java -server -Xms8g -Xmx8g -XX:+UseG1GC -XX:G1HeapRegionSize=16m -XX:G1ReservePercent=25 -XX:InitiatingHeapOccupancyPercent=30 -XX:SoftRefLRUPolicyMSPerMB=0 -verbose:gc -Xloggc:/dev/shm/rmq_srvgc%p_%t.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintAdaptiveSizePolicy -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=30m -XX:-OmitStackTraceInFastThrow -XX:+AlwaysPreTouch -XX:MaxDirectMemorySize=15g -XX:-UseLargePages -XX:-UseBiasedLocking -Drocketmq.client.logUseSlf4j=true -cp .:/data/rocketmq-all-5.0.0-bin-release//bin/../conf:/data/rocketmq-all-5.0.0-bin-release//bin/../lib/*: -Djdk.tls.rejectClientInitiatedRenegotiation=true org.apache.rocketmq.broker.BrokerStartup -c /data/rocketmq/conf/broker-c.properties

JVM crash info: [image]

Thread dump: https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMjMvMDMvMi90aHJlYWQudHh0LS03LTE0LTU2&
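
To check whether the FlowMonitor thread count really grows over time, successive thread dumps can be summarized by thread name. The following is a minimal sketch, assuming a jstack-style text dump in which each thread header line starts with the quoted thread name. The class `ThreadDumpCounter`, the method `countByName`, and the sample lines are hypothetical and only for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreadDumpCounter {
    // Assumption: thread header lines begin with the thread name in double quotes.
    private static final Pattern HEADER = Pattern.compile("^\"([^\"]+)\"");

    /** Counts thread header lines, grouped by thread name without a trailing number. */
    static Map<String, Integer> countByName(List<String> dumpLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : dumpLines) {
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                // Strip a numeric suffix such as "-12" or "_3" so pooled threads group together.
                String group = m.group(1).replaceAll("[-_ ]?\\d+$", "");
                counts.merge(group, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample lines; pass a dump file path as the first argument to use real data.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList(
                        "\"FlowMonitor\" #101 daemon prio=5",
                        "   java.lang.Thread.State: TIMED_WAITING",
                        "\"FlowMonitor\" #102 daemon prio=5",
                        "\"worker-1\" #7 prio=5");
        countByName(lines).forEach((name, n) -> System.out.println(n + "\t" + name));
    }
}
```

Running this against dumps taken a few days apart and comparing the count for the FlowMonitor group would show whether the number keeps increasing.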

iamssx commented 1 year ago

There is enough free memory: [image]

RongtongJin commented 1 year ago

Hi @iamssx, I would like to know whether the OOM occurs on the master node or the slave node. Is the cluster running in controller mode?

guyinyou commented 1 year ago

It is caused by shutdown not being called when a connection is closed.
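
The pattern behind this kind of leak can be sketched in isolation. This is not RocketMQ's actual code: `Monitor`, `Connection`, and `leakedAfterClose` are hypothetical names, and the sketch only assumes that each connection starts its own monitor thread. If `close()` does not shut the monitor down, one thread stays alive for every connection that was ever opened.

```java
import java.util.ArrayList;
import java.util.List;

class MonitorLeakSketch {
    /** Hypothetical per-connection background thread (a stand-in, not RocketMQ's FlowMonitor). */
    static class Monitor implements Runnable {
        private volatile boolean stopped = false;
        private final Thread thread = new Thread(this, "MonitorSketch");

        void start() {
            thread.setDaemon(true); // daemon, so the leaked threads do not block JVM exit here
            thread.start();
        }

        void shutdown() throws InterruptedException {
            stopped = true;
            thread.interrupt();   // wake the thread if it is sleeping
            thread.join(1000);    // wait briefly for it to exit
        }

        boolean isAlive() {
            return thread.isAlive();
        }

        @Override
        public void run() {
            while (!stopped) {
                try {
                    Thread.sleep(1000); // periodic work would go here
                } catch (InterruptedException e) {
                    // fall through: the loop condition re-checks the stopped flag
                }
            }
        }
    }

    /** Hypothetical connection that owns one monitor. */
    static class Connection {
        final Monitor monitor = new Monitor();
        private final boolean shutdownMonitorOnClose;

        Connection(boolean shutdownMonitorOnClose) {
            this.shutdownMonitorOnClose = shutdownMonitorOnClose;
            monitor.start();
        }

        void close() throws InterruptedException {
            if (shutdownMonitorOnClose) {
                monitor.shutdown(); // omitting this call is what leaks the thread
            }
        }
    }

    /** Opens and closes n connections, then returns how many monitor threads are still alive. */
    static int leakedAfterClose(int n, boolean shutdownMonitorOnClose) throws InterruptedException {
        List<Connection> connections = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            connections.add(new Connection(shutdownMonitorOnClose));
        }
        int alive = 0;
        for (Connection c : connections) {
            c.close();
            if (c.monitor.isAlive()) {
                alive++;
            }
        }
        return alive;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("without shutdown: " + leakedAfterClose(5, false));
        System.out.println("with shutdown: " + leakedAfterClose(5, true));
    }
}
```

In a long-running process where connections are repeatedly dropped and re-established, the first variant accumulates threads without bound, which matches the steadily growing thread count reported above.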

RongtongJin commented 1 year ago

Hi @iamssx, I have reproduced the issue and will fix it ASAP.

iamssx commented 1 year ago

> Hi @iamssx, I have reproduced the issue and will fix it ASAP.

Thanks for your help.